Inline lists: a) bananas b) apples c) oranges

I stumbled over the description of inline lists in this LaTeX wikibook. I noticed two things:

Inline lists

Coco likes fruit. Her favorites are: a) bananas, b) apples, c) oranges and d) lemons. Lorem ipsum dolor etc. etc.

I really like inline lists, but they suck if the list symbols (or text) are not properly styled. I’m not sure that they are used a lot, but I think they should be used more. Their usage is probably infrequent because it is relatively difficult to create such an inline list in programs like Word or Writer. Additionally, HTML presupposes that a list is a block-level element.

I still don’t know how one would solve the markdown for this. However, simply surrounding it with brackets might work inline. If you still want trailing or surrounding brackets, just double them:

 (-) bananas, (-) apples, (-) oranges and (-) lemons
 (*) bananas, (*) apples, (*) oranges and (*) lemons
 (+) bananas, (+) apples, (+) oranges and (+) lemons
 (1.) bananas, (2.) apples, (3.) oranges and (4.) lemons
 (i.) bananas, (ii.) apples, (iii.) oranges and (iv.) lemons
 (1)) bananas, (2)) apples, (3)) oranges and (4)) lemons

The problem is that it would not be clear where the last item ends. A finishing symbol like (/) is the only solution I could think of. This (of course) makes them a lot less beautiful to write :unamused:
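To make the terminator idea concrete, here is a minimal sketch of how a parser might pick up bracketed markers and the proposed (/) end symbol. The function name `split_inline_list` and the marker pattern are assumptions made for illustration; they are not taken from any Markdown implementation, and the doubled-bracket form such as (1)) is not handled.

```python
import re

# Hypothetical marker pattern: "(-)", "(*)", "(+)", "(1.)", "(ii.)", "(a)", ...
MARKER = re.compile(r"\((?:[-*+]|[0-9]+\.?|[a-z]+\.?)\)\s+")
END = "(/)"  # the finishing symbol proposed above


def split_inline_list(text):
    """Return (before, items, after) for the first inline list in text.

    Returns None when no marker or no terminator is found.
    """
    first = MARKER.search(text)
    end = text.find(END)
    if first is None or end == -1 or end < first.start():
        return None
    body = text[first.start():end]
    # Splitting on the markers leaves an empty string before the first one.
    items = [part.strip() for part in MARKER.split(body)[1:]]
    return text[:first.start()], items, text[end + len(END):]


result = split_inline_list(
    "Her favorites are: (-) bananas, (-) apples and (-) lemons (/). Lorem ipsum."
)
print(result)
```

Note that a single parenthesized word such as "(sic) " would also be taken for a marker by this pattern, which is one more argument for requiring an explicit terminator.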

Maybe this would be a candidate for a generic syntax like this:

Coco likes fruit. Her favorites are: !inline-list[(a) bananas, (b) apples, (c) oranges and (d) lemons]. Lorem ipsum dolor etc. etc.

Definition lists

Firstly, lists with names instead of bullets:

       ant    really busy all the time
     chimp    likes bananas
 alligator    very dangerous animal, ...

That made me think that the definition lists should probably be used this way.

For CommonMark, it should probably be possible to put the definitions on the same line as the items they define, since the common usage with a leading : on the next line would be very bulky in many cases.

I do see frequent usage here, e.g. a list of medical stages or grades of some disease (example is made up entirely). I think it would not be adequate to use a table in this case.

          stage I°:  no treatment required
intermediary stage:  treat symptoms
         stage II°:  terminally ill

A colon at the start of a new line is much less common in general prose than a colon part way through the line, so the results are more predictable. Allowing the colon to be placed anywhere on the line would break a lot of existing documents where the colon character or a named list item is used but a definition list is not intended. I like your examples where the colons line up on the right hand side, so perhaps there could be a rule: if two or more lines have colons that line up, the parser considers this to be a definition list. I imagine this wouldn’t be without complications though. What if you only wanted one key/value pair? And it would need to handle hard breaks when used in long definitions, e.g. the parser would need to handle this:

          stage I°: no treatment required
intermediary stage: treat symptoms. this is a long definition that covers more
                    than 80 characters and has been hard wrapped over multiple
         stage II°: terminally ill
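A rough sketch of the "colons that line up" rule, including the hard-wrapped continuation line. Everything here (the function name `parse_aligned_definitions`, taking the most frequent colon column as the alignment, returning None for a single pair) is an assumption for illustration, not a proposal for spec wording.

```python
def parse_aligned_definitions(lines):
    """Group lines into (term, definition) pairs when their first colons align.

    A line without a colon at the shared column is treated as a continuation
    of the previous definition. Returns None if fewer than two colons align.
    """
    columns = [line.find(":") for line in lines]
    candidates = [c for c in columns if c != -1]
    if not candidates:
        return None
    # Assumption: the most frequent colon column is the intended alignment.
    col = max(candidates, key=candidates.count)
    if candidates.count(col) < 2:
        return None
    pairs = []
    for line, c in zip(lines, columns):
        if c == col:
            pairs.append([line[:col].strip(), line[col + 1:].strip()])
        elif pairs:
            # Hard-wrapped continuation of the previous definition.
            pairs[-1][1] += " " + line.strip()
        else:
            return None  # text before the first aligned colon: not a list
    return [tuple(p) for p in pairs]


example = [
    "          stage I°: no treatment required",
    "intermediary stage: treat symptoms. this is a long definition that",
    "                    has been hard wrapped",
    "         stage II°: terminally ill",
]
print(parse_aligned_definitions(example))
```

The single key/value case stays unsolved in this sketch: one lonely colon simply yields None.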

Sorry for not being clear enough: I didn’t want to suggest a new syntax for definition lists yet, just the general thought that description lists (as they are now called in HTML5 instead of “definition lists”) can have a different and more frequent usage than just a glossary or an alphabetically sorted list of technical terms.

That means that I see the “stage I°” as being a list symbol just like “1.”, “a)”, “i.” or bullets and dashes.

I have seen proposals for an inline definition list syntax in this forum. I think someone suggested ~: or :~ (can’t remember exactly). Maybe this is indeed a good solution.

Another option I could come up with, which is indeed similar:

~ term1 : definition 1
~ term2
  : definition 2

The tilde is indeed similar to the dash in visually representing a new list item, but there would be no ambiguity about whether you want an unordered list item or a description list item.

The tilde would also remove the need for a look-back when parsing and detecting a definition list.

The combination would probably make the whole thing quite distinct. Both the tilde and the colon could still be easily used for other purposes, because they would only form a description list in combination. To get an empty description, use ~ term : and to get an empty term, use ~ : description.

I would keep the space before the colon _:_, because this would make escaping needed less often in cases like ~ cell cycle: anaphase : description, but possibly the dense version would be no problem either.
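Here is a minimal sketch of the tilde/colon rules as described above: term and definition separated by a space-padded colon, an optional continuation line starting with a colon, and the empty-term and empty-description forms. The names `ITEM`, `CONT` and `parse_tilde_list` are made up for this example.

```python
import re

ITEM = re.compile(r"^~\s+(.*)$")     # "~ term : definition" or "~ term"
CONT = re.compile(r"^\s+:\s*(.*)$")  # "  : definition" on a following line


def parse_tilde_list(lines):
    """Return a list of (term, definition) tuples.

    The definition is None when a term has no colon part at all.
    """
    entries = []
    for line in lines:
        item = ITEM.match(line)
        cont = CONT.match(line)
        if item:
            # Pad with spaces so "term :" and ": description" both contain " : ".
            term, sep, definition = f" {item.group(1)} ".partition(" : ")
            entries.append([term.strip(), definition.strip() if sep else None])
        elif cont and entries and entries[-1][1] is None:
            entries[-1][1] = cont.group(1).strip()
    return [tuple(e) for e in entries]


print(parse_tilde_list(["~ term1 : definition 1", "~ term2", "  : definition 2"]))
print(parse_tilde_list(["~ cell cycle: anaphase : description"]))
```

Because only the space-padded colon counts as a separator, the "cell cycle: anaphase" term needs no escaping in this sketch.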

Regarding the “inline list” topic: You’re right that HTML only has “block-level” lists which can’t occur inside a <p> element (ie as “inline-level” content). But other document types have such lists, for example <simplelist> in DocBook.

However I dislike the urge to pile on special-purpose “extensions” to the Markdown syntax for things like this. How about a “generic syntax” that represents your (slightly expanded) example like this:

<subject/Coco/ likes fruit. Her favorites are: <il/bananas,<>apples,<>oranges and<>lemons./ Lorem ipsum dolor etc. etc

This is IMO even somewhat nicer and terser (and as always the list labels should be generated by the processing application)—but most importantly, it is already valid (SGML) markup.

The only change required here in Markdown would thus be to recognize and “pass through” empty start tags <> (and similarly, empty end tags </>), and so-called NET-enabling start tags <gi att=value ... /. (A “NET” or “null end tag” is - usually - a solidus / that acts as an end tag.) This is trivial to implement, and the resulting output

<p><subject/Coco/ likes fruit. Her favorites are: <il/bananas,<>apples,<>oranges and<>lemons./ Lorem ipsum dolor etc. etc</p>

is already valid SGML with respect to a DTD like this one:

<!DOCTYPE test [
<!ENTITY %    "#PCDATA|subject|il">
<!ELEMENT  test    O O (p+)>
<!ELEMENT  p       - O (*>
<!ELEMENT  subject - - RCDATA>
<!ELEMENT  il      - - (li+)>
<!ELEMENT  li      O O (*>
<p><subject/Coco/ likes fruit. Her favorites are: <il/bananas,<>apples,<>oranges and<>lemons./ Lorem ipsum dolor etc. etc</p>

Note that this makes rather heavy use of tag omissions (the <li> element is not even mentioned in the Markdown text!).

If you (or your tools) prefer less “minimized” markup, you can for example pass this document through sgmlnorm to “restore” the omitted tags and produce:

<P><SUBJECT>Coco</SUBJECT> likes fruit. Her favorites are: <IL>
<LI>bananas,</LI>
<LI>apples,</LI>
<LI>oranges and</LI>
<LI>lemons.</LI>
</IL> Lorem ipsum dolor etc. etc</P>

This same approach would allow a Markdown author for example to use HTML phrase elements like <DFN>—for which there is no Markdown syntax—with “null end tags” (NETs) in this manner:

A <dfn/empty start tag/ is a start tag of the form `<>`, ...

as a shorthand alternative to the “regular” and already existing form

A <dfn>empty start tag</dfn> is a start tag of the form `<>`, ...
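For the simplest case, the equivalence between the two forms can be shown with a small rewrite step. This is only a sketch under stated assumptions: the pattern `NET` and the function `expand_net` are hypothetical, attributes and nested markup are not handled, and empty start tags such as <> are left alone because resolving them needs more context than a regular expression has.

```python
import re

# <name/content/ where the content contains neither "<" nor "/".
NET = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)/([^</]*)/")


def expand_net(text):
    """Rewrite NET-enabled tags like <dfn/term/ as <dfn>term</dfn>."""
    return NET.sub(r"<\1>\2</\1>", text)


print(expand_net("A <dfn/empty start tag/ is a start tag of the form `<>`, ..."))
```

A real SGML parser does this (and much more) properly; the sketch only illustrates why a Markdown processor can pass such tags through untouched and leave the expansion to a later stage.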

A quick rant

Given (1.) Markdown’s (or Gruber’s) stance that (emphasis mine)

Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert.

and (2.) the assertion in the Commonmark spec that (emphasis mine)

Tag and attribute names are not limited to current HTML tags, so custom tags (and even, say, DocBook tags) may be used.

and (3.) the fact that SGML (and HTML to some extent) already has various ways to “minimize” markup as a convenience for human authors, as argued for by Goldfarb (in annex A.3 of ISO 8879:1986) regarding tag omission (emphasis mine):

The price of this simplicity, though, is that an end-tag must be present for every element.

This price would be totally unacceptable had the user to enter all the tags himself. He knows that the start of a paragraph, for example, terminates the previous one, so he would be reluctant to go to the trouble and expense of entering an explicit end-tag for every single paragraph just to share his knowledge with the system. He would have equally strong feelings about other element types he might define for himself, if they occurred with any great frequency.

With SGML, however, it is possible to omit much markup by advising the system about the structure and attributes of any type of element the user defines.

I would therefore much prefer it if a Markdown processor, or rather the CommonMark spec, “knew enough” about already existing markup minimization techniques to “get out of the way” of authors who want to use them, instead of bolting on additional ad-hoc syntax for all kinds of situations.

Just because XML (for good reasons—which are irrelevant here) went into the opposite direction of the Goldfarb quote above and did away with all the SGML conveniences for human authors—aka markup minimization rules—in the interest of simplicity for implementations, and famously stated “terseness in XML markup is of minimal importance” as a design goal, should IMO not mean that CommonMark needs to re-invent them all over. At least not in cases where they already provide clear, simple, short and standard ways to mark-up text.

Bam, I really like that! I mean, I do prefer the markdown way for general usage, but inline lists and a lot of other special stuff would probably work very well this way.

This should be in place before generic syntax is considered.

Glad to read that! :slight_smile:

Btw, the examples I gave were produced with a hacked-up version of libsoldout, where I had just added as a “proof of concept” recognition of empty start and end tags and NET-enabling tags. See the commit here.

As you said, I think this is in particular handy for “phrase” mark-up like in the <DFN> or <subject> example.

SGML syntax features would be relevant to the discussion iff MD/CM could be described by a DTD. I looked into it several years ago, but gave up quickly for the sake of my remaining sanity.

I’m not sure I understand what you mean here: a DTD is about structure (content models of elements), not about syntax features, I would say. You’re certainly right that a DTD, or at least some DTD-like information, is needed to “detect” the omitted <li> start tag (after the <il/ start tag) in my example above:

Her favorites are: <il/bananas,<>apples,<>oranges and<>lemons./ Lorem ipsum dolor etc. etc

But why should this be a problem as long as the CommonMark processor reproduces this markup in the text it spits out? As I said, a subsequent pass through a proper SGML parser can figure it out easily.

Note that neither CommonMark (specification and implementation) nor any other Markdown processor bothers much to “understand” literal (HTML or otherwise) tags in the sense that it keeps track of which elements are open at any given point.

That’s why neither CommonMark nor (most?) other processors can guarantee even well-formed XML output; try for example:

 in inline text <span>Huh? **Ha! </span> bad** things can happen!


 in inline text <span>Huh? **Ha! bad** things can happen!

in Babelmark (Maruku and s9e seem to be different, though).
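The well-formedness problem is easy to check with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree`; the `bad` string is what I would expect a CommonMark processor to emit for the first example (an assumption, not pasted from an actual run), and `good` is a hand-corrected variant for comparison.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Assumed processor output: <strong> and <span> overlap instead of nesting.
bad = "<p>in inline text <span>Huh? <strong>Ha! </span> bad</strong> things can happen!</p>"
# Hand-written variant with proper nesting.
good = "<p>in inline text <span>Huh? <strong>Ha!</strong></span> bad things can happen!</p>"

print(is_well_formed(bad), is_well_formed(good))
```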

And apart from start tag omission, the “short” syntax I used can be parsed unambiguously without any DTD just as well as the corresponding “long” XML syntax (a marked-up text with this property is called an “amply-tagged document instance”):

<subject/Coco/ likes fruit. Her favorites are: <il/<li>bananas,<>apples,<>oranges and<>lemons./ Lorem ipsum dolor etc. etc

is—in the absence of any DTD!—exactly equivalent to the verbosely marked-up

<subject>Coco</subject> likes fruit. Her favorites are: <il><li>bananas,</li><li>apples,</li><li>oranges and</li><li>lemons.</li></il> Lorem ipsum dolor etc. etc

This has in my view nothing to do with whether CommonMark “could be described by a DTD” (what does that really mean, and how does it relate to the CommonMark.dtd?), and I see no good reason why authors should be barred from using such shorter syntactical forms: They make no difference to the CommonMark processor anyway.

I should have been more verbose. I meant a DTD and an accompanying SGML declaration that would make it possible to use other characters for tag limits than the default <, > and /. My SGML expertise has been dormant for quite a while now, but if I remember correctly this should be possible in theory but is unsupported by most if not all tools.

Yes, an SGML declaration can define its own “delimiter set”, assigning character strings to delimiter roles like etago (end-tag open), eg using “{:” instead of </. But I have never seen this used in practice, and yes, not many parsers support this anyway ([[o]n]sgmls does not, yasp does, AFAIK).

But this is not what is happening here: empty start tags (<>), end tags (</>), and NET-enabling start tags (<gi att=val ... /) are all available in “Basic SGML Documents”, which use the “reference concrete syntax” and a minimal set of features—ie, portable and “vanilla” documents.

This is the same syntax (except for things like allowing “:” in NAME, longer names etc) and the same feature set that HTML is (was) based on—it was only that many “user agents” aka browsers did not implement stuff like this, as noted in the HTML 4.01 spec.

[Edit:] Just for fun and not really on topic, here is Rick Jelliffe contorting James Clark’s SP parser with a special-purpose DTD (making heavy use of short references) into parsing a “wiki-like” notation like this …:slight_smile:

!An Example Document

This is an 
example document.

*It has some
kind of list
**with some kinds of nested list
* and also
##type of

But that is '''not''' ''all''!
You can link by URL alone
[], by name plus **URL**,
or by an existing name only 

This has long gone off topic from inline lists. Rick Jelliffe’s experiment is exactly what I meant, but for Markdown instead of Wikitext.

In accordance with my proposal for the definition/description list, I’ve thought about inline lists again. Maybe the tilde ~ and colon : combination could be used inline like this (with automatic detection of list element markers inside):

Coco's favorites: ~: a) bananas b) apples c) oranges and d) lemons. :~

Another different option would be to prepend each list element with the tilde symbol, or to put it the other way around: after each space + ~ check if the following text is a list item marker (as if it was the beginning of the line):

Coco's favorites: ~ a) bananas ~ b) apples ~ c) oranges and ~ d) lemons.
Coco's favorites: ~ 1. bananas ~ 2. apples ~ 3. oranges and ~ 4. lemons.
Coco's favorites: ~ *  bananas ~ *  apples ~ *  oranges and ~ *  lemons.

If needed an “empty” list item could be used to terminate an inline list (~ Enough is not a list item marker.)

Coco's favorites: ~ * bananas ~ * apples ~ * oranges and ~ * lemons. ~ Enough fruits for Coco.
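A minimal sketch of that rule: split on space + ~ + space, accept a chunk as a list item only if it starts with a list item marker, and treat the first ~ that is not followed by a marker as the end of the list. The marker pattern and the name `parse_tilde_inline_list` are assumptions for illustration.

```python
import re

# Hypothetical markers: "-", "*", "+", "1." / "1)", "a." / "a)"
MARKER = re.compile(r"(?:[-*+]|[0-9]+[.)]|[a-z][.)])\s+")


def parse_tilde_inline_list(text):
    """Return (lead, items, tail) for a tilde-separated inline list."""
    parts = text.split(" ~ ")
    lead, items, tail = parts[0], [], ""
    for i, part in enumerate(parts[1:], start=1):
        m = MARKER.match(part)
        if m is None:
            # "~" not followed by a marker: the list ends here.
            tail = " ~ ".join(parts[i:])
            break
        items.append(part[m.end():].strip())
    return lead, items, tail


print(parse_tilde_inline_list(
    "Coco's favorites: ~ * bananas ~ * apples ~ * oranges and ~ * lemons. ~ Enough fruits for Coco."
))
```

In this sketch the terminating tilde is consumed, so the tail starts directly with "Enough".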

This would also work with my definition list proposal (by combining the tilde with the colon):

Coco's favorites: ~ sweet : bananas ~ red : apples ~ juicy : oranges and ~ sour : lemons.

Maybe the unordered inline list could be implicit, i.e. only ~ instead of ~ * or ~ - or ~ + (but I don’t know how one would terminate the list here, since ~ Enough could be a list item):

Coco's favorites: ~ bananas ~ apples ~ oranges and ~ lemons. ~ Enough fruits for Coco.

If I had to type-write an “inline list” properly (in plain text, maybe to be processed as Markdown, or just to be line-wrapped by fmt), I would (have to) use NBSP at certain places, in order to prevent inconvenient or misleading line breaks. In particular, a line break right before the list item markers should be avoided, for otherwise this would not look “inline” any more.

Here’s an example (using ~ as a “visible NBSP”):

This one

Coco's favorites: a) bananas, b) apples, and c) oranges.

could wrap unfortunately into, say

Coco's favorites: a) bananas,
b) apples, and c) oranges

In order to avoid such cases, one could use NBSP (~) like this:

Coco's favorites:~a) bananas,~b) apples, and~c) oranges.

So in principle a tool could figure out, from the hints that the (invisible) NBSPs provide, what is going on here. Though it would not map easily into a document type like HTML without “inline lists”, of course.
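The wrapping problem and the NBSP fix can be reproduced with Python's standard `textwrap` module, which (as far as I know) only treats ASCII whitespace as a break opportunity and therefore keeps text joined by U+00A0 together. The width of 30 is an arbitrary choice for this sketch, and ~ is swapped for a real NBSP before wrapping and back again for display.

```python
import textwrap

NBSP = "\u00a0"

plain = "Coco's favorites: a) bananas, b) apples, and c) oranges."
# "~" is the visible stand-in used above; replace it with a real NBSP.
hinted = "Coco's favorites:~a) bananas,~b) apples, and~c) oranges.".replace("~", NBSP)

for text in (plain, hinted):
    wrapped = textwrap.fill(text, width=30)
    # Show the NBSPs as "~" again so the result is readable.
    print(wrapped.replace(NBSP, "~"))
    print("--")
```

With the plain version a marker can end up at the start of a line; with the hinted version each marker stays attached to the text before it.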

In fact, such “inline” lists are so rare (to be seen, or to be represented in a document type) in my view, that it’s probably not worth the effort.

On the other hand, standard document types like DocBook, NISO JATS, ISO 12083:1994 all have paragraph elements whose (mixed) content model includes “ordinary” lists, but there’s no rendering expectation that the list is shown “inline”. There is no specific “inline list”, as this seems more a style and rendering issue, not a question of element structure. (HTML of course has lists only outside the <P> element.)

So an “inline” (or rather “inside-paragraph”) list would have the rather natural (and already existing!) CommonMark syntax:

Coco's favorites:
a) bananas,
b) apples, and
c) oranges,
and I like those too.

which could be mapped to a single DocBook paragraph as

<para>Coco's favorites: <orderedlist><listitem>bananas, </listitem>...</orderedlist> and I like those too.</para>

whereas—for DocBook etc—the following means something rather different:

Coco's favorites:

a) bananas,
b) apples, and
c) oranges,

and I like those too.

Namely one paragraph, a list, and then another paragraph.

Similarly to the NBSP remark above: if I were to format a “definition list” in a typescript (ie, a plain text file, with no knowledge of Markdown), it might look something like this:

bit:
    Binary digit; that is, either zero or one.

DTD:
Document type definition:
    Rules, determined by an application, that [...]

In Markdown (without genuine “definition lists”), there would be (have to be!) two SPACEs following the COLONs after the terms. (And a “\” preceding the “[” …:wink: )

From the parsing result from cmark

<p>bit:<br />
Binary digit; that is, either zero or one.</p>
<p>DTD:<br />
Document type definition:<br />
Rules, determined by an application, that [...]</p>

a tool could again figure out what is meant here (“a paragraph starting with a single line ending in a COLON and then <br>? – looks like a <DT> to me!”) And of course cmark could figure that out all alone.
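The heuristic could be sketched directly on the plain text, before any HTML is produced: a paragraph whose leading lines all end in a colon, followed by at least one more line, is taken as one or more terms plus a definition. The function name `guess_definitions` and the exact rule are assumptions; cmark itself does nothing of the sort.

```python
def guess_definitions(text):
    """Return (terms, definition) tuples for paragraphs that look like glossary entries."""
    entries = []
    for para in text.strip().split("\n\n"):
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        terms = []
        # Leading lines that end in a colon are treated as terms.
        while lines and lines[0].endswith(":"):
            terms.append(lines.pop(0)[:-1])
        # Only accept the paragraph if something is left to serve as definition.
        if terms and lines:
            entries.append((terms, " ".join(lines)))
    return entries


typescript = """bit:
    Binary digit; that is, either zero or one.

DTD:
Document type definition:
    Rules, determined by an application, that [...]
"""
print(guess_definitions(typescript))
```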

I think what I want to say here is that before sprinkling “special characters” all over the text maybe a better start is a “conventional plain text formatting rule” one could and would apply using a good-old typewriter, and try to recognize this in the parser. At least this seems more aligned with Gruber’s “philosophical” principle

Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions. — Gruber: Markdown Syntax

That would indeed be an easy possibility!

Actually I’d rather intuitively place an NBSP after the list marker, so that strange wrapping like this would not occur (both in plain text and in formatted text, assuming a fallback):

Coco's favorites: a) bananas, b)
apples, and c) oranges

This looks to me more like a glossary, and IMHO the definition list should be useful in a number of other use cases, such as what I would call “named numerals”, as in

# Skin Burn
I° : reddening, swelling, pain (e.g. sunburn)
II° a : severe pain, wet skin, ...
II° b : grayish, less pain
III° : no more pain, brown/black skin, hard, dry, ...
IV° : coal skin

Note how the list is quite dense, just like a “tight” ordered list. I’m not sure whether the following would be adequate for this kind of definition list:

I° : 
    reddening, swelling, pain (e.g. sunburn)
II° a : 
    severe pain, wet skin, ...
II° b : 
    grayish, less pain
III° : 
    no more pain, brown/black skin, hard, dry, ...
IV° : 
    coal skin

It would—only that on second thought the last line would actually be incorporated into the c) orange item, according to current CommonMark and Markdown “lazy” indentation rules. So some tinkering with the syntax rules might be needed (?).

(I’m generally not that fond of the rules that allow list items or blockquotes to “interrupt” paragraphs, but the case at hand seems to be an argument in favor of it …)

Compromise: Put NBSP on both sides of the list marker, preventing both weird line wrappings: :-)

Coco's favorites:~a)~bana&shy;nas,~b)~apples, and~c)~oran&shy;ges

Having the <DD>s nicely indented and aligned (like in your second listing) corresponds at least to the default rendering of HTML “definition lists”, and also to the rendering in the man macros producing man pages. If the marker (ie, the <DT>, or “label”?) is short enough, it could fit into this indent, as man (and I think LaTeX) would produce:

I°:  reddening, swelling, pain (e.g. sunburn)
II° a: 
     severe pain, wet skin, ...
II° b: 
     grayish, less pain
III°:
     no more pain, brown/black skin, hard, dry, ...
IV°: coal skin

Alternatively, when a “tight” look and no common indentation is in fact desired, a better approach would IMO be to use an “undecorated” list like DocBook’s <simplelist> element: In the Markdown typescript I would write

- **I°:**  reddening, swelling, pain (e.g. sunburn)
- **II° a:** severe pain, wet skin, ...
- **II° b:** grayish, less pain
- **III°:** no more pain, brown/black skin, hard, dry, ...
- **IV°:** coal skin

and (somehow ;-)) transform this into (non-XML for terseness!):

<member><glossterm>I°</> reddening, swelling, pain (e.g. sunburn)
<member><glossterm>II° a</> severe pain, wet skin, ...
<member><glossterm>II° b</> grayish, less pain
<member><glossterm>III°</> no more pain, brown/black skin, hard, ...
<member><glossterm>IV°</> coal skin

expecting a rendering similar to (with or without the colon, or using an en dash etc):

    I°: reddening, swelling, pain (e.g. sunburn)
    II° a: severe pain, wet skin, …
    II° b: grayish, less pain
    III°: no more pain, brown/black skin, hard, dry, …
    IV°: coal skin
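The “somehow” step, turning the bold-label bullet list into <member> lines, could look roughly like this. It is a line-based sketch working on the Markdown source rather than on a parsed document; the pattern `ITEM` and the function `to_simplelist_members` are hypothetical, and the output uses the same terse non-XML end tag </> as the listing above.

```python
import re

# "- **label:** text", with the colon inside the bold span.
ITEM = re.compile(r"^- \*\*(.+?):\*\*\s*(.*)$")


def to_simplelist_members(markdown):
    """Convert bold-label bullet items into <member><glossterm> lines."""
    out = []
    for line in markdown.splitlines():
        m = ITEM.match(line)
        if m:
            out.append(f"<member><glossterm>{m.group(1)}</> {m.group(2)}")
    return "\n".join(out)


source = """- **I°:**  reddening, swelling, pain (e.g. sunburn)
- **II° a:** severe pain, wet skin, ...
- **IV°:** coal skin"""
print(to_simplelist_members(source))
```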

My point here would be that <DL> is not necessarily the best choice in this case, although the HTML5 spec, mumbling about it being an «association list consisting of zero or more name-value groups» and:

Name-value groups may be terms and definitions, metadata topics and values, questions and answers, or any other groups of name-value data. — W3C REC HTML5 – The DL element

would suggest that it “fits in all cases”. (Is «II° a» in your example a “term”, or rather just a “label”? A “symbol”, or “abbreviation”? Depending on the target document type, there might be specialized lists for these, too!)

But if generating a <DL> is indeed the goal, the “indented” and not quite so tight look (in the CommonMark typescript) is IMO appropriate.