Ordered vs Unordered List

LaTeX uses enumerate and itemize (and description) – so what?
Why should HTML or *gasp* SGML terminology matter so much (more) for CM?

Since CM authors need to type either a bullet marker (“+”, “*” or “-”) or a number followed by an ordinal marker (“.” or “)”), I think “bullet list item” and “enumerated” or “enumeration list item” make a lot of inherent sense. Lists consisting of such items would naturally be called bullet lists and enumerated/enumeration lists.

Why should HTML or gasp SGML terminology matter so much (more) for CM?

Mostly because XML/SGML “technology” (note that “HTML” is a different category!) does matter so much for CM.

I think this and related questions boil down to the decision on how much the CommonMark specification and syntax should be tied to:

  1. one or another version of HTML and the terminology used in the particular HTML specification, and

  2. the markup syntax and structural concepts of XML and SGML in general.

I certainly do agree that CommonMark should be specified in the most general terms and as independently from any particular document type (like HTML) as possible, in order to allow a wide range of use scenarios and “target” document types; and in fact the specification right now has IMO already (or still) too much “HTML bias”.

I would even prefer to define CommonMark completely in terms of its own, “native”, document model, for example using the CommonMark DTD. Mapping this into HTML in a “canonical” way is trivial to specify anyway.

But I just see neither a good reason nor a reasonable way to make CommonMark independent of the XML/SGML markup languages1) in general, for the simple reason that the sole purpose of Markdown, and hence of CommonMark, was and is to provide an “author-friendly” syntax for creating HTML/XML/SGML content.

Just note how much the specification is concerned with character references and “raw HTML”, including comment declarations, processing instructions and markup declarations! Without these “loopholes”, you’d end up either with a very limited syntax subset of very restricted expressive power, or with using some other syntax (maybe LaTeX, maybe eqn and tbl, or whatever) for the “missing features” of CommonMark.

And it was indeed an explicit design choice (by Gruber) for Markdown to allow the use of “raw” HTML syntax, which from my point of view, and by extension, means allowing the use of “raw” XML/SGML syntax for these “missing features”.

It sure would be interesting to design a Markdown syntax variant without any recourse to XML/SGML at all, but CommonMark is not it; and this would basically require coming up with a completely new (extensible!) markup language, which amounts to a complete alternative syntax for XML.

As always, you can process the content authored in CommonMark syntax in any way and transform it into any format you like, even without technically and explicitly generating a marked-up XML document in between: but this still would not make CommonMark unrelated to XML/SGML, wouldn’t you agree?

So why should the CommonMark specification suddenly avoid terms from precisely this field of markup languages (in particular names for document elements which were in common use long before HTML was invented), and start borrowing terminology from, e.g., LaTeX? Or nroff? Or RTF? Or SCRIBE, or OpenOffice, or MS Word and so on?

That said, I don’t think the choice between “ordered list” and “enumerated list”, or between “unordered list” and “itemized list”, is very important, or that it would enhance or degrade the spec in any significant way, beyond the fact that your preferred terms feel rather arbitrary and “foreign” in this context, in my opinion. (I’m not saying that one choice is “better” than the other!)

But I do think the fundamental question discussed above is in fact an important one for CommonMark.


  1. More precisely: the syntax of XML, or respectively the reference concrete syntax of SGML, and the corresponding content model (a small subset of the XML Infoset or the SGML ESIS, respectively).

This was, I think, the sole purpose of Markdown originally, as John Gruber conceived of it. But it is certainly not the sole purpose of Markdown variants today, and it’s certainly not the sole purpose of CommonMark.

Indeed, I think one of the nicest things about Markdown/CommonMark is the ability to have a single source document that can be rendered with perfect accuracy into many different formats. I regularly convert Markdown to PDF (via LaTeX), man pages, HTML, EPUBs, and Word documents. I know others who convert Markdown to ICML for import into page layout software. Markdown is rooted in an HTML-generating script, and the ability to include raw HTML reflects that initial purpose. But the idea that Markdown is primarily useful as a way of generating HTML is something we’ve grown out of.

How many of these (excluding Markdown) have bullet lists? I know Word does (since 1983), but how about the others?

I believe CommonMark should use the most accurate terminology, not the most common (no pun intended) or, for that matter, the oldest. A project I’m working on has a format (not invented by me) that defines an unbulleted list. Is that a special case of bullet list? Should the project’s documentation refer to unbulleted bullet lists?

Indeed, I think one of the nicest things about Markdown/CommonMark is the ability to have a single source document that can be rendered with perfect accuracy into many different formats.

I completely agree, and isn’t this also the nicest thing about XML (and SGML)? And doesn’t Markdown/CommonMark basically inherit this feature from there? You don’t generate EPUB (XHTML) from cmark’s LaTeX output, do you?

But the idea that Markdown is primarily useful as a way of generating HTML is something we’ve grown out of.

Absolutely right: that’s why I’d prefer to minimize CommonMark’s reliance on HTML-specific character names, element type names, syntactic peculiarities and so on.

But even if you convert CommonMark “directly” into LaTeX (via cmark, say), this is simply a technical shortcut and an alternative to first producing an XML document from the input text (conforming, for example, to the “native” CommonMark DTD), and then transforming this XML document (or its parsed content) into LaTeX or whatever target format you have.

Consider for example the character reference syntax in Markdown and CommonMark: it is not simply a “reflection of the initial purpose” of Markdown to generate HTML, but rather essential for representing and entering Unicode characters in a portable way.

And similarly, the easiest way to support the indeed “nice thing” of single-source publication from CommonMark input text is, IMO, to use application-specific entities and tags directly in CommonMark for stuff outside the feature set of the syntax proper (just like using “raw HTML”, but in no way restricted to HTML!). That is much better, at least, than introducing and using a “foreign syntax” for which the transformation into PDF, EPUB or whatever target format you have takes a whole different route. The latter is still a reasonable alternative if one (post- or pre-)processes this “foreign syntax” into XML, and in this way again transforms “extended” CommonMark into XML.

Both approaches assume a target document type or “target format” which is at least representable in XML (which is practically always the case); and the first approach obviously depends on the Markdown and CommonMark property to support the “raw markup loophole”.

Again, this has nothing to do with HTML or “vestigial” support for it, but is IMO essential for Markdown’s and CommonMark’s generality and usefulness: There simply is no other widely supported, standardized, generalized and extensible markup language than—well, you know the acronyms :wink:

I’m a bit puzzled by you seemingly asserting that the relationship between CommonMark and SGML/XML is basically a historical accident (of John Gruber’s whim) which we “have grown out of”: on the contrary, I see the core reason for the flexibility and general usefulness of Markdown and CommonMark precisely in that relationship … (Note that I’m not talking about HTML here!)

There are two different things or concepts for which terminology is needed, and which are easy to confuse:

  1. The syntactic construct in the CommonMark input text: here the “list item” starts with, say, a “-” HYPHEN-MINUS character, typically typed in by our esteemed author him- or herself. Technically, this is a non-terminal symbol of a concrete syntax, and the specification will certainly need to talk about this, and could do so in freely chosen (and hopefully easy-to-understand) terms;

  2. The “meaning”, or interpretation, or translation result which corresponds to this input text: that one is a more abstract thing, which could be described as a node in some kind of Abstract Syntax Tree, or equivalently as a non-terminal of an abstract syntax, or as (an instance of) an element type from the CommonMark DTD, or as the “parsed content” of such an element, and so on. This is where the “prior art” of having UL and OL list element types applies, and here the only things that matter are the element’s attributes and content (and its “intended meaning”, however fuzzy), but not whether it is rendered with a “bullet” or a “pointing hand” dingbat or a “hyphen-minus” or whatever style of marker, or in what font style.

My remarks were all pertinent to the second concept only, and I freely admit that for the first concept there may be better, easier-to-understand, more fitting terms.


Btw, the “unbulleted list” of your project would be the “simple list”, or <SL>, element type in the mentioned “general document” <!DOCTYPE general PUBLIC "ISO 8879:1986//DTD General Document//EN">: also one of the list types that are more than 30 years old, and one already provided back then in GML on the IBM/360.

If you ask me, this is a good name to use in the documentation. Just sayin’ … :wink:

[Disclaimer: I’m not (quite) as old as I possibly seem here, and I wasn’t around at the time of GML etc, or probably still shat into my diapers during those days …]

@tin-pot, yes, XML is flexible enough for representing the structure of CommonMark documents. But that doesn’t give CommonMark any closer relation to XML than to any of a multitude of other general markup languages that are equally capable of representing this abstract structure.

Anyway, this whole thread is about a simple matter of terminology: should we call these things bullet list or unordered lists or itemized lists or something else? I don’t care too much. If people strongly prefer “unordered list,” I can go with that. But I don’t want to get distracted from the many more substantive issues that still need to be resolved.

Most people on this forum seem to share your sentiment.

Let me present the (hopefully) ultimate argument. The spec does not refer to ordered lists as number lists (or numbered lists, or numeric lists), although they are marked exclusively (in core CommonMark) by ASCII decimal digits in both input and output. On the other hand, unordered list items are only rendered with bullet markers (in most formats, at that), but only have -, + or * markers in the input (at least until Unicode bullets are part of the spec). Why bullet lists then?

@tin-pot, I hope my “old terminology” remark hasn’t offended you, and I do apologize if it has. I never meant to imply old == bad (especially since my proposal seems to be in complete accord with SGML), only that terms should be judged on merit alone (proof). Anyway, the older an engineer, the more spec term changes he proposes :wink:

@jgm:

First of all, I’d like to wish you and everybody around here a happy new year!


If I understand your remark correctly, you do include LaTeX, RTF, nroff etc among the “multitude of other general markup languages” here? If that’s the case, then I have a pretty different view on Markdown in general and CommonMark in particular. Mine is based for the most part on the extent to which the Markdown description and CommonMark specification do support/include/refer to/borrow from/rely on “HTML” or generally XML syntax, structure, and notions—which you seem to regard as merely inherited baggage, so to say? After all, none of these “XML support features” are useful for creating any of these “other general markup languages” directly from CommonMark input, right?


But what really matters, though, is the definition of CommonMark as expressed in the specification, and I see no reason why the same specification (more or less as it is right now) couldn’t be “compatible” with both points of view.

[To be precise:
As I said, I’m primarily referring to the document content model of XML/SGML anyway: formally, the XML Infoset or the SGML ESIS, respectively, as “upper bounds”1) of what CommonMark can express, although the far, far simpler “data model” of µXML (being isomorphic to what you refer to as the “AST”) would probably suffice too. I’m not so much concerned with the syntax: if this is what you mean by “representing this abstract structure”, then we do largely agree again after all … :wink: And be assured that I certainly don’t want to “require” every CommonMark processor to generate output in XML syntax, i.e. to produce XML documents or fragments!

On the contrary, I think that the CommonMark specification should not prescribe any syntax at all for the representation of the parsing result (the parsed content, the AST, the content model instance, the abstract structure, whatever you call it). For example, a CommonMark parser could well “represent” the result only as a sequence of invocations of SAX-style callback functions, and the manner in which this is done is clearly outside the scope of any CommonMark specification. But the specification must still make it possible to test and decide whether a parser of this kind does in fact parse correctly. The same considerations are behind the XML Infoset and ESIS specifications, which was the reason I pointed to RAST and to canonical XML in the following related discussion:

But it sure would be useful to have a simple but precise notation for the “abstract” parsing result for use in the specification text (and for denoting test case results): that discussion is over there
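To illustrate the SAX-style scenario, here is a minimal Python sketch that is not tied to any real CommonMark implementation. A recorder built on the standard library's `xml.sax.handler.ContentHandler` turns callbacks into a flat event list, merging adjacent text chunks so that how a producer splits its text does not matter. A hypothetical toy parser drives the callbacks directly, with no XML in between, and its events are compared with those of the same recorder fed from an XML serialization. The element names `list` and `item` and the `type` attribute are illustrative assumptions.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventRecorder(ContentHandler):
    """Records SAX-style callbacks as a comparable list of events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Sort attributes so that their emission order is not significant.
        self.events.append(("start", name, tuple(sorted(attrs.items()))))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Merge adjacent text chunks: producers may split text arbitrarily.
        if self.events and self.events[-1][0] == "text":
            self.events[-1] = ("text", self.events[-1][1] + content)
        else:
            self.events.append(("text", content))


def toy_parse_bullet_list(source, handler):
    """Hypothetical toy parser: every "- " line becomes one list item."""
    handler.startElement("list", {"type": "bullet"})
    for line in source.splitlines():
        handler.startElement("item", {})
        for ch in line[2:]:  # deliberately one callback per character
            handler.characters(ch)
        handler.endElement("item")
    handler.endElement("list")


direct = EventRecorder()
toy_parse_bullet_list("- foo\n- bar", direct)

via_xml = EventRecorder()
xml.sax.parseString(
    b'<list type="bullet"><item>foo</item><item>bar</item></list>', via_xml
)

print(direct.events == via_xml.events)  # True
```

Comparing normalized event streams like this is one way a test could check a parser that never emits XML at all, in the same spirit as the RAST and canonical XML formats mentioned above.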

________

  1. The term “upper bound” is used here in an only “half-sloppy” way, because (any reasonable definition of) an “is-losslessly-translatable-to” relation between content models would obviously induce a pre-order among content models (and thus among document types in general), in which “upper bound” would have its usual, precise meaning.
    ]
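The footnote's pre-order claim can be spelled out a little more formally. This is only a sketch under one assumption of my own: “lossless” is read as “injective”, i.e. there is a map from the document instances of one content model into those of another that can be undone on its image. The notation Inst(M) for the set of instances of a content model M is illustrative, not taken from any standard.

```latex
% M \preceq N: M is losslessly translatable into N
\[
  M \preceq N \iff \exists\, t_{MN}\colon \mathrm{Inst}(M) \to \mathrm{Inst}(N)\ \text{injective}
\]
% Reflexive: the identity map is injective
\[
  M \preceq M \quad \text{via } t_{MM} = \mathrm{id}_{\mathrm{Inst}(M)}
\]
% Transitive: a composition of injective maps is injective
\[
  M \preceq N \ \wedge\ N \preceq P \implies M \preceq P \quad \text{via } t_{NP} \circ t_{MN}
\]
% Upper bound of a family S of content models
\[
  U \text{ is an upper bound of } S \iff \forall M \in S:\ M \preceq U
\]
```

The relation is not antisymmetric in general (two different content models can each be translated losslessly into the other), which is why it is only a pre-order and not a partial order. In these terms, the bracketed remark claims CommonMark ⪯ XML Infoset and CommonMark ⪯ SGML ESIS.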

Personally, I’d prefer “itemized list” or—equally good—“unordered list” over “bullet list”, because “bullet” refers to a particular glyph (or family of glyphs), thus this term is in my view too much tied to a presentation style. And also because there is relevant “prior art” (and in LaTeX too) for the term “itemized list”, while the term “bullet list” seems to be used primarily in and around MS Office documentation.


But all in all I don’t care that much either.

However, and more generally, I still think that a distinction in terminology between

  • syntactic parts of input text on the one side and
  • output elements/nodes/subtrees/structural parts of the parsed content (or “AST”) on the other

could be occasionally helpful in the specification, if only to avoid confusion.

@Dmitry:

Haha, don’t worry: rest assured that I didn’t feel offended in the slightest way! But very nice of you to care!

In fact I had a bit of a feeling of being the “old fart” (among youngsters?), when referring to standards and developments from about four decades ago [that’s roughly as old as I am, being around since 1968 …] :wink:


[…] only that terms should be judged on merit alone (proof).

Would you accept it as a “merit” of a term if that term itself is standardized, in our case for example in ISO 2382-23 Vocabulary – Part 23: Text processing, or if the term is defined in an International Standard related to the subject matter in question, in our case for example ISO 10646?

(I ask just out of curiosity, not that “unordered list” is such a term—unless one counts the mentioned “general document” example DTD as kind-of authoritative, that is—nor any of the proposed alternatives …)


Regarding the remark in the post you refer to above:

However, if consistency with HTML5 terminology is deemed necessary, I’d reluctantly prefer […]

Allow me to add a little rant that in my not-so-humble opinion—from what I have seen and studied of “HTML 5” so far—it rather seems that

    consistency with “HTML 5 terminology” is very much a thing to avoid

because it looks like the designers of HTML5 got carried away by an intense desire to use “new, more abstract terminology” just for the sake of it. For example, renaming <HR> to “thematic break” (did they really eliminate the phrase “horizontal rule” from the HTML spec?): this is IMO incredibly silly, if not stupid, and there are many other examples like this (just think of the whole “bogus comment” concept and terminology!).

As you can probably guess, I do not hold HTML5 in high regard, and I would certainly not want to use it as an example of —shudder!—consistent use of terminology!

[ I have to admit though that replacing Adobe Flash by whatever kind of HTML5 streaming video element may in fact be worth the otherwise horribly misguided effort that HTML 5 is in my opinion so far :wink: … ]


Anyway, the older an engineer, the more spec term changes he proposes.

This may well be because an “older engineer” tends to have seen more needless changes in terminology for the same old concept (introduced out of cluelessness, or out of vanity, or because it’s fashionable), and thus tends to prefer using the same (“old”) terms for the same concepts—a principle which is an instance of Occam’s razor, if you think about it. (Which explains part of my opinion about HTML5.)


We seem to agree on the “issue” of terminology though; I would be (pretty much equally) happy with either

  • “ordered list” and “unordered list” (for said “historical” or “traditional” reasons), or

  • “itemized list” instead of “unordered list” (for LaTeX and DocBook precedence), and maybe even “enumerated list” (for LaTeX precedence) instead of “ordered list”.

Both “bullet list” and “numbered list” are very much tied to a specific presentation style, and are thus not good choices

  • either for naming the CommonMark input syntax construct (i.e. the text of a list item that starts with “-” or “+” or “*”, or “1.” etc.): this is “just markup”, and for example using “-” and “*” produces the same result anyway (as do the input markups “1.”, “2.”, or “3)” for “ordered list items”, IIRC);

  • or for naming the parsing result: these lists are “typically” transformed into <UL> or <OL> elements (in HTML), or maybe into <ItemizedList> or <OrderedList> elements (in DocBook), and so forth: but in all these target document types, the rendering style of these lists is solely determined by some kind of applicable style sheet (or the whims of a user agent’s defaults, apart from some fuzzy recommendations for “default styles”). And there are lots of ways to “number” the items in a list!

So in the presentation of the result document,

  • neither a U+2022 “bullet” character (in “unordered list” items)
  • nor a number (in “ordered list” items)

needs to be present, which makes the names “bullet list” and “numbered list” rather obviously not very appropriate for “the output side”, and IMO hence inappropriate for the “input side” too.


To put my point of view bluntly into a simple slogan:

CommonMark is not a style-sheet or formatting language!

If you put a gun to my head, I’d pick “most widely used” over “standardized”. HTML 5 happens to be a standard (or two), and yet

The more familiar I become with the W3C standard, the more I tend to agree with this statement.

Didn’t you get the memo? :wink:

Apparently, my command of English is insufficient for me to grasp this contraption. Doesn’t itemized mean consisting of items, or simply a list?

I couldn’t imagine anyone having an issue with the term “ordered list”. Until now. :smile:

In CM the different unordered list markers determine whether an item belongs to an existing list (i.e.

- foo
+ bar

result in two distinct lists), whereas in ordered lists the first marker determines the start index.
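The grouping rule described here can be made concrete. The following is a minimal Python sketch, assuming only single-line, top-level list items (no nesting, continuation lines, blank lines or thematic breaks): a change of bullet character or ordered-list delimiter starts a new list, and an ordered list takes its start number from its first item. The names `group_list_items` and `ListBlock` are hypothetical and not part of cmark or any other implementation.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Simplified list-item marker: a bullet (-, + or *), or 1-9 ASCII digits
# followed by "." or ")", then at least one space and the item text.
MARKER = re.compile(
    r"^(?:(?P<bullet>[-+*])|(?P<num>[0-9]{1,9})(?P<delim>[.)]))[ ]+(?P<text>.*)$"
)


@dataclass
class ListBlock:
    kind: str             # "bullet" or "ordered"
    char: str             # bullet character or ordered-list delimiter
    start: Optional[int]  # start number (ordered lists only)
    items: List[str] = field(default_factory=list)


def group_list_items(lines: List[str]) -> List[ListBlock]:
    """Group consecutive single-line list items into lists (hypothetical helper)."""
    lists: List[ListBlock] = []
    for line in lines:
        m = MARKER.match(line)
        if m is None:
            raise ValueError(f"not a list item: {line!r}")
        if m["bullet"]:
            kind, char, start = "bullet", m["bullet"], None
        else:
            kind, char, start = "ordered", m["delim"], int(m["num"])
        last = lists[-1] if lists else None
        # A different list kind, bullet character or delimiter starts a new list;
        # only the first item of an ordered list determines its start number.
        if last is None or (last.kind, last.char) != (kind, char):
            lists.append(ListBlock(kind, char, start))
        lists[-1].items.append(m["text"])
    return lists


for block in group_list_items(["- foo", "+ bar", "3. baz", "7. qux"]):
    print(block.kind, block.char, block.start, block.items)
# bullet - None ['foo']
# bullet + None ['bar']
# ordered . 3 ['baz', 'qux']
```

In this sketch the bullet character and delimiter only matter for grouping; a later stage could drop them from the resulting node entirely.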

As for presentation, the following is a perfectly standard HTML ordered list (the Arabic numerals are only provided for illustration):

甲、1
乙、2
丙、3
丁、4
戊、5
己、6
庚、7
辛、8
壬、9
癸、10
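The separation between an ordered list's content and its marker style can be shown in a few lines. This Python sketch, with hypothetical function names, renders the same item sequence once with decimal markers and once with the ten celestial stems in the order used in the example above. It deliberately stops at ten items rather than guess a rule for longer lists. (CSS also defines a `cjk-heavenly-stem` value for `list-style-type`.)

```python
CELESTIAL_STEMS = "甲乙丙丁戊己庚辛壬癸"


def decimal_marker(n: int) -> str:
    return f"{n}. "


def stem_marker(n: int) -> str:
    # Only the ten stems are covered; no rule is assumed beyond item 10.
    if not 1 <= n <= len(CELESTIAL_STEMS):
        raise ValueError("this sketch only labels items 1 to 10")
    return CELESTIAL_STEMS[n - 1] + "、"


def render_ordered(items, marker, start=1):
    """Prefix each item with the marker for its position (presentation only)."""
    return [marker(start + i) + text for i, text in enumerate(items)]


items = ["alpha", "beta", "gamma"]
print(render_ordered(items, decimal_marker))  # ['1. alpha', '2. beta', '3. gamma']
print(render_ordered(items, stem_marker))     # ['甲、alpha', '乙、beta', '丙、gamma']
```

Both calls take the same item list; only the marker function, i.e. the presentation, differs.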

It’s true that “most widely used” is often different from what is “standardized”. And because the whole CommonMark effort is an exercise in standardization (alas without any “officially” sanctioned status, of course), the question turns out to be whether it is a good idea

  • to use the terminology of related standards (widely used or not), or

  • rather try to “elevate” a widely-used term to a pseudo-standardized level, and use it in the CommonMark specification instead of existing terminology.


I haven’t dug very deep into the HTML 5 specifications yet, but what I’ve seen so far did not exactly fill me with awe nor made me shiver with an–tici—pation

Right now I still have the hope that I have severely misunderstood the whole HTML 5 approach, and that I will one day see through and beyond my puzzlement the crystal-clear, well thought-out, forward-looking, both versatile and upwards-compatible specification that HTML 5 is supposed to be.

But don’t hold your breath, I certainly don’t :wink:


Didn’t you get the memo?

I did, and I shook my head in disbelief.


English is a foreign language for me too, but according to what I can deduce from context, “itemizing” here means something like “visually marking [the beginning of] each item in a list”, ie with some “item marker” like a “bullet” (this is consistent with the use of this term in LaTeX and in OASIS DocBook).

But if each item is marked individually, typically with an item number or letter in place of the “marker”, this would constitute an “ordered list”, or “enumerated” list, and would not be called an “itemized” list.

Finally, a “plain” list, where each “list item” is just a paragraph (even if indented) without any sort of “item marker”, would then be a “simple list” (consistent with the use of that term in the ISO 8879 “general document” DTD, and in OASIS DocBook), and would not fall under the definition of “itemized list” I attempted here.

By the way: Please note that it’s not me having an issue with “unordered” and “ordered” list, nor with “itemized” and “enumerated” list. [ I’m having “an issue” with “bullet list” and “numbered list” instead :slight_smile: ]


In CM the different unordered list markers determine whether an item belongs to an existing list

You’re right of course: the specification explicitly says so, and even has an example. I somehow forgot that case (but wouldn’t rely on it being commonly implemented anyway), and consider it more of a kludge (useful to enforce consistent input markup) than being a generally useful feature: how do you “split” a numbered list? Change between "1. ", "2. ", then "1) " and "2) "? Yuk!

A list-related feature which is missing and would be useful IMO would provide a way to continue an “ordered” list, after some intervening stuff “interrupted” the list, like in this faked example:

 1. First Item in list.

Some paragraph.

 2. Second item in list.

But that’s of course a rather different topic.


I’m not sure what you mean by “standard HTML” here, but the HTML markup of your example—from what I can see in my browser—looks like this:

<p>甲、1<br>乙、2<br>丙、3<br>丁、4<br>戊、5<br>己、6<br>庚、7<br>辛、8<br>壬、9<br>癸、10</p>

Do you mean that a user agent would be allowed to present an “ordered list” in the style of your example, that is, in a manner where the “marker box” of each item of the list contains a Chinese celestial stem1) instead of a decimal, Roman or alphabetic numeral?

Of course that’s “allowed” in “perfectly standard” HTML: to start with, for the simple reason that the HTML specification (or for that matter: the DocBook or any other2) document type specification) does not and reasonably can not require a particular rendering style beyond rather general hints about the “meaning” of the various element types.

This is of course in stark contrast to specifications of document formats or formatting languages and so on, where for example a particular LaTeX style or the RTF specification as a whole does constrain the rendering of governed document instances pretty narrowly (I think the same does hold for the OpenOffice XML document format)—right up to page description languages like PDF, SPDL, and PostScript or Microsoft’s XPS / ECMA OpenXPS, which do nothing else but fixing and constraining the “presentation” or rendering of documents.
______

  1. I freely admit that I had to use Google for that one. Hooray for Unicode! :wink:
  2. Except of course document types which are explicitly designed to also convey presentation information (like the mentioned OpenOffice XML, or simply HTML with embedded CSS style attributes) or are even dedicated to this purpose (like XPS and OpenXPS, which are page description languages).

But what has this to do with the difference between “ordered” lists (where each item is “marked” with an individual “marker”, taken from an ordered set, be it decimal or roman numerals or celestial stems or counting rods or cuneiform numerals or whatnot) and “unordered” (or “itemized”) lists (where each item is “marked” in the same way, say with a “bullet” character)?

I find thematic break a poor choice on at least three levels:

  1. Theme is overloaded with an unrelated meaning in other technologies (WPF/Silverlight/XAML themes are analogous to CSS).
  2. Break is overloaded with a different meaning in both HTML and CommonMark. How does one produce a soft thematic break?
  3. Too many words are used to denote a simple unambiguous concept.

If I were asked to propose an alternative term, my suggestion would be boundary.

Having said that, I do agree that horizontal rule should have been replaced due to its presentational semantics. In fact, an above-mentioned project of mine had used *** (or --- or ___) for topic boundaries before CM’s sudden (for me) change in terminology.

I simply fail to see any good reason for rejecting unordered lists while embracing thematic breaks.

Maybe it’s my non-native level of English, but I have no problem with calling -, + and * (ASCII) bullets when used as line markers for list items.

I think that whatever is decided should reflect the output more than the input. It is the intention of the writer which matters most: if the writer is intending to write a list, the writer is going to search the Help page with that intention in mind.

Bullet lists imply a certain presentational output, as @Dmitry pointed out. Lists may be rendered without bullets; for example, a list may be rendered as a grid. The presentation of ordered and unordered lists can be completely swapped around too. Consider this CSS:

ol {
  list-style-type: disc;
}

ul {
  list-style-type: decimal;
}

In this case, decimal numbers will show up in order next to the unordered list items, but this is purely presentational; the numbers don’t carry semantic meaning. Similarly, if the order of your list matters in any way, for example the intention that the “Home” list item in a navigation list should always appear first, then it’s arguably clearer to use an ordered list, styled without numbers if you don’t want these to be visible (e.g. list-style-type: none).

Wow—21 posts already! The bike shed effect seems to be in full flight … [And yes, I know I have my share in this :wink: ]


@chrisalley: I’m a bit confused by you stating that whatever is decided should reflect the output more than the input, and then going on to point out (correctly and instructively) that the output rendering is completely out of the author’s hands. So I assume that “should reflect the output” does in fact mean the parsing result (aka AST, aka CommonMark DTD element structure).


With which I agree completely, and I would thus prefer names which allude to the “two kinds of list” usually found in target document types like HTML, DocBook, “general document”, LaTeX and so forth. So far I have only seen these alternative names in common use:

  1. “unordered” and “ordered”: In the ISO 8879 “general document” and HTML DTDs;
  2. “itemized” and “ordered”: In DocBook;
  3. “itemized” and “enumerated”: In LaTeX.

In addition, one might take the following “prior art” into account; but I don’t make any claim of completeness for this collection of references.


In contrast to using two (or more) element types for various kinds of lists, the ANSI/NISO Journal Article Tag Set and the (related) ISO Standards Tag Set (ISOSTS) both conflate the two kinds of list into a single element type, <list>, and distinguish them using an attribute list-type with predefined values like "order", "alpha-lower", or “bullet” (sic!). The explanation for “bullet” however reads:

Unordered or bulleted list. Prefix character is a bullet, dash, or other symbol.

So there: “unordered” again. :slight_smile:


And just for completeness: The ISO 12083 Electronic manuscript preparation and markup DTD seems to use the same approach when it comes to lists, but simply employs numbers to indicate the “list type”, together with vague “suggestions” (but note the word “bullet” here again!):

<!-- l.types = Suggestions for list types:
              1=arabic, 2=upper alpha, 3=roman, 4=bullet, 5=dash,
              6=unlabelled; if more needed (e.g. lower alpha)
              modify or extend this list as necessary. -->
<!ENTITY % l.types "(1 | 2 | 3 | 4 | 5 | 6) #IMPLIED" >

[ Although the ATTLIST declaration for the <list> element type does not declare the type attribute using NUMBER as the declared value, but references the above parameter entity as a name token group. ]


And another one, this time from ISO/IEC 26300-1:2015, aka OASIS OpenDocument: there’s only one element type for list, unsurprisingly named <text:list> (this is XML!). Pretty much all rendering properties of such a list seem to be specified in an associated “list style”:

Lists may be numbered. The numbering may be restarted with a specific numbering at each list item. Lists may also continue numbering from other lists in order to merge lists into a single, discontinuous list. Whether list numbering is displayed or not depends on the list style being used. [ISO 26300-1:2015, clause 5.3.1]

This “list style” is determined by the attribute style:name in <text:list>, which may be missing. In this case, clause 5.3.2 gives the following advice:

If a list does not have a style:name attribute and therefore no list style is specified, one of the following actions is taken:

  • If the list is contained in another list, the list style defaults to the style of the surrounding list.

  • If there is no list style specified for the surrounding list, but the list contains paragraphs that have paragraph styles attached that specify a list style, that list style is used.

  • An implementation-dependent default is applied to the list.

To determine which formatting properties are applied to a list, the list level and its style name are taken into account. 16.30.

The “16.30” above refers to clause “16.30 <text:list-style>”, which explains what kind of “style elements” the <text:list-style> element may contain for each “list level”. — I’m too lazy to go into that. (Let alone to dig into Microsoft’s “Office Open XML” stuff …)
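The fallback order quoted from clause 5.3.2 amounts to a small resolution procedure. The Python sketch below is only my reading of it, not code from any OpenDocument implementation. The class, its fields and the placeholder default are hypothetical; the surrounding list's style is taken literally as its own style:name (a recursive reading would also be possible); and when several paragraphs name a list style, the first one wins, which the quoted text does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical stand-in for "an implementation-dependent default".
IMPLEMENTATION_DEFAULT = "implementation-default"


@dataclass
class OdfList:
    style_name: Optional[str] = None    # the list's own style:name, if any
    parent: Optional["OdfList"] = None  # surrounding list, if nested
    # List styles named by the paragraph styles of contained paragraphs.
    paragraph_list_styles: List[str] = field(default_factory=list)


def resolve_list_style(lst: OdfList) -> str:
    if lst.style_name is not None:
        return lst.style_name
    # 1. Nested list: default to the style of the surrounding list.
    if lst.parent is not None and lst.parent.style_name is not None:
        return lst.parent.style_name
    # 2. Otherwise use a list style specified by a contained paragraph's style.
    if lst.paragraph_list_styles:
        return lst.paragraph_list_styles[0]
    # 3. Otherwise fall back to the implementation's default.
    return IMPLEMENTATION_DEFAULT


outer = OdfList(style_name="L1")
print(resolve_list_style(OdfList(parent=outer)))  # L1
```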


To conclude: it really is a mess, and it really does not matter much :wink:

But if one useful thing can be gleaned from these examples, it probably is the approach to use only one element type for lists, and to “customize” all list properties using attributes; and it seems that newer document types tend to go in this direction. Note that this is also the approach taken in the CommonMark DTD, which has a single <list> element type.
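With a single list element type carrying attributes, the naming question moves into the mapping to each target document type. The sketch below is illustrative only: `ListNode` and `to_markup` are hypothetical names, the node carries just an `ordered` flag, an optional start number and plain-text items, and DocBook is assumed in its lowercase XML form, where each `listitem` needs a block element such as `para`.

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Optional


@dataclass
class ListNode:
    ordered: bool                # the only "kind" distinction kept in the tree
    start: Optional[int] = None  # start number, meaningful for ordered lists
    items: List[str] = field(default_factory=list)


# Per target: element names for (unordered, ordered) lists, plus an item template.
TARGETS = {
    "html": (("ul", "ol"), "<li>{}</li>"),
    "docbook": (("itemizedlist", "orderedlist"), "<listitem><para>{}</para></listitem>"),
}


def to_markup(node: ListNode, target: str) -> str:
    (unordered_tag, ordered_tag), item_template = TARGETS[target]
    tag = ordered_tag if node.ordered else unordered_tag
    attrs = ""
    if target == "html" and node.ordered and node.start not in (None, 1):
        attrs = f' start="{node.start}"'  # HTML's <ol start>; other targets omitted here
    body = "".join(item_template.format(escape(text)) for text in node.items)
    return f"<{tag}{attrs}>{body}</{tag}>"


print(to_markup(ListNode(ordered=True, start=3, items=["a", "b"]), "html"))
# <ol start="3"><li>a</li><li>b</li></ol>
print(to_markup(ListNode(ordered=False, items=["x < y"]), "docbook"))
# <itemizedlist><listitem><para>x &lt; y</para></listitem></itemizedlist>
```

In this arrangement, words like “unordered”, “itemized” or “bullet” appear only in the target-specific table, not in the tree itself.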

My point is that once you take away the entirely malleable presentational aspects of the output (the default being a bulleted list in web browsers, in the case of unordered lists), we’re just left with the semantic structure - that could be HTML elements or some other format. So, the meaning behind that structure is what is important when describing it.

I couldn’t agree more!

The term “Unordered list” is suitable, I think, for describing such a structure, since the order of the list items is not important. If the order of the list is important in some way, it would be clearer markup to use an ordered list (even if, in practice, the order of unordered lists often is intended to have some relevance).

I’m not convinced that “unordered” and “ordered” list is the best terminology; I’m just saying that it is the usual terminology in the (fuzzily delineated) field of “structured document models”, mostly because of the UL and OL list element types in (X)HTML (which date back, as I pointed out, a loooong time).

Regarding the “semantic” difference between these two, and attempts to define them, the most practical description I have come across so far, in my view, is from ISO/IEC/TR 9573:1988 (as I quoted above in my little historical excursion); it goes like this:

  • “ordered” list, where there may be a need to refer to each item in the list. Typically the list items are numbered by the text-formatter.

  • “unordered” list, where there is no need to refer to any item in the list, but where each item should be clearly standing out. Typically the list items are indicated by bullets, stars, or dashes by the text-formatter.

Note that there is no discussion whether the order of items is important, or whether the “meaning” of the document changes should this order be changed (as HTML5 does in the description of <ul> vs <ol>, which is IMO very silly, to put it mildly).

Talking about the “need to refer to each item” seems to be a suitable way to convey the difference between these two list types: after all, the individual markers (i.e., numerals) in front of the items in an “ordered” list provide the means to do so (one could say that this is their main purpose), and in a printed document without links to click on they are the only generally suitable way to refer to “each item in the list” (other than having the reader count the items by himself when referring to “the 27th item in this unordered list” …).

Note also that these descriptions only talk about presentations that are “typically” employed by a “text-formatter”, taking into account that these element types do not or can not enforce a specific rendering.

If “referenceability” were the distinguishing feature, “ordered”/“unordered” would clearly not be the best labels. They’re the best labels if reordering does matter for one but not the other, as in HTML5 (although that semantic was probably informed by the preexisting names and mnemonics ol/ul).

List items that can be referenced by a human reader need a visible and (at least locally) unique label. Most often this will be numeric (incl. alphabetic), because that’s easily automated, but explicit labels are also common, e.g. \item[label] in Latex. Many languages have a dedicated list type for that, e.g. dl in HTML. The mechanism is employed for more than list items, of course: formulas and equations, figures, tables, code listings, definitions, lemmas, theorems, examples (most of these frequently have a caption) and headings (which are basically captions for chapters and sections).
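Numeric and alphabetic labels are indeed easy to generate automatically. Here is a minimal sketch; the function names are illustrative. The bijective base-26 scheme for alphabetic labels (a … z, aa, ab, …) is just one common convention.

```python
def alpha_label(n: int) -> str:
    """1 -> 'a', 26 -> 'z', 27 -> 'aa' (bijective base-26 numbering)."""
    if n < 1:
        raise ValueError("labels start at 1")
    letters = []
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def roman_label(n: int) -> str:
    """Lower-case Roman numerals, e.g. 4 -> 'iv', 14 -> 'xiv'."""
    if n < 1:
        raise ValueError("labels start at 1")
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, numeral in numerals:
        count, n = divmod(n, value)
        out.append(numeral * count)
    return "".join(out)


def label(n: int, style: str = "arabic") -> str:
    """Label for the n-th item (1-based) in one of a few common numbering styles."""
    formatters = {"arabic": str, "alpha": alpha_label, "roman": roman_label}
    return formatters[style](n)


print([label(i, "alpha") for i in (1, 2, 26, 27, 28)])  # ['a', 'b', 'z', 'aa', 'ab']
print(label(27, "roman"))                                # xxvii
```

Explicit labels such as \item[label] are the case these functions cannot cover, because there the label is chosen by the author rather than derived from the item’s position.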

Possible appropriate name pairs for the semantic distinction in ISO 9573, as quoted, would have been labeled/unlabeled, named/anonymous, numbered/bullet, numbered/itemized etc.
Anyhow, does that really teach us how to choose fitting names for the CommonMark specification? The output language is out of the spec’s control. It can be as presentational or as structural as you can imagine. The only thing we know for sure is that (at least currently) one kind of list needs a number and a punctuation character (“.” or “)”) for each item, while the other kind needs just a punctuation character (“-”, “+” or “*”). To an author, numbered or enumerated list would come naturally for the first kind, and the second kind could be an unnumbered, itemized or bullet list. The last one applies if authors consider at least + and * to be ASCII approximations of a bullet character like • (U+2022); Unicode also has several other bullet characters, including (!) a hyphen bullet ⁃ (U+2043), so the hyphen - could probably be considered a “bullet” replacement character as well. There may be even better names in English that I’m not aware of.
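The two kinds of markers an author types can be told apart mechanically. The sketch below follows the CommonMark list-marker rules only loosely. A bullet marker is -, + or *. An ordered marker is one to nine digits followed by . or ). The marker may have at most three spaces of indentation and must be followed by whitespace or the end of the line. Everything else is deliberately ignored; for example, a line like * * * is really a thematic break but would be misclassified here. The names are made up for illustration.

```python
import re
from typing import NamedTuple, Optional


class Marker(NamedTuple):
    kind: str              # "bullet" or "ordered"
    char: str              # "-", "+" or "*" for bullets; "." or ")" for ordered lists
    number: Optional[int]  # start number of an ordered list item, else None


# Simplified: up to 3 spaces of indentation, the marker, then whitespace or end of line.
BULLET_RE = re.compile(r"^ {0,3}([-+*])(?:[ \t]|$)")
ORDERED_RE = re.compile(r"^ {0,3}([0-9]{1,9})([.)])(?:[ \t]|$)")


def classify_marker(line: str) -> Optional[Marker]:
    """Return the list marker starting this line, or None (thematic breaks etc. are ignored)."""
    m = BULLET_RE.match(line)
    if m:
        return Marker("bullet", m.group(1), None)
    m = ORDERED_RE.match(line)
    if m:
        return Marker("ordered", m.group(2), int(m.group(1)))
    return None


for line in ["- item", "* item", "3) third", "10. tenth", "-no space", "    - indented code"]:
    print(repr(line), classify_marker(line))
```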

Well, yes, maybe these would have been more appropriate names. But somehow I see no point in second-guessing the commonly used terminology, and in practice I have never seen “anonymous list”, for example (nor, I think, “labeled”, “unlabeled” or “named” list).

As I said, I don’t deem “ordered” and “unordered” the best or even the most appropriate names, merely pretty common ones in the context at hand, which is: “abstract” document structures in the parsed output.

When it comes to alternatives in “common use” that are at least equally good, I have so far only seen “itemized” vs “enumerated” (as in LaTeX, but “itemized” is also used in DocBook); and these would be fine with me, too.

But “bullet”, “named” (I’d rather keep definition lists out of the discussion, as they aren’t available in CommonMark anyway) and “numbered” are inferior terms IMO, primarily for their pretty strong presentational connotations.

There we go! In my view, the “output language” in the CommonMark specification is not just “out of control”, but rather lacking: right now, all the examples show the result of converting the CommonMark input into HTML, and for good reasons that I don’t deny:

  1. Generating HTML is the most common use case for CommonMark, and
  2. without doubt HTML is by far the most popular and well-known document description language, and
  3. last but not least, Markdown was explicitly designed as a tool to generate HTML content.

So including many examples of output HTML in the specification for instructional (or even: testing) purposes is certainly a good idea.

However (and this is indeed my personal opinion or taste, for which I don’t claim much universal validity), a specification should not rely on examples, in the sense that the specification’s meaning should not change or vanish if some or all examples are removed from it. The CommonMark specification as it is right now clearly does not meet this criterion.

To remedy this situation, the specification could describe the parsed result (aka “AST”, aka “element structure”, aka “document model instance”) in a way independent of any concrete “output language” (to nitpick, I would call the latter “output document type” in the case of XML output, or “output document format” in the general case, e.g. when converting to LaTeX or RTF).

As far as I can see, the only practical (and generally understandable) way to do so would be to refer to the CommonMark DTD, with the understanding that a CommonMark processor is not required to produce XML-marked-up text, but rather the “abstract content” of a document instance of that DTD: as long as it produces information or an “information set” (which could well be just a sequence of implementation-defined callback invocations!) that is equivalent to this “abstract content”, the processor is conforming.

If this seems puzzling: the terms “abstract content”, “document instance”, “information set”, and “equivalent” all have precise meanings.
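To make the “sequence of callback invocations” idea concrete, here is a small sketch under obvious simplifying assumptions. The Node class, the walk function and the two consumers are all hypothetical. The element names only loosely follow the CommonMark DTD; for instance, text sits directly on the paragraph node here. The XML string and the recorded event list carry the same abstract content, and both come from the same walk.

```python
from dataclasses import dataclass, field
from html import escape
from typing import Callable, Dict, List


@dataclass
class Node:
    # Hypothetical minimal node; element names below loosely follow the CommonMark DTD.
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)
    text: str = ""  # assumed to occur only on leaf nodes


def walk(node: Node, enter: Callable[[Node], None], leave: Callable[[Node], None]) -> None:
    """Report the abstract content as a sequence of callback invocations."""
    enter(node)
    for child in node.children:
        walk(child, enter, leave)
    leave(node)


def to_xml(root: Node) -> str:
    """One consumer: serialize the content as XML-marked-up text."""
    out: List[str] = []

    def enter(n: Node) -> None:
        attrs = "".join(f' {k}="{escape(v)}"' for k, v in n.attrs.items())
        out.append(f"<{n.name}{attrs}>{escape(n.text)}")

    def leave(n: Node) -> None:
        out.append(f"</{n.name}>")

    walk(root, enter, leave)
    return "".join(out)


def to_events(root: Node) -> List[tuple]:
    """Another consumer: record the callbacks; no markup is produced at all."""
    events: List[tuple] = []
    walk(root,
         lambda n: events.append(("enter", n.name, dict(n.attrs), n.text)),
         lambda n: events.append(("leave", n.name)))
    return events


doc = Node("list", {"type": "bullet"}, [Node("item", children=[Node("paragraph", text="one")])])
print(to_xml(doc))          # <list type="bullet"><item><paragraph>one</paragraph></item></list>
print(len(to_events(doc)))  # 6
```

Under this view, a processor that only ever invokes such callbacks, and never emits a single angle bracket, would still count as conforming, provided the information it reports matches the DTD instance.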