Ordered vs Unordered List

Dmitry · December 24, 2015, 5:33am

Contrary to emphasis, the term bullet list bears no ambiguity. However, similarly to horizontal rule RIP, it implies certain presentational semantics.

I therefore propose to replace all occurrences of bullet list with unordered list.

Crissov · December 24, 2015, 8:06pm

Nah, “unordered” isn’t really any better, because often list items are in fact pre-ordered, but often in an arbitrary, non-immanent way (e.g. alphabetic collation). Likewise, many “ordered” lists are enumerated without any particular reason.

Dmitry · December 24, 2015, 10:08pm

According to W3C:

The ul element represents an unordered list of items; that is, a list in which changing the order of the items would not change the meaning of list.

Crissov · December 28, 2015, 4:54pm

Sure, that’s ul in HTML, but I doubt people are actually using bullet lists in Markdown that way at all times.

Dmitry · December 29, 2015, 1:09pm

I believe 99.9% of the people use emphases to make text underlined and bold, and yet we call those emphasis (ambiguously) and strong emphasis.

The only valid counter-argument I’ve seen so far came from my browser’s spellchecker, which persistently claims that unordered is not an English word (neither is inline). As I don’t feel full confidence in this regard, any comments from native speakers would be herzlich willkommen.

tin-pot · December 31, 2015, 12:16pm

The `UL` and `OL` lists

[…] “unordered” isn’t really any better, because […]

We can certainly try to come up with better names for the UL and OL list types, but I really can’t see any merit in it: this seems rather like an attempt to rewrite history.

As far as I can reconstruct, the GIs LI, UL, and OL (where UL means “unordered list” and OL means “ordered list”) have been around for more than 30 years now (probably rather 40 years or more), and the history goes something like this:

The 70’s – Before SGML

Even before SGML, the precursor GML at IBM had “unordered” and “ordered” lists, and used UL and OL for the corresponding “tags”.

A hands-on description of the difference in the IBM documentation reads like this:

Unordered lists are similar to simple lists, except that each item in an unordered list is preceded by a special symbol. (The special symbol used depends on the device on which you are having your output printed.) You would use an unordered list when the items in the list are fairly long, maybe even many paragraphs, but don’t need to be in a specific sequence.

And for OL:

And then there’s the ordered list. Use an ordered list when the items you’re listing need to be in a specific sequence. An example of an ordered list follows. […]
When you create an ordered list, you don’t have to number the items yourself. The starter set does it for you. This saves you a lot of work when you decide to insert, delete, or rearrange items.

(The “starter set” is the “GML starter set (GMLSS)” product.)

The 80’s – SGML

Many of the GMLSS elements were adopted into the “general document” DTD example in annex E of ISO 8879:1986 (the very first SGML standard). As this DTD is in turn the very first “official”, publicly available, “document type”, one can truly say that LI, UL, and OL were already in use when “angle-bracket-tags” were invented and standardized.

The “general document” DTD has actually six kinds of list, of which only three (OL, UL, and DL—but not SL, NL, and GL) later made it into HTML. Regarding the “ordered” vs “unordered” meaning, the “user manual” for SGML, ISO/IEC/TR 9573:1988, describes the difference like this (in section 5.3.10):

The “general” document includes six types of lists:

“ordered” list, where there may be a need to refer to each item in the list. Typically the list items are numbered by the text-formatter.

“unordered” list, where there is no need to refer to any item in the list, but where each item should be clearly standing out. Typically the list items are indicated by bullets, stars, or dashes by the text-formatter.

[…]

I find these descriptions perfectly reasonable and not too presentation-biased. (The DTD has a LIREF element type dedicated to referencing list items, which is declared empty, so the intended meaning of “where there may be a need to refer to each item” seems to be that LIREF can be used to refer to OL list items, but should not or even can not be used to refer to UL items—although these can have an ID attribute too.)

The 90’s – Early WWW and HTML

These “types” of lists were then already inherited (via the AAP tag set) in the very first HTML sketches by Tim Berners-Lee. Note that in this 1992 specimen there is also the XMP element type from the “general document” DTD, with roughly the use and purpose of the later PRE and CODE element types. And the tags HP1, HP2, etc—also from the “general document” DTD—for highlighted phrases are also mentioned (qualified as “not currently used”) in early descriptions of HTML.

The HTML 2 specification dated 1995-09-22 used a different wording, leaning more to the “presentational” side, for the two list types. (I could not find any HTML description earlier than version 2.0.)

The 90’s – Stable HTML

Then HTML 3 added “customization” attributes for list elements, so that one can choose the “marker” style for list items. One more move in the “presentational markup” direction, in the absence of CSS or similar techniques. Consequently, the specification does not even bother to explicitly define any “semantic” differences between “ordered” or “unordered” lists, but only explains the rendering differences instead.

The later HTML 4.01 specification is again more explicit, and gives a distinction which is not purely presentational:

An ordered list, created using the OL element, should contain information where order should be emphasized, as in a recipe: […]

But the only “real” difference is again presentational:

Ordered and unordered lists are rendered in an identical manner except that visual user agents number ordered list items. User agents may present those numbers in a variety of ways. Unordered list items are not numbered.

The 2010’s – Shiny new HTML 5

It was only in HTML 5 that the—IMO silly—attempt to define a semantic difference was made in this manner:

The OL element represents a list of items, where the items have been intentionally ordered, such that changing the order would change the meaning of the document.

And accordingly, for UL:

The UL element represents a list of items, where the order of the items is not important — that is, where changing the order would not materially change the meaning of the document.

Taking these descriptions seriously would IMO mean that a user agent or any other application is free to re-order UL items, say alphabetically. This has always been the case for the order of attribute specification lists, but I doubt that many authors would be happy if their UL items were randomly re-ordered with the excuse that this “does not change the meaning of the document” …

To conclude this little excursion:

The terms “ordered list” and “unordered list” have been in use for literally decades,
but with quite some variation in explicitly or implicitly defined meaning.
But the pseudo-precise distinction made in HTML 5 based on “re-ordering does/does not change the meaning of the document” is IMO the least useful definition for these terms (what is the “meaning of a document”, after all?).

So probably the best thing to do in a CommonMark specification is to just talk about “ordered list” and “unordered list”, and leave the meaning of these terms implicitly defined by the forty-year-long common usage of these words …

Crissov · December 31, 2015, 2:14pm

Latex uses enumerate and itemize (and description) – so what?
Why should HTML or *gasp* SGML terminology matter so much (more) for CM?

Since CM authors need to type either a bullet marker +/*/- or a number followed by an ordinal marker ./), I think bullet list item and enumerated or enumeration list item make a lot of inherent sense. The lists consisting of such items would naturally be called bullet lists and enumerated/enumeration lists.
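The two kinds of item marker described here can be told apart from the first few characters of a line. The following is only a minimal sketch under simplifying assumptions (no indentation handling, spaces only, at most nine digits); `classify_marker` is a hypothetical helper, not part of any CommonMark implementation:

```python
import re

# Simplified patterns: a bullet marker is one of -, + or *; an ordered
# marker is 1 to 9 ASCII digits followed by "." or ")". Either one must be
# followed by at least one space (or the end of the line) to count.
BULLET = re.compile(r"^([-+*])(?: +|$)")
ORDERED = re.compile(r"^([0-9]{1,9})([.)])(?: +|$)")


def classify_marker(line):
    """Return a description of the list marker that starts `line`, or None."""
    m = BULLET.match(line)
    if m:
        return {"kind": "bullet", "marker": m.group(1)}
    m = ORDERED.match(line)
    if m:
        return {
            "kind": "ordered",
            "number": int(m.group(1)),
            "delimiter": m.group(2),
        }
    return None
```

Whatever the two kinds end up being called, the sketch only looks at what the author typed; it says nothing about how the resulting list is rendered.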

tin-pot · December 31, 2015, 6:53pm

Why should HTML or gasp SGML terminology matter so much (more) for CM?

Mostly because XML/SGML “technology” (note that “HTML” is a different category!) does matter so much for CM.

I think this and related questions boil down to the decision on how much the CommonMark specification and syntax should be tied to:

one or another version of HTML and the terminology used in the particular HTML specification, and
the markup syntax and structural concepts of XML and SGML in general.

I certainly do agree that CommonMark should be specified in the most general terms and as independently from any particular document type (like HTML) as possible, in order to allow a wide range of use scenarios and “target” document types; and in fact the specification right now has IMO already (or still) too much “HTML bias”.

I would even prefer to define CommonMark completely in terms of its own, “native”, document model, for example using the CommonMark DTD. Mapping this into HTML in a “canonical” way is trivial to specify anyway.

But I just see no good reason for nor a reasonable way of making CommonMark independent from the XML/SGML markup languages¹⁾ in general, for the simple reason that the sole purpose of Markdown and hence CommonMark was and is to provide an “author-friendly” syntax for creating HTML/XML/SGML content.

Just note how much the specification is concerned with character references and “raw HTML”, including comment declarations, processing instructions, and markup declarations! Without these “loopholes”, you’d end up with either a very limited syntax subset with very restricted expressive power, or with using whatever other syntax (maybe LaTeX, maybe eqn and tbl, or whatever) for the “missing features” of CommonMark.

And it was indeed an explicit design choice (by Gruber) for Markdown to allow the use of “raw” HTML syntax, which means from my point of view, and by extension: allowing to use “raw” XML/SGML syntax for these “missing features”.

It sure would be interesting to design a Markdown syntax variant without any recourse even to XML/SGML, but CommonMark is not it; and this would basically require coming up with a completely new (extensible!) markup language, which amounts to a complete alternative syntax for XML.

As always, you can process the content authored in CommonMark syntax in any way and transform it into any format you like, even without technically and explicitly generating a marked-up XML document in between: but this still would not make CommonMark unrelated to XML/SGML, wouldn’t you agree?

So why should the CommonMark specification suddenly avoid terms—in particular names for document elements which have been in common use even long before HTML was invented—from precisely this field of markup languages, and start borrowing terminology from, eg, LaTeX? Or nroff? Or RTF? Or SCRIBE, or OpenOffice, or MS Word and so on?

That said, I don’t think the choice between “ordered list” and “enumerated list” rsp “unordered list” and “itemized list” is very important, or would really matter much or would enhance or degrade the spec in any significant way—beyond that your preferred terms feel rather arbitrary and “foreign” in this context, in my opinion. (I’m not saying that one choice is “better” than the other!)

But I do think the fundamental question discussed above is in fact an important one for CommonMark.

¹⁾ More precisely: the syntax of XML respectively the reference concrete syntax of SGML, and the corresponding content model (a small subset of the XML Infoset rsp SGML ESIS).

jgm · December 31, 2015, 7:20pm

This was, I think, the sole purpose of Markdown originally, as John Gruber conceived of it. But it is certainly not the sole purpose of Markdown variants today, and it’s certainly not the sole purpose of CommonMark.

Indeed, I think one of the nicest things about Markdown/CommonMark is the ability to have a single source document that can be rendered with perfect accuracy into many different formats. I regularly convert Markdown to PDF (via LaTeX), man pages, HTML, EPUBs, and Word documents. I know others who convert Markdown to ICML for import into page layout software. Markdown is rooted in an HTML-generating script, and the ability to include raw HTML reflects that initial purpose. But the idea that Markdown is primarily useful as a way of generating HTML is something we’ve grown out of.

Dmitry · December 31, 2015, 8:15pm

How many of these (excluding Markdown) have bullet lists? I know Word does (since 1983), but how about the others?

I believe CommonMark should use the most accurate terminology, not the most common (no pun intended) or, for that matter, the oldest. A project I’m working on has a format (not invented by me) that defines an unbulleted list. Is that a special case of bullet list? Should the project’s documentation refer to unbulleted bullet lists?

tin-pot · December 31, 2015, 9:47pm

Indeed, I think one of the nicest things about Markdown/CommonMark is the ability to have a single source document that can be rendered with perfect accuracy into many different formats.

I completely agree, and isn’t this the nicest thing about XML (and SGML) too? And doesn’t Markdown/CommonMark basically inherit this feat from there? You don’t generate ePUB (XHTML) from cmark’s LaTeX output, do you?

But the idea that Markdown is primarily useful as a way of generating HTML is something we’ve grown out of.

Absolutely right: that’s why I’d prefer to minimize CommonMark’s reliance on HTML specific character names, element type names, syntactic peculiarities and so on.

But even if you convert CommonMark “directly” into LaTeX (via cmark say): this is simply a technical shortcut and an alternative to producing first an (for example “native” CommonMark DTD) XML document from the input text, and then transforming this XML document (or parsed content thereof) to LaTeX or whatever target format you have.

Consider for example the character reference syntax in Markdown and CommonMark: it is not simply a “reflection of the initial purpose” of Markdown to generate HTML, but rather essential for representing and entering Unicode characters in a portable way.
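To illustrate the portability point: the source text of a character reference is plain ASCII, yet it denotes an arbitrary Unicode character. A small sketch, using Python's standard `html.unescape` as a stand-in for a CommonMark processor's own decoding (an assumption made only for this illustration):

```python
import html

# Named, decimal and hexadecimal references that denote the same character.
references = ["&ouml;", "&#246;", "&#xF6;"]

# Each reference is written in ASCII only, so it survives any encoding
# mishap between author and processor.
ascii_only = all(ref.isascii() for ref in references)

# Decoding turns every one of them into U+00F6.
decoded = [html.unescape(ref) for ref in references]
```

None of this depends on the output being HTML; the decoded character can be written to any target format.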

And similarly, the easiest way to support the indeed “nice thing” of single-source publication from CommonMark input text is IMO to use application-specific entities and tags directly in CommonMark for stuff outside the feature set of the syntax proper (just like using “raw HTML”, but in no way restricted to HTML!). Much better at least than introducing and using “foreign syntax” for which the transformation into PDF, ePUB or what target format you have takes a whole different route—while the latter is a reasonable alternative if one (post- or pre-)processes this “foreign syntax” into XML, and in this way again transforms “extended” CommonMark into XML.

Both approaches assume a target document type or “target format” which is at least representable in XML (which is practically always the case); and the first approach obviously depends on the Markdown and CommonMark property to support the “raw markup loophole”.

Again, this has nothing to do with HTML or “vestigial” support for it, but is IMO essential for Markdown’s and CommonMark’s generality and usefulness: There simply is no other widely supported, standardized, generalized and extensible markup language than—well, you know the acronyms

I’m a bit puzzled by you seemingly asserting that the relationship between CommonMark and SGML/XML is basically a historical accident (of John Gruber’s whim) which we “have grown out of”: on the contrary, I see the core reason for the flexibility and general usefulness of Markdown and CommonMark precisely in that relationship … (Note that I’m not talking about HTML here!)

tin-pot · December 31, 2015, 10:20pm

There are two different things or concepts for which terminology is needed, and which are easy to confuse:

The syntactic construct in the CommonMark input text: here the “list item” starts with say a “-” HYPHEN-MINUS character, typically typed in by our esteemed author him- or herself. Technically, this is a non-terminal symbol of a concrete syntax, and the specification will certainly need to talk about this, and could do so in freely chosen (and hopefully easy to understand) terms;
The “meaning”, or interpretation, or translation result which corresponds to this input text: that one is a more abstract thing, which could be described as a node in some kind of Abstract Syntax Tree, or equivalently as a non-terminal of an abstract syntax, or as (an instance of) an element type from the CommonMark DTD, or as the “parsed content” of such an element, and so on: this is where the “prior art” of having UL and OL list element types applies, and here the only thing that matters is the element’s attributes and content (and its “intended meaning”, however fuzzy), but not whether it is rendered with a “bullet” or a “pointing hand” dingbat or a “hyphen-minus” or whatever style of marker, or in what font style.

My remarks were all pertinent to the second concept only, and I freely admit that for the first concept there may be better, easier-to-understand, more fitting terms.

Btw, the “unbulleted list” of your project would be the “simple list”, or <SL>, element type in the mentioned “general document” <!DOCTYPE general PUBLIC "ISO 8879:1986//DTD General Document//EN">: also one of the more than 30 years old list types, and also already provided back then in GML on the IBM/360.

If you ask me, this is a good name to use in the documentation. Just sayin’ …

[Disclaimer: I’m not (quite) as old as I possibly seem here, and I wasn’t around at the time of GML etc, or probably still shat into my diapers during those days …]

jgm · January 1, 2016, 4:52am

@tin-pot, yes, XML is flexible enough for representing the structure of CommonMark documents. But that doesn’t give CommonMark any closer relation to XML than to any of a multitude of other general markup languages that are equally capable of representing this abstract structure.

Anyway, this whole thread is about a simple matter of terminology: should we call these things bullet lists or unordered lists or itemized lists or something else? I don’t care too much. If people strongly prefer “unordered list,” I can go with that. But I don’t want to get distracted from the many more substantive issues that still need to be resolved.

Dmitry · January 1, 2016, 1:24pm

Most people on this forum seem to share your sentiment.

Let me present the (hopefully) ultimate argument. The spec does not refer to ordered lists as number lists (or numbered lists, or numeric lists), although they are marked exclusively (in core CommonMark) by ASCII decimal digits in both input and output. On the other hand, unordered list items are only rendered with bullet markers (in most formats, at that), but only have -, + or * markers in the input (at least until Unicode bullets are part of the spec). Why bullet lists then?

@tin-pot, I hope my “old terminology” remark hasn’t offended you, and I do apologize if it has. I never meant to imply old == bad (especially since my proposal seems to be in complete accord with SGML), only that terms should be judged on merit alone (proof). Anyway, the older an engineer, the more spec term changes he proposes.

tin-pot · January 1, 2016, 1:36pm

@jgm:

First of all, I’d like to wish you and everybody around here a happy new year!

If I understand your remark correctly, you do include LaTeX, RTF, nroff etc among the “multitude of other general markup languages” here? If that’s the case, then I have a pretty different view on Markdown in general and CommonMark in particular. Mine is based for the most part on the extent to which the Markdown description and CommonMark specification do support/include/refer to/borrow from/rely on “HTML” or generally XML syntax, structure, and notions—which you seem to regard as merely inherited baggage, so to say? After all, none of these “XML support features” are useful for creating any of these “other general markup languages” directly from CommonMark input, right?

But anyway, what really matters though is the definition of CommonMark, as expressed in the specification: and I see no reason why the same specification (more or less as it is right now) couldn’t be “compatible” with both points of view.

[To be precise:
As I said, I’m primarily referring to the document content model of XML/SGML anyway—formally the XML Infoset rsp SGML ESIS as “upper bounds”¹⁾ of what CommonMark can express, but the far, far simpler “data model” of µXML would probably suffice too (being isomorphic to what you refer to as the “AST”)—and I’m not so much concerned with the syntax: if this is what you mean by “representing this abstract structure”, then we do largely agree again after all … And be assured that I certainly don’t want to “require” every CommonMark processor to generate output in XML syntax, ie produce XML documents or fragments!

On the contrary, I think that the CommonMark specification should not prescribe any syntax at all for the representation of the parsing result (the parsed content, the AST, the content model instance, the abstract structure, whatever you call it). For example, a CommonMark parser could well “represent” the result only as a sequence of invocations of SAX-style callback functions, and the manner in which this is done is clearly outside the scope of any CommonMark specification. But still the specification must make it possible to test and decide whether a parser of this kind does in fact parse correctly. The same considerations are behind the XML Infoset and ESIS specifications, which was the reason I pointed to RAST and to canonical XML in the following related discussion:

But it sure would be useful to have a simple but precise notation for the “abstract” parsing result for use in the specification text (and for denoting test case results): that discussion is over there

________

¹⁾ The term “upper bound” is used here in an only “half-sloppy way”, because (any reasonable definition of) an “is-lossless-translatable-to” relation between content models would obviously induce a pre-order among content models (and thus document types in general), in which “upper bound” would have the usual, precise meaning.
]
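The idea of a parser that reports its result only through SAX-style callbacks could look roughly like the following sketch. All names here (`RecordingHandler`, `emit_list`, the event names) are hypothetical and chosen for illustration; nothing like this is prescribed by the CommonMark specification:

```python
class RecordingHandler:
    """Hypothetical event sink: stores every callback as a tuple."""

    def __init__(self):
        self.events = []

    def start(self, name):
        self.events.append(("start", name))

    def text(self, data):
        self.events.append(("text", data))

    def end(self, name):
        self.events.append(("end", name))


def emit_list(items, ordered, handler):
    """Report an already recognized list through callbacks; no tree is built."""
    list_name = "ordered_list" if ordered else "unordered_list"
    handler.start(list_name)
    for item in items:
        handler.start("item")
        handler.text(item)
        handler.end("item")
    handler.end(list_name)


handler = RecordingHandler()
emit_list(["foo", "bar"], ordered=False, handler=handler)
```

Comparing the recorded event sequence against an expected one is one conceivable way to decide whether such a parser parses correctly, without fixing any output syntax.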

Personally, I’d prefer “itemized list” or—equally good—“unordered list” over “bullet list”, because “bullet” refers to a particular glyph (or family of glyphs), thus this term is in my view too much tied to a presentation style. And also because there is relevant “prior art” (and in LaTeX too) for the term “itemized list”, while the term “bullet list” seems to be used primarily in and around MS Office documentation.

But all in all I don’t care that much either.

However, and more generally, I still think that a distinction in terminology between

syntactic parts of input text on the one side and
output elements/nodes/subtrees/structural parts of the parsed content (or “AST”) on the other

could be occasionally helpful in the specification, if only to avoid confusion.

tin-pot · January 1, 2016, 3:26pm

@Dmitry:

Haha, don’t worry: rest assured that I didn’t feel offended in the slightest way! But very nice of you to care!

In fact I had a bit of a feeling of being the “old fart” (among youngsters?), when referring to standards and developments from about four decades ago [that’s roughly as old as I am, being around since 1968 …]

[…] only that terms should be judged on merit alone (proof).

Would you accept as “merit” of a term if that term itself is standardized, in our case for example in ISO 2382-23 Vocabulary – Part 23: Text processing, or if the term is defined in an International Standard related to the subject matter in question, in our case for example ISO 10646?

(I ask just out of curiosity, not that “unordered list” is such a term—unless one counts the mentioned “general document” example DTD as kind-of authoritative, that is—nor any of the proposed alternatives …)

Regarding the remark in the post you refer to above:

However, if consistency with HTML5 terminology is deemed necessary, I’d reluctantly prefer […]

Allow me to add a little rant that in my not-so-humble opinion—from what I have seen and studied of “HTML 5” so far—it rather seems that

consistency with “HTML 5 terminology” is very much a thing to avoid

because it looks like the designers of HTML5 got carried away by an intense desire to use “new, abstracter, terminology” just for the sake of it. For example, re-naming <HR> to “thematic break” (did they really eliminate the phrase “horizontal rule” from the HTML spec?): this is IMO incredibly silly, not to say stupid, and there are many other examples like this (just think of the whole “bogus comment” concept and terminology!).

As you can probably guess, I do not hold HTML5 in high regard, and I would certainly not want to use it as an example of —shudder!—consistent use of terminology!

[ I have to admit though that replacing Adobe Flash by whatever kind of HTML5 streaming video element may in fact be worth the otherwise horribly misguided effort that HTML 5 is in my opinion so far … ]

Anyway, the older an engineer, the more spec term changes he proposes.

This may well be because an “older engineer” tends to have seen more needless changes in terminology for the same old concept (introduced out of cluelessness, or out of vanity, or because it’s fashionable), and thus tends to prefer using the same (“old”) terms for the same concepts—a principle which is an instance of Occam’s razor, if you think about it. (Which explains part of my opinion about HTML5.)

We seem to agree on the “issue” of terminology though; I would be (pretty much equally) happy with either

“ordered list” and “unordered list” (for said “historical” or “traditional” reasons), or
“itemized list” instead of “unordered list” (for LaTeX and DocBook precedence), and maybe even “enumerated list” (for LaTeX precedence) instead of “ordered list”.

Both “bullet list” and “numbered list” are very much tied to a specific presentation style, and are thus not good choices

either for naming the CommonMark input syntax construct (ie the text of a list item that starts with “-” or “+” or “*”, or “1.” etc): this is “just markup”, and for example using “-” and “*” produces the same result anyway (as do, for “ordered list items”, the input markups “1.”, “2.”, or “3)”, IIRC);
or for naming the parsing result: these lists are “typically” transformed into <UL> or <OL> elements (in HTML), or maybe into <ItemizedList> or <OrderedList> elements (in DocBook), and so forth: but in all these target document types, the rendering style of these lists is solely determined by some kind of applicable style sheet (or the whims of a user agent’s defaults, apart from some fuzzy recommendations for “default styles”). And there are lots of ways to “number” the items in a list!

So in the presentation of the result document,

neither a U+2022 “bullet” character (in “unordered list” items)
nor a number (in “ordered list” items)

needs to be present, which makes the names “bullet list” and “numbered list” rather obviously not very appropriate for “the output side”, and IMO hence inappropriate for the “input side” too.
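A small sketch may make this concrete: the same abstract list can be serialized for different target document types, and neither a bullet character nor a number appears in the result. The `TARGETS` table and `render_list` are hypothetical helpers written for this illustration; the element names are the usual HTML and DocBook ones:

```python
from xml.sax.saxutils import escape

# Hypothetical mapping from abstract list kinds to target markup.
TARGETS = {
    "html": {
        "unordered": "ul",
        "ordered": "ol",
        "item": ("<li>", "</li>"),
    },
    "docbook": {
        "unordered": "itemizedlist",
        "ordered": "orderedlist",
        "item": ("<listitem><para>", "</para></listitem>"),
    },
}


def render_list(kind, items, target):
    """Serialize an abstract list; markers are left to the style sheet."""
    spec = TARGETS[target]
    list_tag = spec[kind]
    item_open, item_close = spec["item"]
    body = "".join(item_open + escape(text) + item_close for text in items)
    return "<" + list_tag + ">" + body + "</" + list_tag + ">"


html_out = render_list("unordered", ["foo", "bar"], "html")
docbook_out = render_list("ordered", ["foo"], "docbook")
```

How the items are eventually marked (bullets, dashes, numerals of any kind) is decided later, by whatever style applies to the output.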

To put my point of view bluntly into a simple slogan:

CommonMark is not a style-sheet or formatting language!

Dmitry · January 1, 2016, 4:30pm

If you put a gun to my head, I’d pick “most widely used” over “standardized”. HTML 5 happens to be a standard (or two), and yet

The more familiar I become with the W3C standard, the more I tend to agree with this statement.

Didn’t you get the memo?

Apparently, my command of English is insufficient for me to grasp this contraption. Doesn’t itemized mean consisting of items, or simply a list?

I couldn’t imagine anyone having an issue with the term “ordered list”. Until now.

In CM the different unordered list markers determine whether an item belongs to an existing list (i.e.

- foo
+ bar

result in two distinct lists), whereas in ordered lists the first marker determines the start index.
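The rule can be sketched in a few lines. This is a deliberately simplified model, not the CommonMark algorithm: it ignores indentation, nesting and blank lines, and simply skips lines that are not list items. It also applies the analogous rule for ordered lists, where a change of delimiter (`.` versus `)`) starts a new list. `group_items` is a hypothetical helper:

```python
import re

# One line = one item in this sketch: either a bullet (-, + or *) or
# 1 to 9 digits plus "." or ")", followed by spaces and the item text.
MARKER = re.compile(r"^(?:([-+*])|([0-9]{1,9})([.)])) +(.*)$")


def group_items(lines):
    """Split consecutive item lines into lists.

    A new list starts whenever the bullet character (unordered) or the
    delimiter (ordered) differs from that of the previous item. For an
    ordered list only the first number is kept, as the start index.
    """
    lists = []
    for line in lines:
        m = MARKER.match(line)
        if m is None:
            continue  # non-item lines are ignored in this sketch
        bullet, number, delimiter, text = m.groups()
        key = bullet if bullet is not None else delimiter
        if lists and lists[-1]["key"] == key:
            lists[-1]["items"].append(text)
            continue
        new_list = {
            "kind": "unordered" if bullet is not None else "ordered",
            "key": key,
            "items": [text],
        }
        if number is not None:
            new_list["start"] = int(number)
        lists.append(new_list)
    return lists


two_lists = group_items(["- foo", "+ bar"])
numbered = group_items(["3. a", "7. b", "1) c"])
```

In the second call the `7.` is absorbed into the list started by `3.`, since only the first number matters, while `1)` opens a new list because the delimiter changed.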

As for presentation, the following is a perfectly standard HTML ordered list (the Arabic numerals are only provided for illustration):

甲、1
乙、2
丙、3
丁、4
戊、5
己、6
庚、7
辛、8
壬、9
癸、10

tin-pot · January 1, 2016, 6:24pm

It’s true that “most widely used” is often different from what is “standardized”. And because the whole CommonMark effort is an exercise in standardization (alas without any “officially” sanctioned status, of course), the question turns out to be whether it is a good idea

to use the terminology of related standards (widely used or not), or
rather try to “elevate” a widely-used term to a pseudo-standardized level, and use it in the CommonMark specification instead of existing terminology.

I haven’t dug very deep into the HTML 5 specifications yet, but what I’ve seen so far did not exactly fill me with awe nor made me shiver with an–tici—pation …

Right now I still have the hope that I have severely misunderstood the whole HTML 5 approach, and that I will one day see through and beyond my puzzlement the crystal-clear, well thought-out, forward-looking, both versatile and upwards-compatible specification that HTML 5 is supposed to be.

But don’t hold your breath, I certainly don’t

Didn’t you get the memo?

I did, and I shook my head in disbelief.

English is a foreign language for me too, but according to what I can deduce from context, “itemizing” here means something like “visually marking [the beginning of] each item in a list”, ie with some “item marker” like a “bullet” (this is consistent with the use of this term in LaTeX and in OASIS DocBook).

But if each item is marked individually, typically with an item number or letter in place of the “marker”, this would constitute an “ordered list”, or “enumerated” list, and would not be called an “itemized” list.

Finally, a “plain” list, where each “list item” is just a paragraph (even if indented) without any sort of “item marker”, would then be a “simple list” (consistent with the use of that term in the ISO 8879 “general document” DTD, and in OASIS DocBook), and would not fall under the definition of “itemized list” I attempted here.

By the way: Please note that it’s not me having an issue with “unordered” and “ordered” list, nor with “itemized” and “enumerated” list. [ I’m having “an issue” with “bullet list” and “numbered list” instead ]

In CM the different unordered list markers determine whether an item belongs to an existing list

You’re right of course: the specification explicitly says so, and even has an example. I somehow forgot that case (but wouldn’t rely on it being commonly implemented anyway), and consider it more of a kludge (useful to enforce consistent input markup) than being a generally useful feature: how do you “split” a numbered list? Change between "1. ", "2. ", then "1) " and "2) "? Yuk!

A list-related feature which is missing and would be useful IMO would provide a way to continue an “ordered” list, after some intervening stuff “interrupted” the list, like in this faked example:

1. First Item in list.

Some paragraph.

2. Second item in list.

But that’s of course a rather different topic.

I’m not sure what you mean by “standard HTML” here, but the HTML markup of your example—from what I can see in my browser—looks like this:

<p>甲、1<br>乙、2<br>丙、3<br>丁、4<br>戊、5<br>己、6<br>庚、7<br>辛、8<br>壬、9<br>癸、10</p>

Do you mean that a user agent would be allowed to present an “ordered list” in the style of your example, that is in a manner where the “marker box” of each item of the list contains a Chinese celestial stem¹⁾ instead of a decimal or Roman or alphabetic numeral?

Of course that’s “allowed” in “perfectly standard” HTML: to start with, for the simple reason that the HTML specification (or for that matter: the DocBook or any other²⁾ document type specification) does not and reasonably can not require a particular rendering style beyond rather general hints about the “meaning” of the various element types.

This is of course in stark contrast to specifications of document formats or formatting languages and so on, where for example a particular LaTeX style or the RTF specification as a whole does constrain the rendering of governed document instances pretty narrowly (I think the same does hold for the OpenOffice XML document format)—right up to page description languages like PDF, SPDL, and PostScript or Microsoft’s XPS / ECMA OpenXPS, which do nothing but fix and constrain the “presentation” or rendering of documents.
______

¹⁾ I freely admit that I had to use Google for that one. Hooray for Unicode!
²⁾ Except of course document types which are explicitly designed to also convey presentation information (like the mentioned OpenOffice XML, or simply HTML with embedded CSS style attributes) or are even dedicated to this purpose (like XPS and OpenXPS, which are page description languages).

But what has this to do with the difference between “ordered” lists (where each item is “marked” with an individual “marker”, taken from an ordered set, be it decimal or roman numerals or celestial stems or counting rods or cuneiform numerals or whatnot) and “unordered” (or “itemized”) lists (where each item is “marked” in the same way, say with a “bullet” character)?
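The distinction drawn here, individual markers taken from an ordered set versus one and the same marker for every item, can be written down as a short sketch. The function names and the default bullet are assumptions made for illustration only:

```python
HEAVENLY_STEMS = "甲乙丙丁戊己庚辛壬癸"


def decimal(n):
    """Marker style: plain decimal numerals."""
    return str(n)


def stem(n):
    """Marker style: celestial stems; only defined for 1 to 10 here."""
    return HEAVENLY_STEMS[n - 1]


def render_markers(count, ordered, style=decimal, bullet="\u2022"):
    """Return the marker for each of `count` items.

    Ordered: each item gets its own marker, drawn from an ordered set.
    Unordered: every item gets the same marker.
    """
    if ordered:
        return [style(i) for i in range(1, count + 1)]
    return [bullet] * count


stems = render_markers(3, ordered=True, style=stem)
digits = render_markers(3, ordered=True)
bullets = render_markers(2, ordered=False)
```

Swapping `style` or `bullet` changes the presentation only; whether the list is “ordered” or “unordered” is decided by the `ordered` flag alone.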

Dmitry · January 2, 2016, 4:25am

I find thematic break a poor choice on at least three levels:

Theme is overloaded with an unrelated meaning in other technologies (WPF/Silverlight/XAML themes are analogous to CSS).
Break is overloaded with a different meaning in both HTML and CommonMark. How does one produce a soft thematic break?
Too many words are used to denote a simple unambiguous concept.

If I were asked to propose an alternative term, my suggestion would be boundary.

Having said that, I do agree that horizontal rule should have been replaced due to its presentational semantics. In fact, an abovementioned project of mine had used *** (or --- or ___) for topic boundaries before CM’s sudden (for me) change in terminology.

I simply fail to see any good reason for rejecting unordered lists while embracing thematic breaks.

Crissov · January 2, 2016, 9:55am

Maybe it’s my non-native level of English, but I have no problem with calling -, + and * (ASCII) bullets when used as line markers for list items.