Ordered vs Unordered List

chrisalley · January 2, 2016, 1:05pm

I think that whatever is decided should reflect the output more than the input. It is the intention of the writer which matters most - if the writer is intending to write a list, the writer is going to search the Help page with that intention in mind.

Bullet lists imply a certain presentational output, as @Dmitry pointed out. Lists may be rendered without bullets; for example, a list may be rendered as a grid. The presentation of ordered and unordered lists can be completely swapped around too. Consider this CSS:

ol {
  list-style-type: disc;
}

ul {
  list-style-type: decimal;
}

In this case, decimal numbers will show up in order next to the unordered list items, but this is purely presentational; the numbers don’t carry semantic meaning. Similarly, if the order of your list matters in any way, for example the intention that the “Home” list item in a navigation list should always appear first, then it’s arguably clearer to use an ordered list, styled without numbers if you don’t want these to be visible (e.g. list-style-type: none).

tin-pot · January 2, 2016, 3:58pm

Wow, 21 posts already! The bike shed effect seems to be in full flight … [And yes, I know I have my share in this.]

@chrisalley: I’m a bit confused by your stating that “whatever is decided should reflect the output more than the input” and then going on to point out, correctly and instructively, that the output rendering is completely out of the author’s hands. So I assume that “should reflect the output” does in fact mean the parsing result (aka AST, aka CommonMark DTD element structure).

With which I agree completely, and would thus prefer names which allude to the “two kinds of list” usually found in target document types like HTML, DocBook, “general document”, LaTeX and so forth. So far I have only seen these alternative names in common use:

“unordered” and “ordered”: In the ISO 8879 “general document” and HTML DTDs;
“itemized” and “ordered”: In DocBook;
“itemized” and “enumerated”: In LaTeX.

In addition, one might take the following “prior art” into account; but I don’t make any claim of completeness for this collection of references.

In contrast to using two (or more) element types for various kinds of lists, the ANSI/NISO Journal Article Tag Set and the (related) ISO Standards Tag Set (ISOSTS) both conflate the two kinds of list into a single element type, <list>, and distinguish them using an attribute list-type with predefined values like "order", "alpha-lower", or “bullet” (sic!). The explanation for “bullet” however reads:

Unordered or bulleted list. Prefix character is a bullet, dash, or other symbol.

So there: “unordered” again.

And just for completeness: The ISO 12083 Electronic manuscript preparation and markup DTD seems to use the same approach when it comes to lists, but simply employs numbers to indicate the “list type”, together with vague “suggestions” (but note the word “bullet” here again!):

<!-- l.types = Suggestions for list types:
              1=arabic, 2=upper alpha, 3=roman, 4=bullet, 5=dash,
              6=unlabelled; if more needed (e.g. lower alpha)
              modify or extend this list as necessary. -->
<!ENTITY % l.types "(1 | 2 | 3 | 4 | 5 | 6) #IMPLIED" >

[ Although the ATTLIST declaration for the <list> element type does not declare the type attribute using NUMBER as the declared value, but references the above parameter entity as a name token group. ]
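As a rough illustration of what such numeric codes mean for a consumer of the markup, here is a small Python sketch. The code-to-name table is copied from the DTD comment above; the grouping into “ordered” and “unordered” (including treating “unlabelled” as unordered) and the function name `describe` are my own assumptions and not part of ISO 12083.

```python
# Suggested meanings of the numeric list-type codes, taken from the
# "l.types" comment in the DTD excerpt above.
L_TYPES = {
    1: "arabic",
    2: "upper alpha",
    3: "roman",
    4: "bullet",
    5: "dash",
    6: "unlabelled",
}

# Assumption for this sketch: codes whose label is a counter are "ordered",
# everything else (including "unlabelled") is treated as "unordered".
COUNTER_STYLES = {"arabic", "upper alpha", "roman"}


def describe(code: int) -> tuple[str, str]:
    """Return (suggested label style, assumed list kind) for a type code."""
    try:
        name = L_TYPES[code]
    except KeyError:
        raise ValueError(f"unknown list type code: {code!r}") from None
    kind = "ordered" if name in COUNTER_STYLES else "unordered"
    return name, kind
```

The point of the sketch is only that the numbers carry no meaning by themselves: everything hinges on a side table of “suggestions”.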

And another one, this time from ISO/IEC 26300-1:2015, aka OASIS OpenDocument: there’s only one element type for list, unsurprisingly named <text:list> (this is XML!). Pretty much all rendering properties of such a list seem to be specified in an associated “list style”:

Lists may be numbered. The numbering may be restarted with a specific numbering at each list item. Lists may also continue numbering from other lists in order to merge lists into a single, discontinuous list. Whether list numbering is displayed or not depends on the list style being used. [ISO 26300-1:2015, clause 5.3.1]

This “list style” is determined by the attribute style:name in <text:list>, which may be missing. In this case, clause 5.3.2 gives the following advice:

If a list does not have a style:name attribute and therefore no list style is specified, one of the following actions is taken:

If the list is contained in another list, the list style defaults to the style of the surrounding list.

If there is no list style specified for the surrounding list, but the list contains paragraphs that have paragraph styles attached that specify a list style, that list style is used.

An implementation-dependent default is applied to the list.

To determine which formatting properties are applied to a list, the list level and its style name are taken into account. 16.30.

The “16.30” above refers to clause “16.30 <text:list-style>”, which explains what kind of “style elements” the <text:list-style> element may contain for each “list level”. — I’m too lazy to go into that. (Let alone to dig into Microsoft’s “Office Open XML” stuff …)
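The fallback rules quoted from clause 5.3.2 can be sketched in a few lines of Python. This is one possible reading, not an implementation of the standard: the classes `TextList` and `Paragraph`, the function `resolve_list_style`, and the choice to walk up through all surrounding lists (rather than only the immediate parent) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder for whatever default an implementation chooses.
IMPLEMENTATION_DEFAULT = "implementation-default"


@dataclass
class Paragraph:
    # List style named by the paragraph's paragraph style, if any.
    list_style: Optional[str] = None


@dataclass
class TextList:
    style_name: Optional[str] = None          # the list's own style name
    parent: Optional["TextList"] = None       # surrounding list, if nested
    paragraphs: List[Paragraph] = field(default_factory=list)


def resolve_list_style(lst: TextList) -> str:
    """Pick a list style following the quoted fallback order."""
    # An explicitly named style always wins.
    if lst.style_name is not None:
        return lst.style_name
    # Rule 1: default to the style of a surrounding list (assumed: any ancestor).
    ancestor = lst.parent
    while ancestor is not None:
        if ancestor.style_name is not None:
            return ancestor.style_name
        ancestor = ancestor.parent
    # Rule 2: use a list style specified via a contained paragraph's style.
    for para in lst.paragraphs:
        if para.list_style is not None:
            return para.list_style
    # Rule 3: fall back to the implementation-dependent default.
    return IMPLEMENTATION_DEFAULT
```

Even in this reduced form, whether a list ends up numbered or bulleted is decided far away from the `<text:list>` element itself.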

To conclude: it really is a mess, and it really does not matter much.

But if one useful thing can be gleaned from these examples, it is probably the approach of using only one element type for lists and “customizing” all list properties using attributes; newer document types seem to tend in this direction. Note that this is also the approach taken in the CommonMark DTD, which has a single <list> element type.
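To make the “one element type plus attributes” idea concrete, here is a minimal Python sketch. The class `ListNode`, its field names, and the two renderers are hypothetical and only loosely modeled on a single `<list>` element with attributes; they are not taken from any specification. Item text is inserted without escaping, which a real renderer would have to handle.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ListNode:
    """A single list type; the kind of list is an attribute, not a class."""
    list_type: str                      # assumed values: "bullet" or "ordered"
    items: List[str] = field(default_factory=list)
    start: Optional[int] = None         # only meaningful for ordered lists


def to_html(node: ListNode) -> str:
    """Render to HTML, where the attribute selects <ol> or <ul>."""
    tag = "ol" if node.list_type == "ordered" else "ul"
    start = ""
    if tag == "ol" and node.start not in (None, 1):
        start = f' start="{node.start}"'
    body = "".join(f"<li>{item}</li>" for item in node.items)
    return f"<{tag}{start}>{body}</{tag}>"


def to_latex(node: ListNode) -> str:
    """Render to LaTeX, where the same attribute selects the environment."""
    env = "enumerate" if node.list_type == "ordered" else "itemize"
    lines = [f"\\begin{{{env}}}"]
    lines += [f"  \\item {item}" for item in node.items]
    lines.append(f"\\end{{{env}}}")
    return "\n".join(lines)
```

The same node maps onto “ordered”/“unordered” in HTML and “enumerate”/“itemize” in LaTeX, so the name chosen for the attribute value is independent of any one output format.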

chrisalley · January 3, 2016, 5:35am

My point is that once you take away the entirely malleable presentational aspects of the output (the default being a bulleted list in web browsers, in the case of unordered lists), we’re just left with the semantic structure - that could be HTML elements or some other format. So, the meaning behind that structure is what is important when describing it. The term “Unordered list” is suitable, I think, for describing such a structure, since the order of the list items is not important. If the order of the list is important in some way, it would be clearer markup to use an ordered list (even if, in practice, the order of unordered lists often is intended to have some relevance).

tin-pot · January 3, 2016, 10:04am

I couldn’t agree more!

The term “Unordered list” is suitable, I think, for describing such a structure, since the order of the list items is not important. If the order of the list is important in some way, it would be clearer markup to use an ordered list (even if, in practice, the order of unordered lists often is intended to have some relevance).

I’m not convinced that “unordered” and “ordered” list is the best terminology; I’m just saying that it is the usual terminology in the (fuzzily delineated) field of “structured document models”, mostly because of the UL and OL list element types in (X)HTML (which date back, as I pointed out, a loooong time).

Regarding the “semantic” difference between these two, and attempts to define them: the most practical description I have come across so far, in my view, is from ISO/IEC/TR 9573:1988 (as I quoted above in my little historical excursion); it goes like this:

“ordered” list, where there may be a need to refer to each item in the list. Typically the list items are numbered by the text-formatter.

“unordered” list, where there is no need to refer to any item in the list, but where each item should be clearly standing out. Typically the list items are indicated by bullets, stars, or dashes by the text-formatter.

Note that there is no discussion whether the order of items is important, or whether the “meaning” of the document changes should this order be changed (as HTML5 does in the description of <ul> vs <ol>, which is IMO very silly, to put it mildly).

Talking about the “need to refer to each item” seems to be a suitable way to convey the difference between these two list types. After all, the individual markers (i.e., numerals) in front of the items in an “ordered” list provide the means to do so (one could say it is their main purpose), and in a printed document without links to click on they are the only generally suitable way to refer to “each item in the list” (other than having the reader count the items himself when referring to “the 27th item in this unordered list” …).

Note also that these descriptions only talk about presentations that are “typically” employed by a “text-formatter”, taking into account that these element types do not or can not enforce a specific rendering.

Crissov · January 3, 2016, 11:00am

If “referenceability” was the distinguishing feature, “ordered”/“unordered” would clearly not be the best labels. They’re the best labels if reordering does matter for one but not the other, as in HTML5 (although that semantic was probably informed by the preexisting names and mnemonics ol/ul).

List items that can be referenced by a human reader need a visible and (at least locally) unique label. Most often this will be numeric (incl. alphabetic), because that’s easily automated, but explicit labels are also common, e.g. \item[label] in LaTeX. Many languages have a dedicated list type for that, e.g. dl in HTML. The mechanism is employed for more than list items, of course: formulas and equations, figures, tables, code listings, definitions, lemmas, theorems, examples (most of these frequently have a caption) and headings (which are basically captions for chapters and sections).

Possible appropriate name pairs for the semantic distinction in ISO 9573, as quoted, would have been labeled/unlabeled, named/anonymous, numbered/bullet, numbered/itemized etc.

Anyhow, does that really teach us how to choose fitting names for the CommonMark specification? The output language is out of control of the spec. It can be as presentational or as structural as you can imagine. The only thing we know for sure is that (at least currently) one kind of list needs a number and a punctuation character (“)” or “.”) for each item, while the other kind needs just a punctuation character (“-”, “+” or “*”). To an author, numbered or enumerated list would come naturally for the first kind, and the second kind could be an unnumbered, itemized or bullet list. The last one applies if authors consider at least + and * to be ASCII approximations of a bullet character like •; Unicode also has ‣, ◦ and ⁃(!) bullets, so the hyphen - could probably be considered a “bullet” replacement character as well. There may be even better names in English that I’m not aware of.
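The input-side distinction described above (a number plus “.” or “)” versus a lone “-”, “+” or “*”) can be sketched as a tiny classifier. This is a deliberate simplification and not the CommonMark parsing algorithm: it ignores indentation limits, thematic breaks such as `***`, and every other block rule. The function name `classify_marker` and the returned labels are assumptions.

```python
import re
from typing import Optional

# A run of digits followed by "." or ")", then a space or end of line.
ORDERED_MARKER = re.compile(r"^\d{1,9}[.)](?= |$)")
# One of "-", "+", "*", then a space or end of line.
BULLET_MARKER = re.compile(r"^[-+*](?= |$)")


def classify_marker(line: str) -> Optional[str]:
    """Return "ordered", "bullet", or None for the start of a line."""
    stripped = line.lstrip(" ")
    if ORDERED_MARKER.match(stripped):
        return "ordered"
    if BULLET_MARKER.match(stripped):
        return "bullet"
    return None
```

Note that the classifier needs no notion of order or referenceability at all; from the author’s side, the two kinds differ only in whether the marker contains a number.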

tin-pot · January 3, 2016, 11:39am

Well, yes, maybe these would have been more appropriate names. But somehow I see no point in second-guessing the commonly used terminology: in practice, I have never seen “anonymous list”, for example (nor, I think, “labeled”, “unlabeled” or “named” list).

As I said, I don’t deem “ordered” and “unordered” the best or even the most appropriate names, but only pretty common in the context at hand, which is: “abstract” document structures in the parsed output.

When it comes to—at least equally good—alternatives in “common use”, I have so far only seen “itemized” vs “enumerated” (like in LaTeX, but “itemized” is also used in DocBook); and these would be fine with me, too.

But “bullet”, “named” (I’d rather keep definition lists out of the discussion, as they aren’t available in CommonMark anyway) and “numbered” are inferior terms IMO, primarily for their pretty strong presentational connotations.

There we go! From my point of view, the “output language” in the CommonMark specification is not just “out of control”, but is rather lacking: right now, all the examples show the result of converting the CommonMark input into HTML, and for good reasons that I don’t deny:

Generating HTML is the most common use case for CommonMark, and
without doubt HTML is by far the most popular and well-known document description language, and
last but not least, Markdown was explicitly designed as a tool to generate HTML content.

So including many examples of output HTML in the specification for instructional (or even: testing) purposes is certainly a good idea.

However (and this is indeed my personal opinion or taste; I don’t claim too much universal validity for it), a specification should not rely on examples, in the sense that the specification’s meaning should not change or vanish if some or all examples are removed from it. And this is clearly not the case with the CommonMark specification as it is right now.

To remedy this situation, the specification could describe the parsed result (aka “AST”, aka “element structure”, aka “document model instance”) in a way independent of any concrete “output language” (to nitpick, I would call the latter “output document type” in the case of XML output, or “output document format” in the general case, eg when converting to LaTeX or RTF).

As far as I can see, the only practical (and generally understandable) way to do so would be to refer to the CommonMark DTD, with the understanding that what a CommonMark processor is required to produce is not XML-marked-up text, but rather the “abstract content” of a document instance of that DTD: as long as it produces information or an “information set” (which could well be just a sequence of implementation-defined callback invocations!) that is equivalent to this “abstract content”, the processor is conforming.
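A rough way to picture this: one processor builds a tree, another only fires callbacks, and both count as producing the same information if they yield the same sequence of events. The Python sketch below uses an informal notion of equivalence (identical event sequences), which is an assumption of the sketch and not the precise meaning the standards give to the term. All names (`events_from_tree`, `emit_via_callbacks`, the event tuples) are hypothetical.

```python
from typing import Callable, Iterator, List, Tuple

# An event is ("enter", name), ("exit", name) or ("text", content).
Event = Tuple[str, str]


def events_from_tree(node) -> Iterator[Event]:
    """Flatten a tree of (name, children) tuples and text strings into events."""
    if isinstance(node, str):
        yield ("text", node)
        return
    name, children = node
    yield ("enter", name)
    for child in children:
        yield from events_from_tree(child)
    yield ("exit", name)


def emit_via_callbacks(handler: Callable[[str, str], None]) -> None:
    """Stand-in for a processor that reports a two-item list via callbacks only."""
    handler("enter", "list")
    for text in ("a", "b"):
        handler("enter", "item")
        handler("text", text)
        handler("exit", "item")
    handler("exit", "list")


# The same two-item list, this time as an explicit tree.
tree = ("list", [("item", ["a"]), ("item", ["b"])])

collected: List[Event] = []
emit_via_callbacks(lambda kind, value: collected.append((kind, value)))

# Under this sketch's informal definition, the two outputs are equivalent.
same_information = collected == list(events_from_tree(tree))
```

Neither side ever produces XML text here, yet a conformance check against the abstract structure is still possible.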

If this seems puzzling: there are precise meanings of the terms “abstract content”, “document instance”, “information set”, and “equivalent”.