Alternative (1) ordered list syntax

Sounds about right, and it keeps to the idea of “no surprises”. Though if we do make a commonmark embedded/lite version, we should only support 1. , 2. , 3. etc… format

I like (1) style support. It’s good for letters and should not produce serious conflicts in real life.

You are the implementor, so you can tell us better whether adding it is a problem. If you consider it OK (and I guess it would be easy to add it as a pluggable extension and benchmark the possible slowdown), I’m fine with it.

Still, I’m more interested in having the table extension fleshed out.

Technically, I don’t see problems at first glance. The only question is a logic collision with autoreplace rules like (c) -> © and so on. But I think it’s a very rare case that a user really needs such an autoreplace at the beginning of a line. Letter lists will be used much more often. Maybe things like (a) are a bit boring, but a) and a. are too unsafe to be on by default.
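
To make the collision concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `PAREN_MARKER` pattern, the `REPLACEMENTS` table, and the function names are hypothetical and not taken from any existing parser. It shows that a line starting with "(c)" is claimed as a list marker before the autoreplace rule ever sees it, while "(c)" inside a line is still replaced.

```python
import re

# Hypothetical marker pattern: up to three spaces of indentation, then a
# parenthesized decimal number or single lowercase letter, then spaces.
PAREN_MARKER = re.compile(r"^ {0,3}\((\d{1,9}|[a-z])\) +(?=\S)")

# Hypothetical typographic replacements (not a list from any real renderer).
REPLACEMENTS = {"(c)": "©", "(r)": "®", "(tm)": "™"}


def split_marker(line):
    """Return (marker, rest); marker is None if the line has no list marker."""
    m = PAREN_MARKER.match(line)
    if m is None:
        return None, line
    return m.group(0).strip(), line[m.end():]


def autoreplace(text):
    """Apply the typographic replacements to inline text."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    return text


def process(line):
    """Block structure is decided first; autoreplace only sees the rest."""
    marker, rest = split_marker(line)
    return marker, autoreplace(rest)


print(process("(c) 2014 Example Corp"))  # the marker wins, no © is produced
print(process("Copyright (c) 2014"))     # no marker, (c) is replaced
```

Under these assumptions only a line that begins with "(c)" loses its copyright sign, which is the rare case described above.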

My approach is to be minimalistic: just one means to do one thing.

1. works quite fine to mark an ordered list, so I’d keep that in the core CommonMark.

Adding this as an extension would be fine. As you said, (a) and a. could have a higher chance of collision, so it would be good to be able to turn this on/off.

I have never, ever seen anyone use (1) to make a list on Stack Overflow, or any Stack Exchange site, or Discourse.

But I see people try (and fail) to use 1) to make lists pretty much every single day.

+++ codinghorror [Oct 09 14 04:55 ]:

I have never, ever seen anyone use (1) to make a list on Stack Overflow, or any Stack Exchange site, or Discourse.

But I see people try (and fail) to use 1) to make lists pretty much every single day.

Perhaps the (1) style is less common. But remember, techy web forums are not the only users to consider. Lots of people are using Markdown now for scholarly writing, legal writing, journalism, etc. And (1) is used in many contexts. In linguistics, for example, numbered examples are nearly always numbered that way: here’s one example. This style is right there in the Chicago Manual of Style alongside the others. I use it all the time. Here is an example from the Chicago Manual of Style illustrating various styles of list numbering that are in use:

 I. Historical introduction
II. Dentition in various groups of vertebrates
     A. Reptilia
          1. Histology and development of reptilian teeth
          2. Survey of forms
     B. Mammalia
          1. Histology and development of mammalian teeth
          2. Survey of forms
              a) Primates
                   (1) Lemuroidea
                   (2) Anthropoidea
                         (a) Platyrrhini
                         (b) Catarrhini
                               i) Cercopithecidae
                              ii) Pongidae
               b) Carnivora
                   (1) Creodonta
                   (2) Fissipedia
                         (a) Ailuroidea
                         (b) Arctoidea
                   (3) Pinnipedia
              c) Etc. . . .

I haven’t seen any strong objections to (1) numbered lists so far (letter ordered lists are a different story), but @jgm and others have listed reasons for the inclusion of (1). If they are too much bloat (or too uncommon) for the core, (1) style ordered lists would make a worthwhile extension.

IMO the (1) notation is bogus, if only for the reason that (1), (2), etc are already in conventional use for numbering paragraphs, along with 1-, 2-, etc:

(1) Paragraph one has some text.

(2) And one more paragraph.

So if some new “numbering syntax” is going to be introduced, I’d rather see (1) etc employed for this purpose.

I can’t even recall the last time I saw a user in the wild try to make a list this way:

(1) one
(2) two
(3) three

Or if I have, it is exceedingly rare on the order of one time every six months.

In contrast, I see this form

1) one
2) two
3) three

almost, literally, every single day. Seriously.

Well, I see numbers in paired parentheses regularly, but like @tin-pot says, that’s usually used to number examples and the like, not true lists. It’s also common to see additional variations instead of strict numbering, e.g.:

(1) An example
(1') A variant of that example
(1*) Another variant

So it may be better to reserve this for a future “labeling” extension.

I said paragraphs: the syntax “(1)⎵” and alternatively “1-⎵” is commonly used to number paragraphs in a section, or a whole document. At least this is common in Germany, and a typical example would be law texts.

Both forms are codified in DIN 1421:1983.

But this applies only at the start of a line: And inside a line, it seems to me that it is common to use for example (a)⎵ and (b)⎵ to indicate where alternative parts of a sentence start (same with decimal numerals).

Conceptually, is there a distinction between a series of numbered paragraphs and a loose ordered list? If you are intending to number paragraphs anyway, wouldn’t it be better to mark up the document with an actual semantic list?

Sure there is. Though not if you only think of

  • a “flat” sequence of numbered paragraphs on the one hand, and

  • a “simple”, loose, ordered list, where each item is “like” a paragraph with a prefixed number.

In this case there is basically only a typographic distinction.

But, to start with, paragraphs are usually numbered consecutively across subdivisions of the text (either counting all the document’s paragraphs in one sequence, or counting the paragraphs contained in a specific division level, say “section” as the first division level). This is a distinction on how the numbers are assigned to either “numbered paragraphs” or else “ordered list items”.

But more important is the typographic distinction, which in turn is there because there is a conceptual distinction: List (items) form a hierarchy, paragraphs don’t. Lists start and end inside (a subdivision of the) text, or even “inside” a paragraph (it is common to think of lists being a subordinate part of a paragraph), paragraphs don’t (they are the text, if you want). List items are referred to like “see item 42 above”, paragraphs are referred to like “see section 5.2 paragraph 42”. And so on.

If you are intending to number paragraphs anyway, wouldn’t it be better to mark up the document with an actual semantic list?

If you mean by “an actual semantic list” the “loose, ordered list” you mentioned, then no, absolutely not: in what sense would it be “better” to misuse a list in that case? As far as I can see, it would not only fail to be “better”, but it would simply be wrong, even from the perspective of the reader (let alone a processing application!) of the resulting document: the “markers” or “labels” or whatever one likes to call them produced in this way would not look like paragraph numbers, but like list item numbers, because that’s what they are!

As I wrote: paragraph numbers have the form “(1)⎵” (preferred) or “1-⎵” (alternative), and that’s it (as far as I know, and as far as DIN 1421 goes).


But I see that there’s a conflict here between the use of “(1)⎵” and/or “1-⎵” for the purpose to number

  • paragraphs: as specified in DIN 1421, and used in law texts and so on, or

  • list items: as seemingly proposed in the “Chicago Manual of Style”, quoted above, among seven different forms of “ordered list labels” (one for each list hierarchy level!), of which only one, “1.⎵”, is currently expressible in CommonMark, while the second CommonMark form, “1)⎵”, is not even among the forms proposed by the “Chicago Manual of Style”…

Given this, if one wants to add new forms of “ordered list item labels”, maybe we should start with adding “a)⎵” as an “item marker” form? As far as I can tell, “1)⎵” is typically, if not exclusively used for footnote references1) anyway.

And in fact, I do most often see “(1)⎵” used not for list items, but for paragraph numbers.

But all of this might be largely an “anglophone vs continental typographical style” thing, I don’t know …

________

  1. Like this one. You can see footnotes like this in most DIN and ISO standards.

I use the (1) form quite frequently, and it’s in the Chicago Manual of Style:
http://www.chicagomanualofstyle.org/16/ch06/ch06_sec126.html

As I said, there are a bunch of other forms in the Chicago Manual which aren’t supported by CommonMark either.

Chief among them is the form “a)␣”, which

  • I for one would use frequently if I could,
  • is used frequently by others (alas not the W3C default style sheet for HTML),
  • is sanctioned not only by the Manual, but by proper Standards too,
  • is unambiguously the label of an item in an ordered list.

While “(1)␣” is, as I pointed out, at least “officially” used for paragraph numbering in texts and sanctioned for this use by Standards addressed at some 80 Mio people.

@tin-pot While I’m not entirely convinced of the strength of your conclusion regarding lists vs paragraphs, I think your posts bring to light a valuable point - alternative ordered list labels mentioned in the Chicago Manual of Style ought to be considered as list item types if (a) is to be considered. Problematically, these types are not all represented in HTML. HTML5 only covers five: decimal 1, lower-alpha a, upper-alpha A, lower-roman i, and upper-roman I. Even if CommonMark did support (a) as a list item type, it’s not clear what list style type it ought to be rendered as. And this says nothing about alternatives such as 1.1, which aren’t even supported via counter styles in CSS.
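
As a rough illustration of the mapping problem, here is a small Python sketch that maps a marker onto the five types listed above. The function names are hypothetical, and the tie-break rule is an assumption of this sketch, not anything CommonMark or HTML specifies: a single letter is treated as alphabetic unless it is "i" or "I", and longer labels must be valid Roman numerals.

```python
import re

# Strict lowercase Roman numeral (1..3999); the lookahead rejects "".
ROMAN = re.compile(
    r"^(?=[mdclxvi])m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})$"
)


def strip_delimiters(marker):
    """'(a)' -> 'a', 'a)' -> 'a', '1.' -> '1'."""
    return marker.strip("().")


def html_list_type(label):
    """Return the HTML <ol type> value for a bare label, or None."""
    if not label.isascii():
        return None
    if label.isdigit():
        return "1"
    if not label.isalpha() or not (label.islower() or label.isupper()):
        return None  # e.g. "1.1" or mixed case: none of the five types
    upper = label.isupper()
    lower = label.lower()
    # Assumed tie-break: single letters are alphabetic, except "i"/"I".
    if ROMAN.match(lower) and (len(lower) > 1 or lower == "i"):
        return "I" if upper else "i"
    if len(lower) == 1:
        return "A" if upper else "a"
    return None


for marker in ["1.", "(a)", "a)", "IV.", "ii)", "(1.1)"]:
    print(marker, "->", html_list_type(strip_delimiters(marker)))
```

Note that "(a)" and "a)" both end up as type "a": the delimiter style is lost in this mapping, and "(1.1)" has no counterpart at all.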

It may be safest to leave list types/markers (outside of the five specified in HTML5) as plain text in CommonMark.

@chrisalley I’m not too convinced of the utility of automatic paragraph numbers in CommonMark either (and one could still use the alternative syntax “1-␣” for that); but I just wanted to point out that “(1)␣” is commonly used to mark other text elements than “ordered list items”, too. This is more related to style and typography than to the question of CommonMark syntax, if you see it this way.

And then there’s the differences between, say, anglo-american typography and popular writing style, and the customs prevalent in (western-)europe, and of course the “rest of the world”—I have no idea if there are “typical latin-american” typographical guidelines used in South America, for example.


Problematically, these types are not all represented in HTML.

Depending on which HTML definition you look at, there might be no variations in the style of unordered lists represented at all, as this is purely a presentational choice. Let alone when converting CommonMark to non-HTML document types and document formats like LaTeX, OpenDocument, DITA, and so on.

Thus it is clearly “out of scope” for the CommonMark specification to decree which particular “item marker styles” can or can not be represented in the rendered, “final”, output document (from a post included in a web forum, to a printed article or book). As a consequence, what (any version of) HTML provides or can represent should not be the ultimate arbiter on what can be expressed in the CommonMark input syntax, but only some rough guideline.

What the specification could—and in my opinion: also should—require from a CommonMark processor however, is that the “list item marker style” used in each list item is somehow made available in the parser’s output (that is: the “AST”, the “CommonMark XML DTD”, the “CommonMark document content model”).


Specification proposal

One obvious way to represent this information would be a “marker=” attribute1) in the DTD’s <item> element type (replacing or augmenting the current delimiter= attribute), which would store the (trimmed) marker string used in the input text for this particular list item. It would hold CDATA content like:

  1. marker="-", marker="+", and marker="*" for the unordered list items of now;
  2. marker="42.", marker="07)" etc for the ordered list items of now; and maybe
  3. marker="a)", marker="ii)" and what not for the ordered list items of the future.

But how exactly these markers are mapped into and represented in the target document (HTML or otherwise), is the job and decision of the particular application, typically in conjunction with some style sheet, and can not be mandated by the CommonMark specification as I see it.
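
A minimal Python sketch of what this could look like on the parser side, under stated assumptions: the `MARKER` pattern (including the "a)" and "ii)" forms, which CommonMark does not accept today), the `parse_item` helper, and the `<paragraph>` child element are all illustrative choices, not part of any specification.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical marker grammar: bullets, decimal markers, and (as a possible
# future extension) lowercase letter or Roman-numeral markers ending in ")".
MARKER = re.compile(
    r"^ {0,3}(?P<marker>[-+*]|\d{1,9}[.)]|[a-z]\)|[ivxlcdm]+\))"
    r"[ \t]+(?P<text>\S.*)$"
)


def parse_item(line):
    """Return an <item marker="..."> element, or None if not a list item."""
    m = MARKER.match(line)
    if m is None:
        return None
    # The marker is stored exactly as written (trimmed), not normalized.
    item = ET.Element("item", {"marker": m.group("marker")})
    para = ET.SubElement(item, "paragraph")
    para.text = m.group("text")
    return item


item = parse_item("07) seven")
print(ET.tostring(item, encoding="unicode"))
# <item marker="07)"><paragraph>seven</paragraph></item>
```

The element carries the marker verbatim; whether a renderer turns "07)" into a decimal counter, a literal label, or something else is left to the application.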

______

  1. The “=” suffix in “name=” is just meant to indicate an attribute name, in contrast to an element type name, which is often written as “<name>”—a convention that I’ve seen used elsewhere and happen to find useful.

By the way: what you referenced as “CSS” seems to be in the CSS Counter Styles Level 3 Editor’s Draft, which is a “fairly stable” module in the current snapshot “Working Group Note” of CSS 3, but it is not in the CSS 2.1 recommendation (current at this time) nor the latest CSS 2.2 draft, where the grammar only lists @import, @page, @media, and @charset as at-rules.

So I’d see @counter-style and what it does or does not support as merely a hint at some of the possible future directions that CSS might take: that’s pretty vague, and shouldn’t have any bearing on the CommonMark specification IMO.


And what do you mean by the remark that “alternatives” such as “1.1” wouldn’t be supported (via this @counter-style rule or any other CSS feature)? The “1.1” is a division number of a division at the 2nd level (eg, a “sub-section” in a text): it obviously uses two counters (in the CSS sense), and it does not apply to list items, and you can certainly generate hierarchical division numbers and insert them as generated content into section headings, using only standard CSS 2. (Well, you could hierarchically number list items the same way too, but that would fly in the face of any “style manual”, from Chicago to Zurich …)

According to W3C’s CSS Lists and Counters Module Level 3, a counters() function may be used to combine any counter representation with any level of nesting. In fact, Example 22 there renders as follows:

(1) one
(2) two
   (2.1) nested one
   (2.2) nested two
(3) three

It seems that W3C (or at least one of their editors) thinks (1) could be a legitimate marker.
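
For readers who want to see the counter logic rather than the stylesheet, here is a small Python sketch of the same idea: the counters of all nesting levels are joined with "." and wrapped in parentheses. This is not the CSS from the W3C example; the helper names and the tree representation are assumptions made for illustration.

```python
def counters(path, sep="."):
    """Join the counter values of all nesting levels, like counters() in CSS."""
    return sep.join(str(n) for n in path)


def number_items(tree, prefix=()):
    """Yield (depth, label, text) for a tree given as [(text, children), ...]."""
    for i, (text, children) in enumerate(tree, start=1):
        path = prefix + (i,)
        yield len(prefix), "(" + counters(path) + ")", text
        yield from number_items(children, path)


def render(tree, indent="   "):
    """Render the tree as indented plain text with hierarchical labels."""
    return "\n".join(
        f"{indent * depth}{label} {text}"
        for depth, label, text in number_items(tree)
    )


tree = [
    ("one", []),
    ("two", [("nested one", []), ("nested two", [])]),
    ("three", []),
]
print(render(tree))
```

With this tree, the sketch prints the same five lines as the rendering shown above.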