Letter-ordered lists

I was going to submit this as an issue before finding this page, but below are my thoughts on implementing this spec:

CommonMark ordered lists are fairly limited in scope, whereas CSS lets us render them in a variety of ways (upper- and lowercase alphabetic; Roman, Georgian, Hebrew, Armenian, etc. numerals). I would like to see an implementation spec covering the following use cases.

I tried to demonstrate existing spec features, such as starting a new list when the delimiter changes [Lists: example 186] and nesting list items [Lists: example 199]:

I based the delimiters on the type attribute values defined for the HTML5 <ol> element, since this format is consistent with current practice and fairly declarative, while still being easy to read and author.

Caveat:

This deliberately limited subset may reduce the chance of a name that begins with an initial accidentally starting an ordered list.

// Lettered: simple example
a. item                                   >>>     a. item
a. item                                   >>>     b. item
    a. item sub 1                         >>>         a. item sub 1
a. item                                   >>>     c. item
A. item                                   >>>     A. item
A. item                                   >>>     B. item
A. item                                   >>>     C. item

// Roman numeral: simple example
i. item                                   >>>       i. item
i. item                                   >>>      ii. item
i. item                                   >>>     iii. item
I. item                                   >>>       I. item
I. item                                   >>>      II. item
I. item                                   >>>     III. item

// More contrived example
a. item                                   >>>     a. item
a. item                                   >>>     b. item
    1. item sub 1                         >>>         1. item sub 1
        i. item sub 2                     >>>             i. item sub 2
        i. item sub 2                     >>>            ii. item sub 2
    1. item sub 1                         >>>         2. item sub 1
        A. item sub 2                     >>>             A. item sub 2
            - item sub 3                  >>>                 - item sub 3
            - item sub 3                  >>>                 - item sub 3
        A. item sub 2                     >>>             B. item sub 2
        A. item sub 2                     >>>             C. item sub 2
a. item                                   >>>     c. item
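Here is a minimal Python sketch of how a parser might map the markers proposed above onto HTML `type` values. It starts a new list whenever the type or the delimiter changes, which is how CommonMark already treats delimiter changes. The sketch makes two assumptions: only the literal "lazy" markers `1`, `a`, `A`, `i` and `I` are recognized, and nesting is ignored. `classify_marker`, `group_items` and `to_html` are hypothetical names, not part of any CommonMark implementation.

```python
import re
from itertools import groupby

# Only the literal markers used in the examples, then "." or ")" and a space.
MARKER = re.compile(r"(1|a|A|i|I)([.)]) +(.*)")

def classify_marker(line):
    """Return (type, delimiter, text) for a top-level list line, else None."""
    m = MARKER.fullmatch(line)
    return m.groups() if m else None

def group_items(lines):
    """Split items into separate lists whenever the type or delimiter changes."""
    items = [classify_marker(line) for line in lines]
    if None in items:
        raise ValueError("every line must be a list item")
    return [
        {"type": kind, "delimiter": delim, "items": [text for _, _, text in group]}
        for (kind, delim), group in groupby(items, key=lambda item: item[:2])
    ]

def to_html(lists):
    """Emit one <ol type="..."> per list, as in the renders in this post."""
    out = []
    for lst in lists:
        out.append(f'<ol type="{lst["type"]}">')
        out.extend(f"  <li>{text}</li>" for text in lst["items"])
        out.append("</ol>")
    return "\n".join(out)
```

For example, `group_items(["a. item", "a. item", "A. item"])` yields two lists, one of type `a` and one of type `A`. Note that `classify_marker("A. Hitchcock")` still matches, so the initials problem is not solved by the marker subset alone.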

Markdown renders:

<!-- Lettered: simple render -->
<ol type="a">
  <li>item</li>                           >>>     a. item
  <li>item                                >>>     b. item
    <ol type="a">                         >>>         a. item sub 1
      <li>item sub 1</li>                 >>>     c. item
    </ol>
  </li>
  <li>item</li>
</ol>
<ol type="A">
  <li>item</li>                           >>>     A. item
  <li>item</li>                           >>>     B. item
  <li>item</li>                           >>>     C. item
</ol>

<!-- Roman numeral: simple render -->
<ol type="i">
  <li>item</li>                           >>>     i. item
  <li>item</li>                           >>>    ii. item
  <li>item</li>                           >>>   iii. item
</ol>                                     >>>     I. item
<ol type="I">                             >>>    II. item
  <li>item</li>                           >>>   III. item
  <li>item</li>                           
  <li>item</li>                           
</ol>

<!-- AND our really contrived example -->
<ol type="a">
  <li>item</li>                           >>>     a. item
  <li>item                                >>>     b. item
    <ol type="1">                         >>>         1. item sub 1
      <li>item sub 1                      >>>             i. item sub 2
        <ol type="i">                     >>>            ii. item sub 2
          <li>item sub 2</li>             >>>         2. item sub 1
          <li>item sub 2</li>             >>>             A. item sub 2
        </ol>                             >>>                 - item sub 3
      </li>                               >>>                 - item sub 3
      <li>item sub 1                      >>>             B. item sub 2
        <ol type="A">                     >>>             C. item sub 2
          <li>item sub 2                  >>>     c. item
            <ul>
              <li>item sub 3</li>
              <li>item sub 3</li>
            </ul>
          </li>
          <li>item sub 2</li>
          <li>item sub 2</li>
        </ol>
      </li>
    </ol>
  </li>
  <li>item</li>
</ol>
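The right-hand columns in these renders are just item counters formatted according to the list `type`. Below is a minimal sketch of that mapping. It assumes the CSS conventions: `lower-alpha` continues z, aa, ab, …, and Roman numerals are defined for 1 to 3999. `label`, `to_alpha` and `to_roman` are hypothetical helper names.

```python
ROMAN_VALUES = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
                (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
                (5, "v"), (4, "iv"), (1, "i")]

def to_alpha(n):
    """1 -> a, 26 -> z, 27 -> aa, as CSS lower-alpha counts."""
    if n < 1:
        raise ValueError("alphabetic labels start at 1")
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters

def to_roman(n):
    """Greedy conversion: 1 -> i, 4 -> iv, 1994 -> mcmxciv (1..3999 only)."""
    if not 1 <= n <= 3999:
        raise ValueError("Roman labels are only defined for 1..3999 here")
    parts = []
    for value, numeral in ROMAN_VALUES:
        count, n = divmod(n, value)
        parts.append(numeral * count)
    return "".join(parts)

def label(n, list_type):
    """Marker text for the n-th item of an <ol type=list_type>."""
    if list_type == "1":
        return str(n)
    if list_type in ("a", "A"):
        text = to_alpha(n)
    elif list_type in ("i", "I"):
        text = to_roman(n)
    else:
        raise ValueError(f"unknown list type {list_type!r}")
    return text.upper() if list_type.isupper() else text

# Right-aligned, like the Roman numeral columns above:
for n in range(1, 4):
    print(f"{label(n, 'i'):>4}. item")
```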

Presentation Styling

The developer or designer could utilize CSS to adjust presentational properties of the lists in a more targeted way:

ol { /*...*/ }
ol[type="1"] { /*...*/ }
ol[type="a"] { /*...*/ }
ol[type="A"] { /*...*/ }
ol[type="i"] { /*...*/ }
ol[type="I"] { /*...*/ }

/* potential real-world presentational language overrides */
html[lang="he"] ol[type="1"] { list-style-type: hebrew; }
html[lang="hy"] ol[type="1"] { list-style-type: armenian; }
html[lang="ka"] ol[type="1"] { list-style-type: georgian; }
...

Real-world use cases:

  • product details
  • feature lists
  • change logs
  • procedural instructions: assembly/work flow
  • legal documentation

With the roman numeral example, how is i. set apart from the letter i.? Is it when the list marker changes from a. to i.? What happens when the previous marker is h.?

For capital letters, we still have the problem of A. being used to represent an initial, e.g. A. Hitchcock.

The use of the HTML type attribute looks good.
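One possible answer to the `i.` question is to decide from context. The following Python sketch shows a hypothetical rule that is not part of the proposal. A list that is already alphabetic stays alphabetic, so `h.` followed by `i.` gives the letter i. A first item reads as Roman only if it is `i`/`I` or a longer valid numeral such as `iv`. It does nothing for the capital-initial problem.

```python
import re

# Valid Roman numerals from 1 to 3999 (matched in upper case).
ROMAN_RE = re.compile(r"M{0,3}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")

def is_roman(token):
    """True if token is a non-empty, well-formed Roman numeral (any case)."""
    return bool(token) and ROMAN_RE.fullmatch(token.upper()) is not None

def marker_kind(token, previous_kind=None):
    """Classify a marker (without its delimiter) as 'alpha' or 'roman'.

    previous_kind is the kind of the preceding sibling item, or None for
    the first item of a list (hypothetical rule)."""
    if previous_kind == "alpha":
        return "alpha"                      # h. -> i. continues the letters
    if previous_kind == "roman":
        return "roman" if is_roman(token) else "alpha"
    # First item: a lone i/I, or a multi-character numeral, reads as Roman;
    # other single letters (c, d, l, m, v, x, ...) start a lettered list.
    if token in ("i", "I") or (len(token) > 1 and is_roman(token)):
        return "roman"
    return "alpha"
```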


Please forgive digging up an old thread.

Is this being moved to Extensions, or does it belong in Spec? That still seems to be undecided.

What should actually be part of the core is a more generic description of line prefixes, so that extensions could give them meaning. I’m thinking of something like this (with nesting and indentation mostly ignored):

line   := ( content | (prefix space+ content (space+ suffix?)?) | fence ) newline;
prefix := (opener? attribute? closer) | (attribute? leader)+;
suffix := (opener closer?) | leader+;
opener := '(' | '[' | '{' | '<';
closer := ')' | ']' | '}' | '>';
leader := '!' | '"' | '#' | '$' | '%' | '&' | '\'' | '+' | ',' | '-' | '.'
        | '/' | ':' | ';' | '?' | '=' | '@' | '\\' | '^' | '_' | '`' | '|' | '~';
fence  := space* leader+ space*;
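Here is a rough Python translation of the `prefix` rule into a regular expression, to show how cheap it is to recognize. The grammar leaves `attribute` (and `content`) undefined, so the sketch assumes an attribute is a run of ASCII letters and digits. The optional `suffix` and the `fence` rule are not handled. `split_line` is a hypothetical name.

```python
import re

OPENER = r"[(\[{<]"
CLOSER = r"[)\]}>]"
LEADER = r"""[!"#$%&'+,\-./:;?=@\\^_`|~]"""
ATTRIBUTE = r"[0-9A-Za-z]+"   # assumption: `attribute` is not defined above

# prefix := (opener? attribute? closer) | (attribute? leader)+
PREFIX = rf"(?:{OPENER}?(?:{ATTRIBUTE})?{CLOSER}|(?:(?:{ATTRIBUTE})?{LEADER})+)"
# prefix space+ content  (suffix and fence are ignored in this sketch)
LINE = re.compile(rf"(?P<prefix>{PREFIX}) +(?P<content>.*)")

def split_line(line):
    """Return (prefix, content); prefix is None when the line has none."""
    m = LINE.fullmatch(line)
    if m is None:
        return None, line
    return m.group("prefix"), m.group("content")
```

With this sketch, `a)`, `(iv)`, `1.2.`, `-` and `##` all come out as prefixes, leaving it to an extension to decide what each one means.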

All that would be needed is

  1. A command to change the CSS for a normal list. CSS and HTML already support lettered lists.

I’ve moved this discussion from “Spec” to “Extensions” since letter-ordered lists are not on the roadmap for CommonMark 1.0.


I prefer a separation of semantics from presentation, as in HTML, where the symbols used to represent the ordering are a rendering concern defined not in the HTML but in CSS (see the CSS list-style-type property. So many options! Traditional Katakana iroha numbering, anyone?).

Sorry, I must have missed your post earlier, @vas. What are your thoughts on using the type attribute, which is part of HTML (see the examples earlier in this topic)? The writer might want to preserve the type of ordered list so that this information carries over even when the stylesheets differ. For example, a sub-list could be kept letter-ordered so that its items can be referred to elsewhere in the text as item a, b, etc. Otherwise these labels might vary depending on the stylesheet used, changing the meaning of the document.

But since the numbers given in Markdown for ordered lists are not literal, the numbering in the rendered output may be different. But your use case, @chrisalley, is an important one. How to refer to items in an ordered list from elsewhere in the same content? The proper solution is to provide a reference syntax, offhand something like (but better than):

1. Item A
3. Item C (Item B got deleted)
4. Item D

Here is a paragraph that refers to Option {{#3}}. 

Which would render as:

1. Item A
2. Item C (Item B got deleted)
3. Item D

Here is a paragraph that refers to Option 2.
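The `{{#3}}` idea can be sketched as a small pre-pass in Python. The sketch assumes a single ordered list per document and the hypothetical `{{#n}}` syntax, where `n` is the number the author typed. `render_with_refs` is a hypothetical name. It only produces decimal numbers: a letter-styled reference would require the list type to be known at this stage.

```python
import re

ITEM = re.compile(r"(\d+)\. (.*)")          # "3. Item C"
REF = re.compile(r"\{\{#(\d+)\}\}")          # "{{#3}}"

def render_with_refs(source):
    """Renumber list items 1, 2, 3, ... and replace {{#n}} with the rendered
    number of the item the author originally numbered n."""
    lines, mapping, counter = [], {}, 0
    for line in source.splitlines():
        m = ITEM.fullmatch(line)
        if m:
            counter += 1
            mapping[int(m.group(1))] = counter
            lines.append(f"{counter}. {m.group(2)}")
        else:
            lines.append(line)

    def resolve(ref):
        number = mapping.get(int(ref.group(1)))
        return ref.group(0) if number is None else str(number)  # unknown refs stay as typed

    return REF.sub(resolve, "\n".join(lines))
```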

EDIT: I just realized my idea might not work, since CSS styling would be applied after the Markdown was rendered to HTML, meaning the reference could not reflect the style (number, letter, roman numeral)… Or can it? Is there an HTML/CSS trick to support this?

My original point is just a preference. If it turns out that the pros of supporting specified list styles in Markdown outweigh the pros of separation of concerns, I can get behind it. But it does seem to go against the design principles of Markdown and its most common output format, HTML. Yes, HTML supports inline declaration of styling, but that’s because HTML supports both SOP and monolithic approaches. Markdown currently doesn’t support inline styling in any way other than embedding HTML.

I think I’ve rambled on too long about something that may not be that important!

This is a good point, the list markers might not match the rendered output. I’ve been assuming that authors would update their lists to the actual letter used, e.g.

a. First item
b. Second item
c. Third item

instead of the lazy style:

a. First item
a. Second item
a. Third item

Only in the first case could authors accurately refer to items by their letter. However, I also think that, at least in smaller documents, authors would clean up their lists to use the literal letter markers, and manually change the letter when referring to the correct list item in other parts of the text. This is, after all, what you would do in a plain text file, which is what Markdown is in its raw form. The advantage of doing it this way, rather than requiring a special reference syntax, is that authors can create and refer to letter-ordered lists without any special knowledge besides how to write a plain text document. This makes the syntax ideal for casual forum posts, etc.

Perhaps some kind of reference syntax (similar to how links and footnotes work) should be added for more complex scenarios.


I would argue that a good reason for including letter-ordered or roman numeral-ordered lists in CommonMark is that people commonly write these in regular text files.

As a markdown user, when I’m writing, I’m writing a text file which also happens to magically compile to give a rendered output.

Having numeral lists be rendered but not letter / roman-numeral lists causes an inconsistency in rendering, which forces me to mentally change my writing style to just use numeral lists. I don’t want this mental overhead of remembering that I must use only certain kinds of lists because only certain kinds work with markdown.

Since both numeral and letter lists work in the text, I expect them both to behave consistently after rendering as well.

Common use-cases:

1. Item
  a. Sub-item
  b. Sub-item

**Definition**: Mathematical definition, and conditions (i) - (v) hold:
 (i) First condition
 (ii) Second condition
 (iii) Third condition
 (iv) Fourth condition
 (v) Fifth condition

I'm taking notes on something, and the reference text uses a) b) c)
1. But
2. My List
3. Uses 1, 2, 3,

So now whenever they refer to c, I have to remember to translate it to 3. Which sucks.

Bump. (Is bumping allowed?) Or, I second this. Whichever.

I found another use case, which is that a lot of software licenses contain letter-ordered or roman numeral-ordered lists. They don’t convert naturally to Markdown without altering the license text (bad) or cluttering it up with HTML code.

I checked several popular licenses and many have lettered lists: the GPL 3.0 (section 5), Creative Commons 4.0 (section 1), Apache 2.0 (section 4), etc.

I found this while trying to Markdownify the Perl Artistic 2.0 license, which has both lettered AND roman-numeral sublists - https://github.com/greg-kennedy/dot_scr/blob/master/LICENSE.md

Since GitHub is the top location for open-source software, and it almost encourages that every document be posted in Markdown (e.g. README.md is automatically displayed on the repository home), this seems like a really big omission.

I’m in agreement. I argued for lettered lists in our original discussions about the spec, but for some reason there was resistance. We’ve had them in pandoc for a decade at least, and they are pretty unproblematic. (We even have roman-numbered lists.)

One tricky thing is that names with initials can sometimes look like a letter-ordered list: e.g.

B. Russell says...

In pandoc we avoid problems of this kind by requiring two spaces after letter-ordered lists starting with a capital letter.
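Below is a minimal Python sketch of that two-space rule as a line test. As a simplification, it applies the rule only to an upper-case letter followed by a period, since that is the form that collides with initials such as `B. Russell`. `starts_letter_item` is a hypothetical name, and the sketch is not pandoc's actual code.

```python
import re

# One letter, "." or ")", the spaces after it, then the item text.
LETTER_ITEM = re.compile(r"([A-Za-z])([.)])( +)(\S.*)")

def starts_letter_item(line):
    """True if `line` should start a letter-ordered list item."""
    m = LETTER_ITEM.fullmatch(line)
    if m is None:
        return False
    letter, delimiter, spaces, _text = m.groups()
    # Simplified two-space rule: "B. Russell" is text, "B.  Russell" is an item.
    if letter.isupper() and delimiter == "." and len(spaces) < 2:
        return False
    return True
```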


Initial letters with an abbreviation dot are similar to numbers with an ordinal dot (some languages use 1., 2., 3., 4. where English has 1st, 2nd, 3rd, 4th, e. g. in dates). The solution to distinguish them from list markers at the start of a line should be similar as well, especially for lists that interrupt a paragraph without a preceding blank line and for single-item lists, which together constitute an issue that blocks v1.0.

Since i. and i) would be even more ambiguous if Roman numerals were to be supported as ordinal list values, one could amend and restrict the valid syntax choices, e. g.:

Type                              | Tight                  | Loose
numeric decimal, single-digit     | 1., 1), (1.), (1), #1  | 1., 1), (1.), (1), #1
numeric decimal, multiple-digits  | (12), #12              | 12., 12), (12.), (12), #12
lowercase Roman numeral           | (i.), - i., * i., + i. | i., (i.), - i., * i., + i., - i, * i, + i
uppercase Roman numeral           | (I.)                   | I., (I.)
lowercase Latin, single-letter    | a), (a)                | a), (a)
uppercase Latin, single-letter    | A), (A)                | A), (A)
lowercase Latin, multiple-letters |                        | ab), (ab)
uppercase Latin, multiple-letters |                        | AB), (AB)
lowercase Greek, single-letter    | α., α), (α)            | α., α), (α), - α, * α, + α
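The table effectively disambiguates by delimiter: a lowercase Roman numeral takes `.`, while a single Latin letter takes `)`. Below is a minimal Python sketch that checks a marker against a subset of the rows (decimal, lowercase Roman, and single lowercase Latin letters). The rows are transcribed into hypothetical "shapes", in which `N` stands for the ordinal value. `interpretations` and the type names are hypothetical.

```python
import re

ROMAN = re.compile(r"m{0,3}(?:cm|cd|d?c{0,3})(?:xc|xl|l?x{0,3})(?:ix|iv|v?i{0,3})")
DECIMAL = {"N.", "N)", "(N.)", "(N)", "#N"}

# type -> (shapes allowed in tight lists, shapes allowed in loose lists)
ALLOWED = {
    "decimal-1": (DECIMAL, DECIMAL),
    "decimal-n": ({"(N)", "#N"}, DECIMAL),
    "roman-lower": ({"(N.)", "- N.", "* N.", "+ N."},
                    {"N.", "(N.)", "- N.", "* N.", "+ N.", "- N", "* N", "+ N"}),
    "latin-lower-1": ({"N)", "(N)"}, {"N)", "(N)"}),
}

def interpretations(marker, tight):
    """List the table rows under which `marker` is a valid list marker."""
    m = re.search(r"[0-9]+|[a-z]+", marker)
    if m is None:
        return []
    value = m.group()
    shape = marker[:m.start()] + "N" + marker[m.end():]
    candidates = []
    if value.isdigit():
        candidates.append("decimal-1" if len(value) == 1 else "decimal-n")
    else:
        if ROMAN.fullmatch(value):
            candidates.append("roman-lower")
        if len(value) == 1:
            candidates.append("latin-lower-1")
    column = 0 if tight else 1
    return [kind for kind in candidates if shape in ALLOWED[kind][column]]
```

For instance, `i.` in a loose list is only a Roman numeral, while `i)` is only the letter i.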

I frequently try to create nested lists on Discourse and am frustrated by the inability to do so using letters and roman numerals, even in a conservative lowercase-only subset.

My minimum viable feature set would be simply a conservative parsing of
a) / a.
b) / b.
c) / c.

and
i) / i.
ii) / ii.
iii) / iii.

This might be made easier by requiring indentation, as with the current list formats.
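Here is a minimal Python sketch of that conservative subset. It rests on assumptions that are not in the post: a list must start at `a` or `i`, count up without gaps and keep one delimiter, and Roman lists stop at `xii`. Anything else is left as ordinary text, which sidesteps the `h.`/`i.` ambiguity. `recognize` is a hypothetical name.

```python
import re

MARKER = re.compile(r"([a-z]+)([.)])")
ROMAN_1_TO_12 = "i ii iii iv v vi vii viii ix x xi xii".split()

def recognize(markers):
    """Return (type, delimiter) for a conservative a/b/c or i/ii/iii list,
    or None if the markers should be left as ordinary text."""
    parsed = [MARKER.fullmatch(marker) for marker in markers]
    if not parsed or None in parsed:
        return None
    delimiters = {m.group(2) for m in parsed}
    if len(delimiters) != 1:
        return None                      # mixed "a." and "b)" is not one list
    values = [m.group(1) for m in parsed]
    letters = [chr(ord("a") + k) for k in range(min(len(values), 26))]
    if values == letters:
        return "a", delimiters.pop()
    if values == ROMAN_1_TO_12[:len(values)]:
        return "i", delimiters.pop()
    return None                          # e.g. starts at "h." or skips a value
```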

A more feature-complete implementation would resemble @crissov’s list above.

It’s not entirely trivial to implement. In addition to @Chuck_Roberts’ suggestion that all that would be needed is “a command to change the CSS for a normal list,” there would, of course, also need to be several parsing rules added to translate the different types of lettered and roman numeral lists in Markdown to their HTML/CSS equivalents.

That said, I, for one, welcome the trivial implementation of this into CommonMark, and eventually maybe the extended implementation.

I know that this is an old request that had no good answer, but I recently implemented a hack that might work as a syntax for this. Trying to parse the many different formats of list labels in human text is clearly insane, so the logical conclusion is to add some syntax for that which looks readable enough and requires minimal additions. I ended up using

* [a] blah blah blah

with a simple regexp post-processing implementation that looks for <li>[something] and changes that. It could conflict with an anchor reference, but (a) that would be rare given that anchor names are usually longer than list labels, (b) with my hack, an existing anchor would not be modified. (A proper implementation can barf instead.)
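The post does not show the hack itself. Here is a minimal Python sketch of the same idea under two assumptions: the list is tight, so the label sits directly after `<li>`, and the output is a hypothetical `<li data-label="...">` that a stylesheet could then display. CommonMark turns `[a]` into a link only when a matching link reference definition exists. In that case the item renders as `<li><a ...>` and the sketch leaves it alone.

```python
import re

# "<li>" immediately followed by "[label]" and an optional space. The label may
# not contain brackets or "<", so "<li><a href=...>" (an actual link) never matches.
LABELLED_LI = re.compile(r"<li>\[([^\[\]<]+)\] ?")

def apply_labels(html):
    """Move a leading [label] out of the item text into a data-label attribute."""
    def repl(m):
        label = m.group(1).replace('"', "&quot;")
        return f'<li data-label="{label}">'
    return LABELLED_LI.sub(repl, html)
```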

Since this has been resurrected and no one has mentioned it yet, reStructuredText supports alphabetic as well as roman numeral lists:

Enumerated lists (a.k.a. “ordered” lists) are similar to bullet lists, but use enumerators instead of bullets. An enumerator consists of an enumeration sequence member and formatting, followed by whitespace. The following enumeration sequences are recognized:

  • arabic numerals: 1, 2, 3, … (no upper limit).
  • uppercase alphabet characters: A, B, C, …, Z.
  • lower-case alphabet characters: a, b, c, …, z.
  • uppercase Roman numerals: I, II, III, IV, …, MMMMCMXCIX (4999).
  • lowercase Roman numerals: i, ii, iii, iv, …, mmmmcmxcix (4999).

In addition, the auto-enumerator, “#”, may be used to automatically enumerate a list. Auto-enumerated lists may begin with explicit enumeration, which sets the sequence. Fully auto-enumerated lists use arabic numerals and begin with 1. (Auto-enumerated lists are new in Docutils 0.3.8.)

The following formatting types are recognized:

  • suffixed with a period: “1.”, “A.”, “a.”, “I.”, “i.”.
  • surrounded by parentheses: “(1)”, “(A)”, “(a)”, “(I)”, “(i)”.
  • suffixed with a right-parenthesis: “1)”, “A)”, “a)”, “I)”, “i)”.
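To make the quoted rules concrete, here is a minimal Python sketch that splits an rST-style enumerator off a line. It reports which of the listed sequences the enumerator's value could belong to. It deliberately returns every candidate rather than guessing how Docutils resolves them; for example, `I` is both an uppercase letter and a Roman numeral. `enumerator` and `sequences` are hypothetical names.

```python
import re

# The three formatting types quoted above: "(x)", "x)" and "x.".
FORMATS = [
    ("parentheses", re.compile(r"\(([^()\s]+)\)\s")),
    ("right-parenthesis", re.compile(r"([^()\s]+)\)\s")),
    ("period", re.compile(r"([^.()\s]+)\.\s")),
]
# Roman numerals up to MMMMCMXCIX (4999), matched in upper case.
ROMAN = re.compile(r"M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})")

def sequences(value):
    """Every enumeration sequence listed above that `value` could belong to."""
    found = []
    if value == "#":
        found.append("auto")
    if re.fullmatch(r"[0-9]+", value):
        found.append("arabic")
    if re.fullmatch(r"[A-Z]", value):
        found.append("upperalpha")
    if re.fullmatch(r"[a-z]", value):
        found.append("loweralpha")
    if value.isupper() and ROMAN.fullmatch(value):
        found.append("upperroman")
    if value.islower() and ROMAN.fullmatch(value.upper()):
        found.append("lowerroman")
    return found

def enumerator(line):
    """Return (format, candidate sequences, value) for an enumerated line, else None."""
    for name, pattern in FORMATS:
        m = pattern.match(line)
        if m and sequences(m.group(1)):
            return name, sequences(m.group(1)), m.group(1)
    return None
```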

I find list formatting that lets you easily reference subheadings really useful in long lists. You’d have top levels that are numbered, followed by subheadings that use letters, roman numerals, etc. This is a pretty standard format in many programs for creating documents, such as Google Docs.

Here are their options for standard ordered lists:
[Screenshot: Google Docs’ ordered-list options, showing six formatting presets for different indentation levels that combine numbers, upper- or lowercase letters, roman numerals, and decimal numbers.]

In MD, my only option is to mix numbers with bullets. It’d be nice to have a way to allow other options.

On our platform we’re working on mixing rich text input with MD - we render everything into MD but people can copy and paste rich text into an editor in rich text mode and we’ll convert what we can. It’d be cool if the spec allowed for more options because it’d allow us to retain more of the formatting we receive from a rich text document.


I think the general principle of GFM task lists could be employed for lists with explicit markers:

- [(f)] letter
- [iv.] roman, same list
* [§2] paragraph or section sign, new list
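Here is a minimal Python sketch of that task-list-style syntax. It assumes, as in the example, that a change of bullet character starts a new list. It also assumes that `[ ]`, `[x]` and `[X]` keep their GFM task-list meaning, which is exactly the conflict with an `x` label. The names are hypothetical.

```python
import re

# A bullet, an explicit label in square brackets, then the item text.
ITEM = re.compile(r"([-*+]) \[([^\]]+)\] (.*)")
TASK_STATES = {" ", "x", "X"}   # GFM task-list checkboxes keep their meaning

def explicit_label(line):
    """Return (bullet, label, text), or None for task items and other lines."""
    m = ITEM.fullmatch(line)
    if m is None or m.group(2) in TASK_STATES:
        return None
    return m.groups()

def group_lists(lines):
    """Collect labelled items; a new list starts when the bullet changes."""
    lists, current_bullet = [], None
    for line in lines:
        parsed = explicit_label(line)
        if parsed is None:
            raise ValueError(f"not an explicitly labelled item: {line!r}")
        bullet, label, text = parsed
        if bullet != current_bullet:
            lists.append([])
            current_bullet = bullet
        lists[-1].append((label, text))
    return lists
```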

(@vas: I think that anyone involved with the design knows rst well, @jgm definitely knows it – see pandoc.)

@Crissov: That’s exactly what I’m suggesting, but I missed the fact that it’s similar to the GH syntax for checklists… On the positive side, it shows that using this syntax won’t lead to problems, but the minor negative is that it would be a problem for existing uses of - [x] as a checked item rather than an x label.