Letter-ordered lists

I’ve found letter-ordered lists to be a common requirement, frequently used whenever there is a sub-list. An extension would be very useful.

@codinghorror Could topics such as this one be recategorised under “Extensions”?


Seeing that they are a common requirement, I would propose that they be part of the spec rather than in an extension. (But just a proposition :^)


My only concern is whether it would break existing Markdown implementations. For example, you might have a sentence beginning with a. or (a) but not intend for that to start an ordered list. If it’s an extension, at least you’d need to turn on the extension before breaking anything.

+++ chrisalley [Sep 24 14 04:41 ]:

My only concern is whether it would break existing Markdown implementations. For example, you might have a sentence beginning with a. or (a) but not intend for that to start an ordered list. If it’s an extension, at least you’d need to turn on the extension before breaking anything.

I am, myself, a fan of letter-ordered lists, and they are supported
in pandoc.

It may occasionally happen that a. or (a) or a) will occur by
accident at the beginning of a hard-wrapped line, but this is not going
to be any more likely than 1. or 1) appearing.

However, if capital-letter lists are allowed, there is a significant
risk that names with initials will be wrongly interpreted as lists:
B. Russell says.... The solution in pandoc is to require two spaces
after the period in these cases.
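To make the two-space rule concrete, here is a minimal sketch of a marker check. It is not pandoc's actual code: the `MARKER` pattern and the `starts_letter_list` function are hypothetical names invented for illustration, and the sketch ignores indentation and multi-letter markers.

```python
import re

# Hypothetical marker pattern: a single letter followed by "." or ")",
# or a single letter wrapped in parentheses, then spaces, then content.
MARKER = re.compile(
    r"^(?:\((?P<paren>[A-Za-z])\)|(?P<letter>[A-Za-z])(?P<delim>[.)]))(?P<gap> +)\S"
)


def starts_letter_list(line: str) -> bool:
    """Return True if `line` would open a letter-ordered list item.

    Assumed rule, modelled on the behaviour described above: an uppercase
    letter followed by a period needs at least two spaces after it, so that
    initials such as "B. Russell" remain ordinary text.
    """
    m = MARKER.match(line)
    if m is None:
        return False
    letter = m.group("paren") or m.group("letter")
    delim = m.group("delim")  # None for the "(a)" form
    if letter.isupper() and delim == ".":
        return len(m.group("gap")) >= 2
    return True
```

With this rule, `a. item` and `B.  item` (two spaces) start a list, while `B. Russell says...` does not.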


Good point. Are capital-letter lists common? Lowercase letters appear to be common in Terms & Conditions. If capital-letter lists are uncommon, perhaps CommonMark could support only lowercase lists. The web developer could always use CSS to capitalise the list markers if necessary.

I have a concern about the two-space rule after the B. marker. Writers who are “just muddling through” and haven’t read the documentation might be confused when the B. turns into a list item. Rather than search the documentation for how to override this, they might give up.

Lower-case only sounds like a reasonable solution for the main spec.

Personally, I do think this is a common need. It comes up in nearly any copyright license, a common case where people want something edited as plain text but still very readable (after all, it’s for the sake of README and related text files that GitHub supports Markdown).


I was going to submit this as an issue before finding this page, but below are my thoughts on implementing this spec:

CommonMark ordered lists are pretty limited in scope. However, CSS allows us to render them in a variety of ways (upper- or lowercase alphabetic; Roman, Georgian, Hebrew, Armenian, and other numerals). I would like to see an implementation spec covering the following use cases.

I tried to demo existing spec features, such as distinct list groups based on a delimiter change [Lists: example 186] and nested list items [Lists: example 199].

I based the delimiters on the type attribute values outlined in the HTML5 <ol> tag spec since this format is consistent with current practices and is fairly declarative, while still being easy to read and author.

Caveat:

This deliberately limited subset may help keep names that begin with an initial from starting an ordered list.

// Lettered: simple example
a. item                                   >>>     a. item
a. item                                   >>>     b. item
    a. item sub 1                         >>>         a. item sub 1
a. item                                   >>>     c. item
A. item                                   >>>     A. item
A. item                                   >>>     B. item
A. item                                   >>>     C. item

// Roman numeral: simple example
i. item                                   >>>       i. item
i. item                                   >>>      ii. item
i. item                                   >>>     iii. item
I. item                                   >>>       I. item
I. item                                   >>>      II. item

// More contrived example
a. item                                   >>>     a. item
a. item                                   >>>     b. item
    1. item sub 1                         >>>         1. item sub 1
        i. item sub 2                     >>>             i. item sub 2
        i. item sub 2                     >>>            ii. item sub 2
    1. item sub 1                         >>>         2. item sub 1
        A. item sub 2                     >>>             A. item sub 2
            - item sub 3                  >>>                 - item sub 3
            - item sub 3                  >>>                 - item sub 3
        A. item sub 2                     >>>             B. item sub 2
        A. item sub 2                     >>>             C. item sub 2
a. item                                   >>>     c. item

Markdown renders:

<!-- Lettered: simple render -->
<ol type="a">
  <li>item</li>                           >>>     a. item
  <li>item                                >>>     b. item
    <ol type="a">
      <li>item sub 1</li>                 >>>         a. item sub 1
    </ol>
  </li>
  <li>item</li>                           >>>     c. item
</ol>
<ol type="A">
  <li>item</li>                           >>>     A. item
  <li>item</li>                           >>>     B. item
  <li>item</li>                           >>>     C. item
</ol>

<!-- Roman numeral: simple render -->
<ol type="i">
  <li>item</li>                           >>>       i. item
  <li>item</li>                           >>>      ii. item
  <li>item</li>                           >>>     iii. item
</ol>
<ol type="I">
  <li>item</li>                           >>>       I. item
  <li>item</li>                           >>>      II. item
  <li>item</li>                           >>>     III. item
</ol>

<!-- AND our really contrived example -->
<ol type="a">
  <li>item</li>                           >>>     a. item
  <li>item                                >>>     b. item
    <ol type="1">
      <li>item sub 1                      >>>         1. item sub 1
        <ol type="i">
          <li>item sub 2</li>             >>>             i. item sub 2
          <li>item sub 2</li>             >>>            ii. item sub 2
        </ol>
      </li>
      <li>item sub 1                      >>>         2. item sub 1
        <ol type="A">
          <li>item sub 2                  >>>             A. item sub 2
            <ul>
              <li>item sub 3</li>         >>>                 - item sub 3
              <li>item sub 3</li>         >>>                 - item sub 3
            </ul>
          </li>
          <li>item sub 2</li>             >>>             B. item sub 2
          <li>item sub 2</li>             >>>             C. item sub 2
        </ol>
      </li>
    </ol>
  </li>
  <li>item</li>                           >>>     c. item
</ol>
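For anyone who wants to check the right-hand column by hand, here is a rough sketch of how an item's position could be turned into the marker text for each `type` value. The function names (`to_alpha`, `to_roman`, `marker`) are made up for this example and are not taken from any Markdown implementation. It assumes lettered lists continue after z as aa, ab, and so on.

```python
# Value/digit pairs for Roman numerals, largest first.
ROMAN_DIGITS = [
    (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
    (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
    (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
]


def to_alpha(n: int) -> str:
    """1 -> a, 26 -> z, 27 -> aa (bijective base 26)."""
    if n < 1:
        raise ValueError("position must be >= 1")
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("a") + rem) + out
    return out


def to_roman(n: int) -> str:
    """1 -> i, 4 -> iv, 14 -> xiv; limited to 1..3999."""
    if not 1 <= n <= 3999:
        raise ValueError("position must be in 1..3999")
    parts = []
    for value, digits in ROMAN_DIGITS:
        count, n = divmod(n, value)
        parts.append(digits * count)
    return "".join(parts)


def marker(list_type: str, position: int) -> str:
    """Marker text for the item at `position` in an <ol type=list_type>."""
    if list_type == "1":
        return str(position)
    if list_type in ("a", "A"):
        text = to_alpha(position)
    elif list_type in ("i", "I"):
        text = to_roman(position)
    else:
        raise ValueError(f"unknown list type: {list_type!r}")
    return text.upper() if list_type.isupper() else text
```

For example, `marker("a", 3)` gives `c` and `marker("I", 2)` gives `II`, matching the rendered columns above.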

Presentation Styling

The developer or designer could utilize CSS to adjust presentational properties of the lists in a more targeted way:

ol { /*...*/ }
ol[type="1"] { /*...*/ }
ol[type="a"] { /*...*/ }
ol[type="A"] { /*...*/ }
ol[type="i"] { /*...*/ }
ol[type="I"] { /*...*/ }

/* potential real-world presentational language overrides */
html[lang="he"] ol[type="1"] { list-style-type: hebrew; }
html[lang="hy"] ol[type="1"] { list-style-type: armenian; }
html[lang="ka"] ol[type="1"] { list-style-type: georgian; }
...

Real-world use cases:

  • product details
  • feature lists
  • change logs
  • procedural instructions: assembly/work flow
  • legal documentation

With the Roman numeral example, how is the numeral i. set apart from the letter i.? Is it when the list marker changes from a. to i.? What happens when the previous marker is h.?
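One possible answer, sketched below purely as an assumption (nothing in the proposal or in any existing implementation is claimed to work this way), is to fix the list type from the first marker only. A list that opens with i. is Roman, and an i. that follows h. simply continues the alphabetic list. The `classify_list` name and the rule itself are hypothetical.

```python
import re
from typing import Sequence

# Well-formed lowercase Roman numerals (1..3999); only used on non-empty input.
ROMAN = re.compile(r"m{0,3}(cm|cd|d?c{0,3})(xc|xl|l?x{0,3})(ix|iv|v?i{0,3})")


def classify_list(markers: Sequence[str]) -> str:
    """Return "roman" or "alpha" for the markers of one list, delimiters removed.

    Assumed rule: only the first marker decides. A single "i" opens a Roman
    list; any other single letter opens an alphabetic list; a multi-letter
    first marker is Roman only if it is a well-formed numeral.
    """
    if not markers:
        raise ValueError("a list needs at least one marker")
    first = markers[0].lower()
    if first == "i":
        return "roman"
    if len(first) > 1 and ROMAN.fullmatch(first):
        return "roman"
    return "alpha"
```

Under this rule `["h", "i"]` is alphabetic and `["i", "ii", "iii"]` is Roman. An alphabetic list that genuinely starts at the ninth letter would still be misread, so the ambiguity is reduced rather than removed.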

For capital letters, we still have the problem of A. being used to represent an initial, e.g. A. Hitchcock.

The use of the HTML type attribute looks good.


Please forgive digging up an old thread.

Is this getting moved to Extensions, or does it belong in Spec? That still seems to be undecided.

What should actually be part of the core is a more generic description of line prefixes, so that extensions can give them meaning. I’m thinking of something like this (with nesting and indentation mostly ignored):

line   := ( content | (prefix space+ content (space+ suffix?)?) | fence ) newline;
prefix := (opener? attribute? closer) | (attribute? leader)+;
suffix := (opener closer?) | leader+;
opener := '(' | '[' | '{' | '<';
closer := ')' | ']' | '}' | '>';
leader := '!' | '"' | '#' | '$' | '%' | '&' | '\'' | '+' | ',' | '-' | '.'
        | '/' | ':' | ';' | '?' | '=' | '@' | '\\' | '^' | '_' | '`' | '|' | '~';
fence  := space* leader+ space*;

All that would be needed is

  1. A command to change the CSS for a normal list. CSS and HTML already support lettered lists.

I’ve moved this discussion from “Spec” to “Extensions” since letter-ordered lists are not on the roadmap for CommonMark 1.0.


I prefer a separation of semantics from presentation, similar to HTML, where the symbols used to represent the ordering are a rendering concern defined not in the HTML but in CSS (see the CSS list-style-type property. So many options! Traditional Katakana iroha numbering, anyone?).

Sorry, I must have missed your post earlier @vas. What are your thoughts on using the type attribute (examples earlier in the topic), which is part of HTML? The writer might want to preserve the type of ordered list so that this information carries over even when the stylesheets differ, e.g. to ensure that a sub-list remains letter-ordered so that its items can be referred to elsewhere in the text as item a, b, etc. Otherwise the markers might vary depending on the stylesheet used, changing the meaning of the document.

But since the numbers given in Markdown for ordered lists are not literal, the numbering in the rendered output may be different. But your use case, @chrisalley, is an important one. How to refer to items in an ordered list from elsewhere in the same content? The proper solution is to provide a reference syntax, offhand something like (but better than):

1. Item A
3. Item C (Item B got deleted)
4. Item D

Here is a paragraph that refers to Option {{#3}}. 

Which would render as:

1. Item A
2. Item C (Item B got deleted)
3. Item D

Here is a paragraph that refers to Option 2.
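To show the idea is at least mechanically simple, here is a small sketch of how such references could be resolved. It is only an illustration of the `{{#3}}` syntax floated above. The `render` function is hypothetical, and it assumes a single top-level numbered list per document, numbered from 1 in the output.

```python
import re

ITEM = re.compile(r"^(\d+)\.\s+")     # "3. Item C" style list items
REF = re.compile(r"\{\{#(\d+)\}\}")  # the hypothetical {{#3}} reference


def render(source: str) -> str:
    """Renumber list items sequentially and resolve {{#n}} references."""
    lines = source.splitlines()

    # First pass: map each number written in the source to its rendered position.
    rendered_number = {}
    position = 0
    for line in lines:
        m = ITEM.match(line)
        if m:
            position += 1
            rendered_number.setdefault(int(m.group(1)), position)

    # Second pass: rewrite the markers and substitute the references.
    # References to numbers that never appear as markers are left untouched.
    out = []
    position = 0
    for line in lines:
        m = ITEM.match(line)
        if m:
            position += 1
            line = f"{position}. {line[m.end():]}"
        line = REF.sub(
            lambda r: str(rendered_number.get(int(r.group(1)), r.group(0))), line
        )
        out.append(line)
    return "\n".join(out)
```

Because the lookup happens before any styling, this only yields the numeric position. Printing it as a letter or Roman numeral would need the list type to be known at render time.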

EDIT: I just realized my idea might not work, since CSS styling would be applied after the Markdown was rendered to HTML, meaning the reference could not reflect the style (number, letter, roman numeral)… Or can it? Is there an HTML/CSS trick to support this?

My original point is just a preference. If it turns out that the pros of supporting specified list styles in Markdown outweigh the pros of separation of concerns, I can get behind it. But it does seem to go against the design principles of Markdown and its most common output format, HTML. Yes, HTML supports inline declaration of styling, but that’s because HTML supports both SOP and monolithic approaches. Markdown currently doesn’t support inline styling in any way other than embedding HTML.

I think I’ve rambled on too long about something that may not be that important!

This is a good point: the list markers might not match the rendered output. I’ve been assuming that authors would update their lists to the actual letter used, e.g.

a. First item
b. Second item
c. Third item

instead of the lazy style:

a. First item
a. Second item
a. Third item

Only in the first case could authors accurately refer to an item by its letter. However, I also think that, at least in smaller documents, the author would clean up their lists to use the literal letter marker, and manually change the letter wherever other parts of the text refer to a list item. This is, after all, what you would do in a plain text file, which is what Markdown is in its raw form. The advantage of doing it this way, rather than requiring a special reference syntax, is that authors can create and refer to letter-ordered lists without any special knowledge beyond how to write a plain text document; this makes the syntax ideal for casual forum posts, etc.

Perhaps some kind of reference syntax (similar to how links and footnotes work) should be added for more complex scenarios.


I would argue that a good reason for including letter-ordered or Roman numeral-ordered lists in CommonMark is that people commonly write these in regular text files.

As a markdown user, when I’m writing, I’m writing a text file which also happens to magically compile to give a rendered output.

Having numeral lists be rendered but not letter / roman-numeral lists causes an inconsistency in rendering, which forces me to mentally change my writing style to just use numeral lists. I don’t want this mental overhead of remembering that I must use only certain kinds of lists because only certain kinds work with markdown.

Since both numeral and letter lists work in the text, I expect them both to behave consistently after rendering as well.

Common use-cases:

1. Item
  a. Sub-item
  b. Sub-item

**Definition**: Mathematical definition, and conditions (i) - (v) hold:
 (i) First condition
 (ii) Second condition
 (iii) Third condition
 (iv) Fourth condition
 (v) Fifth condition

I'm taking notes on something, the reference text using a) b) c)
1. But
2. My List
3. Uses 1, 2, 3,

So now whenever they refer to c, I have to remember to translate it to 3. Which sucks.

Bump. (Is bumping allowed?) Or, I second this. Whichever.

I found another use case, which is that a lot of software licenses contain letter-ordered or roman numeral-ordered lists. They don’t convert naturally to Markdown without altering the license text (bad) or cluttering it up with HTML code.

I checked several popular licenses and many have lettered lists: the GPL 3.0 (section 5), Creative Commons 4.0 (section 1), Apache 2.0 (section 4), etc.

I found this while trying to Markdownify the Perl Artistic 2.0 license, which has both lettered AND roman-numeral sublists - https://github.com/greg-kennedy/dot_scr/blob/master/LICENSE.md

Since GitHub is the top location for open-source software, and it almost encourages that every document be posted in Markdown (e.g. README.md is automatically displayed on the repository home), this seems like a really big omission.

I’m in agreement. I argued for lettered lists in our original discussions about the spec, but for some reason there was resistance. We’ve had them in pandoc for a decade at least, and they are pretty unproblematic. (We even have roman-numbered lists.)

One tricky thing is that names with initials can sometimes look like a letter-ordered list: e.g.

B. Russell says...

In pandoc we avoid problems of this kind by requiring two spaces after letter-ordered lists starting with a capital letter.

Initial letters with an abbreviation dot are similar to numbers with an ordinal dot (some languages use 1., 2., 3., 4. where English has 1st, 2nd, 3rd, 4th, e. g. in dates). The solution to distinguish them from list markers at the start of a line should be similar as well, especially for lists that interrupt a paragraph without a preceding blank line and for single-item lists, which together constitute an issue that blocks v1.0.

Since i. and i) would be even more ambiguous if Roman numerals were to be supported as ordinal list values, one could amend and restrict the valid syntax choices, e. g.:

Type                              | Tight                  | Loose
--------------------------------- | ---------------------- | ---------------------------------------
numeric decimal, single-digit     | 1., 1), (1.), (1), #1  | 1., 1), (1.), (1), #1
numeric decimal, multiple-digits  | (12), #12              | 12., 12), (12.), (12), #12
lowercase Roman numeral           | (i.), - i., * i., + i. | i., (i.), - i., * i., + i., - i, * i, + i
uppercase Roman numeral           | (I.)                   | I., (I.)
lowercase Latin, single-letter    | a), (a)                | a), (a)
uppercase Latin, single-letter    | A), (A)                | A), (A)
lowercase Latin, multiple-letters |                        | ab), (ab)
uppercase Latin, multiple-letters |                        | AB), (AB)
lowercase Greek, single-letter    | α., α), (α)            | α., α), (α), - α, * α, + α