Nested ordered list numbering

floda · June 20, 2022, 2:04pm

Would someone be so kind as to enlighten us?

This goes back to this Stack Overflow question: “Markdown: What's the proper way to do a continued list inside of a list?”, which is outdated by now. The answer there states that the problem lies in the <p>, but there is none in the current output.

This does not work:

1. One
    1. one
    2. two
2. Two
    3. three
    4. four

It creates this:

<ol>
<li>One
<ol>
<li>one</li>
<li>two</li>
</ol>
</li>
<li>Two
3. three
4. four</li>
</ol>

This does, but causes newlines where none belong:

1. One

    1. one
    2. two
2. Two

    3. three
    4. four

This however works fine, so it’s in the starting number:

1. One
    1. one
    2. two
2. Two
    1. three
    4. four

The spec is unfortunately not helping figure this out.

Thank you!

vas · June 21, 2022, 7:38am

From the spec’s rules for list items:

When the first list item in a list interrupts a paragraph—that is, when it starts on a line that would otherwise count as paragraph continuation text—then (a) the lines Ls must not begin with a blank line, and (b) if the list item is ordered, the start number must be 1.

In your examples, the first element of the first list item is the paragraph “One”. Your first example does not work because the sublist interrupts this paragraph, while your second example does work because it has a blank line between the paragraph and the sublist.
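
As a rough illustration of that rule (a sketch, not the reference parser's code): the hypothetical helper `can_interrupt_paragraph` below checks whether a line could start an ordered list in the middle of a paragraph. It assumes the parent list item's indentation has already been stripped from the line, and it ignores tabs and bullet lists. The regex and the function name are assumptions made for this example.

```python
import re

# Ordered list marker: up to 3 spaces of indentation, 1-9 digits,
# "." or ")", then either end of line or spaces followed by content.
ORDERED_MARKER = re.compile(r"^ {0,3}(\d{1,9})([.)])(?: +(.*))?$")


def can_interrupt_paragraph(line: str) -> bool:
    """Return True if `line` may start an ordered list inside a paragraph."""
    match = ORDERED_MARKER.match(line)
    if match is None:
        return False  # not an ordered list marker at all
    start = int(match.group(1))
    content = match.group(3)
    # (a) the item must not begin with a blank line, i.e. it needs content
    # (b) the start number must be 1
    return start == 1 and bool(content and content.strip())


print(can_interrupt_paragraph("1. one"))    # True
print(can_interrupt_paragraph("3. three"))  # False
print(can_interrupt_paragraph("14. The number of doors is 6."))  # False
```

With a blank line in front of it, `3. three` no longer interrupts a paragraph, so this check does not apply and the line is free to start a list numbered from 3.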

There are no newlines being created. The visual difference is a loose list versus a tight list. A tight list by definition involves paragraph interruption. But because of the above rule, you cannot interrupt a paragraph with a list item numbered other than 1.
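
To make the tight versus loose difference concrete, here is a minimal sketch of how a renderer might emit list items. `render_ordered_list` and its `tight` flag are hypothetical names; a real parser derives tightness from the blank lines in the source, and HTML escaping is left out for brevity.

```python
def render_ordered_list(items: list[str], tight: bool) -> str:
    """Render plain-text items as an <ol>, tight or loose."""
    parts = ["<ol>"]
    for text in items:
        if tight:
            # Tight list: item text is emitted without a <p> wrapper.
            parts.append(f"<li>{text}</li>")
        else:
            # Loose list: each item's text is wrapped in a paragraph,
            # which browsers display with extra vertical spacing.
            parts.append(f"<li>\n<p>{text}</p>\n</li>")
    parts.append("</ol>")
    return "\n".join(parts)


print(render_ordered_list(["One", "Two"], tight=True))
print(render_ordered_list(["One", "Two"], tight=False))
```

The extra spacing seen in the loose variant comes from the `<p>` elements, not from additional line breaks in the text.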

To understand the reasons for this, read the spec section for Lists, which discusses loose vs tight lists as well as why “we allow only lists starting with 1 to interrupt paragraphs.”

This issue has been extensively discussed in this forum:

Blank lines before lists, revisited
original spec discussion: Requiring a blank line between a paragraph and list
most recent revisit: Ordered lists and sublist starts

floda · June 24, 2022, 8:11am

Thank you very much for your explanation! I appreciate your effort. Searching got me nowhere, so please excuse the repeated question.

To be honest, I find the spec harder to think through than complex code. It’s difficult to ‘parse’ for human readers. The interactive tutorial is nice, but it kinda misses all the bits that you stumble over in real life.
It’s getting especially confusing when you use multiple different applications, as everything parses just differently enough to wreak havoc. commonmark seems to be decently thought through though, so I’ll just stick to that where possible. It’s funny how a simple markup idea supposed to make things easier had me invest many hours just to keep my notes from exploding.

I suppose this mouthful is the cause:

In order to solve of unwanted lists in paragraphs with hard-wrapped numerals, we allow only lists starting with 1 to interrupt paragraphs.

vas · June 24, 2022, 10:22pm

That’s because there’s a typo. It was reported and fixed just yesterday. But even with the typo in that paragraph, if you read the entire section on Lists (which I linked in my previous reply), focusing especially on the examples, are you still confused?

That’s pretty much the impetus for creating CommonMark in the first place. You will avoid such issues if you stick to CommonMark compliant tools, ones that promise 100% spec compliance, and when they diverge, you should file a bug and/or switch tools.

Whenever you have doubts about whether you or the tool is wrong about the syntax, I recommend using BabelMark 2 or BabelMark 3 to see what well-known parsers have to say. For example, the results for your list use case show:

a group of tools that agree with your expectation that 3. three is a list item, but don’t agree that it continues the earlier sublist, numbering it 1 instead of 3. This group includes the original Markdown.pl, others that were dominant before the rise of CommonMark and GFM, and also Pandoc (strict).
a group of tools that are CommonMark compliant and don’t see 3. three as a list item.
a very small group that agrees with your expectations: a list item that starts with 3. This includes Pandoc (non-strict).
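
The same kind of comparison can be sketched locally by grouping parsers on identical output. This is only an illustration of the idea, not how BabelMark is implemented; `group_by_output`, the parser names, and the abbreviated HTML fragments below are all hypothetical stand-ins for the three behaviours described above.

```python
from collections import defaultdict


def group_by_output(results: dict[str, str]) -> list[list[str]]:
    """Group parser names whose HTML matches after whitespace normalization."""
    groups: dict[str, list[str]] = defaultdict(list)
    for parser, html in results.items():
        key = " ".join(html.split())  # collapse whitespace differences
        groups[key].append(parser)
    # Largest group first; names sorted so the result is stable.
    return sorted(
        (sorted(names) for names in groups.values()),
        key=lambda names: (-len(names), names),
    )


# Hypothetical, abbreviated outputs for the second outer item only.
results = {
    "parser_a": "<li>Two <ol><li>three</li><li>four</li></ol></li>",
    "parser_b": "<li>Two\n3. three\n4. four</li>",
    "parser_c": '<li>Two <ol start="3"><li>three</li><li>four</li></ol></li>',
    "parser_d": "<li>Two 3. three 4. four</li>",
}

print(group_by_output(results))
# [['parser_b', 'parser_d'], ['parser_a'], ['parser_c']]
```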

If what you mean is that different tools follow different rules, yes, that has costs. But if what you mean is that the idea behind Markdown hasn’t made things easier, I don’t agree. You have to think about what “simple” means. If it means, “The markup must be simple, but must also be able to correctly interpret ambiguity-laden corner cases”, then that is impossible without artificial intelligence. Looking at the examples in the List section of the spec, how is a dumb machine supposed to know that 14. The number of doors is 6. is not a list item?

But if what you mean by “simple” is:

A syntax which the user can grasp after reading a short description, and which the vast majority of users can use without issue 99% or even 99.9% of the time, without ever having to read, much less internalize, all the possibly esoteric rules the syntax may incorporate, especially when those esoteric rules are designed precisely so that the vast majority of users can use it without issue 99% or even 99.9% of the time.

…then I think that Markdown has been very successful, and the proof is in the pudding.

It’s important to realize that your list use case is very, very rare. I doubt even one out of a thousand users has sublists that continue across parent list boundaries. For users with different needs, who want a syntax that has the precise control of code rather than natural language, there are syntaxes for that, e.g. HTML or Asciidoc.

(btw, the reason I take the time to write extensive replies like this on this forum is that I’m working on a proposed evolutionary step for structured plain text. Hearing people’s issues and writing replies helps me think and keeps me grounded.)

floda · June 26, 2022, 6:38pm

Oh no, I get it. It’s just a difficult sentence for describing something as simple as lists. Hundreds of pages for describing what we’ve all been doing in .txt forever. Markdown seems to be one of the trickier syntaxes to parse, with so many edge cases.

That’s not really an option; while there are a ton of options out there, few are good. I even made my own using a commonmark compliant parser, because I wanted something permanent to last until we die. Spent way too much time with different ‘knowledge tools’ before I ended up with plain md.

Fully agree. Big no to any ‘AI’ (which is only a buzzword anyway) spyware. Though, before markdown took over we had bbcode, wikitext, and similar. It wasn’t as simple to type, but you never thought about it. When you’re in the middle of writing, you won’t take a few hours to understand exactly why some syntax doesn’t do what you expect. If you add tables, you even violate commonmark. There are intricacies in the table implementations too. (E.g. the number of required dashes in the separator is inconsistent, or requiring the outer pipes.) Boom, you’re thinking about syntax instead of your content. It’s kinda hard to research syntax issues due to wording, too.

True. I was mostly creating examples to test my own tool and randomly noticed this. Most of the time, some light inline formatting plus a few headers are enough.

Asciidoc seemed interesting, but I’m already way too invested in md to change. There’s ‘rest’ too, but I’ve rarely encountered that in the wild. Pandoc sets sort of a big standard in all it does. Or for masochists, latex (which has tons of inconsistent implementations). Or what about troff? Different use cases for sure, but personally, I want to unify and focus on the content. I decided on embedding various objects as images (and saving the source file), as it’s not reasonable to expect to implement e.g. mermaid and maintain it (much is JS-only and changes constantly).

That’s wonderful, I love your passion. We see little of that in tech now. Most stuff comes and goes. Code I wrote last year is now deprecated because the deps disappeared… For real permanence + parseable semantics probably (original, boring, non HTML5) XML would be best. Unambiguous, but fat and anti-human.

If you’re interested in feedback: my #1 concern for choosing parsers was permanence. Md written today should render just fine forever. We both know nobody is going to go back over 1000s of pages to fix them later. The upside is that md will always be readable text-only. It’s good to see how much thought you guys put into the spec.

I wish you a great week.