Blank lines before lists, revisited

Currently CommonMark does not require a blank line between a regular paragraph and a list. In this respect it differs from most current implementations.

John Gruber made his intentions clear on this, too; although the syntax description says nothing about it, the test suite includes the following:

In Markdown 1.0.0 and earlier. Version
8. This line turns into a list item.
Because a hard-wrapped line in the
middle of a paragraph looked like a
list item.

Here's one with a bullet.
* criminey.

In the spec I give an extended argument that no blank line should be required. The argument is that a block of text should not change its meaning when placed in a list item. Since all Markdown implementations allow sublists to start without a preceding blank line, the 1. in the following example creates an ordered list:

  - Foo
    1. bar

So if we take the contents of this list item out of the list item,

Foo
1. bar

the 1. should still create an ordered list. This is the Principle of Uniformity. The spec for list items presupposes it, because it has the form: “suppose lines Ls make up blocks Bs; then if you put Ls into a list item, adding a list marker to the first line and indenting the rest appropriately, you get a list item with contents Bs.” That kind of spec won’t work if Ls has a different meaning inside a list item than outside.
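
The form of that rule can be sketched in Python. This is only an illustration of the idea, not the spec's wording: the helper names `wrap_in_list_item` and `unwrap_list_item` are made up here, and the sketch ignores blank lines, tabs, and lazy continuation lines.

```python
def wrap_in_list_item(lines, marker="-"):
    """Put lines Ls into a list item: marker on the first line, indent the rest."""
    width = len(marker) + 1  # marker plus one space
    first = f"{marker} {lines[0]}"
    rest = [" " * width + line for line in lines[1:]]
    return [first] + rest


def unwrap_list_item(lines, marker="-"):
    """Inverse operation: drop the marker and de-indent the remaining lines."""
    width = len(marker) + 1
    return [line[width:] for line in lines]


inner = ["Foo", "1. bar"]
item = wrap_in_list_item(inner)  # ["- Foo", "  1. bar"]
assert unwrap_list_item(item) == inner
```

Uniformity says that whatever blocks `inner` parses to on its own must also be the contents of the list item built from `item`.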

I point out that reStructuredText requires a blank line before all lists, even when they occur as sublists – a different choice which also respects the Principle of Uniformity. But given that Markdown definitely does not require blank lines before sublists, Uniformity says it should not require blank lines before lists either.

I still think that’s a pretty compelling argument. But I’m having second thoughts, after writing a short memorial notice with lots of dates and constantly running up against this problem. It is awkward to have to escape periods after numerals that occur at the beginnings of lines. And if you use software to automatically hard-wrap your lines (fmt or par, for example), the problem is even worse. I figure John Gruber ran up against the same problem, which caused him to add the test case quoted above. Several people have brought up the issue on this forum, as well.
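
The hard-wrapping problem is easy to reproduce. The sketch below uses Python's standard `textwrap` module as a stand-in for `fmt` or `par` (an assumption; those tools have their own wrapping algorithms), and the regex is a simplified approximation of an ordered list marker, not the spec's definition.

```python
import re
import textwrap

# Simplified ordered-list marker: 1-9 digits, then "." or ")",
# then whitespace or end of line.
ORDERED_MARKER = re.compile(r"^\d{1,9}[.)](\s|$)")

text = "The Captain died in 1868. He was buried in..."
wrapped = textwrap.wrap(text, width=20)

# Lines after the first that now happen to start with something marker-like.
accidental = [line for line in wrapped[1:] if ORDERED_MARKER.match(line)]
```

At this width the year lands at the start of the second line, so `accidental` contains a line the author never meant as a list item.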

In addition, going a different way than the majority of current implementations has a steep cost; it means that some existing documents will render in ways that aren’t intended if interpreted as CommonMark.

All this is making me think that perhaps we should find a way to write the spec that requires blank lines between paragraphs and lists, except when the paragraphs are (direct?) children of a list item. Perhaps this can be done without entirely changing the form of the spec, by adding an additional condition – that a list cannot interrupt a paragraph unless that paragraph is the child of a list item, or something to that effect?

Does anyone have thoughts on this?

Can you elaborate on this with some examples?

I think the most common (and worthwhile) use case that the current spec does support is

  • a single introduction line, often ending with a colon (or not with a period; at least in English),
  • followed by multiple list items (cf. Should single element lists be supported?),
  • which are usually all single-line and therefore constitute a non-nested, tight list,
  • followed by a blank line like there would have been after the paragraph without the list.
                  <!-- blank line -->
Introduction:
* List item
* List item
                  <!-- blank line -->

I wonder whether it would be enough to require two or more of these characteristics.

I also wonder whether such lists should be a part (i.e. a child node) of the paragraph if the output format supports that (HTML does not).

Btw.: The introduction line is usually not a list caption or list heading, so it wouldn’t help to have markup for that. Elsewhere it would (e.g. 7 #).

In Markdown, readability is emphasised above all else. If CommonMark is to be true to the philosophy of Markdown, readability should take priority over the principle of uniformity.

The use of whitespace can make a document more readable. Creating space around a list is arguably cleaner and makes that list stand out from the surrounding paragraphs (as do spaces between list and header markers, as discussed in depth in this topic).

I would be in favour of making the syntax a little harder (stricter) to write all around if that results in documents that are a little easier to read. By requiring the blank line, you also reduce the number of variations within a document, making the whole feel more uniform.

This paragraph is not a direct child of a list item.
1. So this wouldn't be a list.

> Neither is this.
  1. So, not a list.

* But this paragraph is a direct child of a list item.
  1. So this would be a list.

*   > What about paragraphs that are descendents, but not direct children, of list items,
    > like this (which is a child of a block quote which is in turn a child of a list item)?
    > 1. Should this be allowed to start a list?

I think it’s best to only allow lists to interrupt paragraphs that are direct children of list items.
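
That restriction could be expressed as a check on the innermost open container at the point where the list marker appears. A rough sketch, assuming the parser keeps a stack of open container names; the function name and the string labels are hypothetical, not taken from any implementation.

```python
def list_may_interrupt_paragraph(open_containers):
    """Return True if a list may start right after a paragraph line.

    `open_containers` lists the containers enclosing the paragraph,
    outermost first, e.g. ["document", "list_item", "block_quote"].
    Only a paragraph whose immediate parent is a list item qualifies.
    """
    return bool(open_containers) and open_containers[-1] == "list_item"


cases = {
    "top-level paragraph": ["document"],
    "paragraph in block quote": ["document", "block_quote"],
    "paragraph in list item": ["document", "list_item"],
    "block quote inside list item": ["document", "list_item", "block_quote"],
}
results = {name: list_may_interrupt_paragraph(stack) for name, stack in cases.items()}
```

Only the third case comes out true, which matches the four examples above.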

I worry that building in complex heuristics like this makes it too hard for writers to predict what will happen, and will lead to unexpected and surprising results.

Note that the suggested change would also largely solve the problem about one-character setext header lines and blank list items.

Searching through a large corpus of markdown documents, I have found many people using one-character setext header lines, so requiring 2+ characters may not be a good option.

I don’t see any way to make the change I suggested above without radically rethinking the way the spec is written for list items and other block constructions. The spec relies on the principle of uniformity: it says that, if some lines constitute blocks Bs, then the result of adding a list marker and indenting appropriately gives you a list item containing blocks Bs. So, if a blank line were required outside list items, it would be required inside list items as well, and you couldn’t have sublists like:

1. one
    - two

So I have made the following compromise change, which I think addresses at least the biggest problem. An ordered list item can interrupt a paragraph (i.e. occur where a paragraph continuation line could normally be found, with no preceding blank line) only if it starts with 1. or 1). Thus,

The Captain died in
1868.  He was buried in...

does not create a list. But

Our top priorities are
1. fix ordered lists

does.

This fix does not help with the confusion about one-character setext heading lines and empty bullet lists. But I would propose that this be solved in a similar way, by stipulating that a bullet list item can only interrupt a paragraph if it is followed by some non-blank content. [EDIT: now implemented.]
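
The two conditions can be sketched as a single predicate. This is an approximation written for illustration: the regexes are simplified (no handling of leading indentation or tabs), and `can_interrupt_paragraph` is a made-up name rather than a function from any reference implementation.

```python
import re

# Simplified markers: up to 9 digits plus "." or ")", or one of "-", "+", "*".
ORDERED = re.compile(r"^(\d{1,9})[.)](?:[ \t]+(.*))?$")
BULLET = re.compile(r"^[-+*](?:[ \t]+(.*))?$")


def can_interrupt_paragraph(line):
    """Could `line` start a list directly after a paragraph line?"""
    m = ORDERED.match(line)
    if m:
        number, content = m.group(1), m.group(2)
        # Ordered lists: only a start number of 1 qualifies.
        # Assumption: the non-blank requirement is applied here as well.
        return int(number) == 1 and bool(content and content.strip())
    m = BULLET.match(line)
    if m:
        content = m.group(1)
        # Bullet lists: must be followed by some non-blank content.
        return bool(content and content.strip())
    return False
```

With this predicate, `1868.  He was buried in...` is treated as paragraph continuation text, `1. fix ordered lists` starts a list, and a lone `-` under a paragraph is left free to act as a setext heading line.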

I like this change. I find that one of the absolute biggest errors I see people making is

starting lists

  1. immediately under a para

… which of course means they don’t become proper lists. :anguished: So I am very much a fan of this solution: it solves a very real problem I see almost every day with users “just typing normally” and expecting it to work.

The 1. vs non 1. list starts are not intuitive at all when it comes to sub lists.

2. item 1
1. item 2
    1. item 2.1
1. item 3
    2. item 3.1

The third item’s sub-item starts with 2., so in CommonMark it will not be a sub-item but part of the text of item 3:

<ol start="2">
    <li>item 1</li>
    <li>item 2
        <ol>
            <li>item 2.1</li>
        </ol>
    </li>
    <li>item 3 2. item 3.1</li>
</ol>

Meanwhile, the expectation, and the way other implementations treat it, is that it is a sub-item (see the BabelMark2 example).

The rule that only ordered items numbered 1. can break paragraphs only partially addresses the problem of an ordered list breaking a paragraph, and it happens to break compatibility for sub-lists that start without a blank line in parent items, unless the sub-list numbering starts with 1. I think the latter is very unintuitive. It effectively means that if you want an ordered list item without a blank line before it, you must start it with 1., everywhere.

I am against lists breaking regular paragraphs, because this can happen inadvertently when a paragraph is wrapped. I would make list items unable to break a paragraph unless that paragraph is another list item’s first paragraph, i.e. the one corresponding to the item’s text. No other restrictions.

This would eliminate any inadvertent lists (of any type or numbering) in paragraphs, matching other Markdown implementations, and would allow an ordered sub-list to start with any number. The only cost of that is a blank line between a paragraph and a list. This is nothing new, and most Markdown users already expect it, unless they are used to GFM only. It is also a much cleaner solution than either escaping potential list starts in paragraphs or having to scan your document just in case an inadvertent list item was injected into it. All that just to avoid a blank line between a list and a paragraph, which most other processors require: Babelmark2

I was caught by this just now. I was not getting the sub-item as expected only to realize after wasting a lot of time debugging code that it was numbered 2. instead of 1. This is completely outside what is expected from Markdown sub lists and very unintuitive, IMO.

In classic Markdown, lists can’t start with anything other than 1. anyway – you could use a list starting with the number 5 and it’d still be 1 in the rendered output. Like so:

5. five
3. three
    6. six
    2. two
29. twenty-nine

  1. five
  2. three
    6. six
    2. two
  3. twenty-nine

I am unclear why it is so critical that sub-list numbering be able to start with the number two, when the rendered output would be one anyhow?

Exactly, in classic markdown it does not matter what the list numbering starts with, it is always 1. It is very common to move list items around without renumbering them.
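
The classic behaviour described here amounts to discarding the numbers in the source: an HTML `<ol>` without a `start` attribute counts from 1 regardless of what was typed. A small Python sketch of that idea (the function name `rendered_numbers` is invented for this example, and it handles only a flat, single-level list):

```python
import re

# A flat ordered list item: digits, a period, whitespace, then the item text.
ITEM = re.compile(r"^(\d+)\.\s+(.*)$")


def rendered_numbers(source_lines):
    """Pair each item's text with the number a plain <ol> would display."""
    matches = [ITEM.match(line) for line in source_lines]
    texts = [m.group(2) for m in matches if m]
    return list(enumerate(texts, start=1))


source = ["5. five", "3. three", "29. twenty-nine"]
numbered = rendered_numbers(source)
# The source numbers 5, 3 and 29 are gone; items are numbered by position.
```

This is why items can be moved around without renumbering under the classic rules.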

With the current rules anything other than 1. will not be a sub-item but a continuation of the parent item’s text, unless you add a blank line before it. So moving list items around requires making sure the first one is 1 or has a blank line.

Very un-intuitive especially when it looks like a sub-item but is treated as lazy continuation of the parent item.

I don’t like the hack giving special treatment to 1. either, and I’d like to remove it. Here is the difficulty. Most Markdown implementations that require a blank line before a list starts only do this at the outer level. Thus, in

paragraph A
1.  list item

1.  paragraph B
    1. list item

the list item is allowed to break paragraph B (which is contained in a list item), but not paragraph A (which is not itself in a list item). BabelMark2 example

The way the current CommonMark spec is constructed, it’s not possible to build in this kind of context-sensitivity. The rule for list items has the form: if you’ve got some lines that constitute a sequence of blocks, then the result of indenting them and adding a list marker is a list item containing these same blocks.

That implies that the two cases above must be treated the same way. The contents of the list item should be exactly what you get if you take off the list marker and deindent the lines. So, either the blank line must be required both places, or neither.

I’d actually be happy to require it both places, but this would be more revisionary and break more existing documents.

It looks as if the hack of treating ordered list markers differently in some contexts is harder than I thought to integrate with the parsing strategy in our reference implementations: see https://github.com/jgm/cmark/issues/204

I’m not sure, actually, how to handle this, but it may be that this decision needs to be rethought. (It always seemed ugly.)

I feel like this matters because it is arguably one of the things normal people mess up constantly in Markdown, and I mean literally daily:

blah blah blah
1. list item one
2. list item two

Then I have to go in and constantly fix their markup:

blah blah blah

1. list item one
2. list item two

However, if adding this rule is irreconcilable and impossible given the weird unspecified state of classic markdown, adding more irreconcilable weirdness is probably not a great idea.

So if we’ve definitively come down on the side of “this is an unsolvable problem”, then we should just revert to the old behavior, which requires a blank line above a list… so I’m OK with that, if you feel we’ve exhausted all avenues here @jgm.

One solution that can work is for a block quote to be logically extended until a blank line. This way any lines not separated by a blank line will be treated as continuation lines or new block elements, but still inside the block quote. Block quote termination requires a blank line.

I added this option in my implementation for compatibility with other markdown processors and found it to be more intuitive when it comes to handling block quote continuation lines and new element lines.

This would have:

> This is _a_ paragraph continuation text
> 2. because the line starts with `2`, not `1`.

> This is _a_ paragraph continuation text
2. because the line starts with `2`, not `1`.

both result in:

This is a paragraph continuation text 2. because the line starts with 2, not 1.

and

> This is _a_ paragraph continuation text
> 1. because the line starts with `1`, not `2`.

> This is _a_ paragraph continuation text
1. because the line starts with `1`, not `2`.

both result in:

This is a paragraph continuation text

  1. because the line starts with 1, not 2.

As for ugliness of 1 vs non-1 list starts, I don’t think you can find a solution that everyone will like.

The solutions suggested so far seem to deal with wrapping issues in paragraphs, but not in lists.

Consider this example:

This sentence has been wrapped by
1. This is not a new list item.
1. An actual new list item.
2. Another new list item.

As already noted, this ambiguity can be solved by requiring a blank line before the list.

Now consider the case where the opening paragraph is itself a list item at the same level as the other list items:

1. This sentence has been wrapped by
1. This is not a separate list item.
2. An actual new list item.
3. Another new list item.

Solving this ambiguity requires a blank line before every new list item, not just the first item in the list, or even the first item at each new sublist level, like this:

1. This sentence has been wrapped by
1. This is not a separate list item.

2. An actual new list item.

3. Another new list item.

Perhaps the choice should be between:

  1. Always requiring a blank line before every new item in a list, including the first one.
    • This obeys the Principle of Uniformity and avoids ambiguity with line wrapping, setext, etc.
    • However, it is clearly a major change to current practice, and interferes with the definition of loose lists.
  2. Never requiring a blank line before any new list item, not even the first one.
    • This obeys the Principle of Uniformity and is consistent with current practice.
    • However, it requires escape characters or special heuristics to resolve issues with wrapping and setext.

My preference would be to never require blank lines before list items, and to use heuristics to determine whether something is a new list item or a continuation of the previous paragraph/list item. I know you guys are (understandably) reluctant to use heuristics, but they do have these benefits:

  1. A human reading the text is essentially using heuristics to determine what is and is not a new list item. CommonMark would just be codifying this process.
  2. The heuristics don’t have to be perfect and can be improved over time as more edge cases are revealed.
  3. If there are exceptional cases where the heuristics get it wrong people can always use blank lines and escaping to get the result they intended.
  4. Heuristics can obey the Principle of Uniformity. The same heuristics can be applied at any sublist level to determine whether a new line is:
    • A) a continuation of the previous line.
    • B) a new list item at the same level of the previous item.
    • C) a child of the previous list item (i.e. a new sublist item)

The current hack for 1. is causing problems and confusion. It should be replaced by a better solution. Several possibilities:

  1. Allow any single-digit list marker. (Still a hack, but covers more cases, including most of #2704.)
  2. Make the colon : at the end of a line (optionally followed by a single space) an indicator for special treatment of the next line, similar to double space and backslash \ for hard line breaks. This could be reused (by future extensions) for other things, e.g. blockquote attributions and table or figure captions, but it is a more severe deviation from Gruber Markdown.
  3. Require at least two (tight) list items (of the same type).
  4. Do not support loose lists to interrupt paragraphs.
  5. Allow a single-line paragraph (preceded by a blank line) with fewer than 80 characters to precede a list without the otherwise mandatory intervening blank line.

This is similar but not identical to heuristics I have suggested previously. I’m proposing to choose one of these or a mandatory combination of them, not multiple alternative options.
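
As an illustration, option 3 could be checked with one line of lookahead. The sketch below is one possible reading of that option, not an agreed rule: it treats two consecutive lines as a qualifying list only when both carry a marker of the same type, and the helper names are hypothetical.

```python
import re

# Simplified markers that must be followed by some non-blank content.
ORDERED = re.compile(r"^\d{1,9}([.)])\s+\S")
BULLET = re.compile(r"^([-+*])\s+\S")


def marker_type(line):
    """Return a key identifying the list type of `line`, or None."""
    m = ORDERED.match(line)
    if m:
        return ("ordered", m.group(1))  # "." and ")" count as different types
    m = BULLET.match(line)
    if m:
        return ("bullet", m.group(1))  # "-", "+", "*" count as different types
    return None


def may_interrupt(lines, index):
    """Option 3: lines[index] may interrupt a paragraph only if the next
    line is a second item of the same list type."""
    kind = marker_type(lines[index])
    if kind is None or index + 1 >= len(lines):
        return False
    return marker_type(lines[index + 1]) == kind
```

Under this reading, an introduction line followed by two `*` items still produces a list, while a stray `1.` at the start of a wrapped line does not, because the line after it is ordinary text.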

I really like rules (3) and (4); however, I would limit rule (3) to numbered list items only. The same rules should apply in nested lists (uniformity), and IMHO it’s quite common to have single-item bullet lists, especially when nested in a tight list:

* foo
  * subitem of foo, not continuation line
* bar

I find (1) and (2) too hackish and incompatible with current practice. I am also not sure how : is used in other languages, e.g. Chinese, Japanese, or Arabic.

Rule (5) looks very strange to me: a user who adds a word somewhere in the middle or at the beginning of the previous sentence can cause a disproportionately large rendering change elsewhere, after the sentence. IMHO that is very unintuitive behavior.

But if we find that (3) and (4) are not satisfactory, it might make sense to require the first list item (not the preceding line) to be quite short in order to interrupt the previous paragraph. Consider that tight lists tend to have short item contents. If they contain longer text, the author may want to place a blank line to visually split it from the preceding text, or use a loose list right away, and would do so intuitively.

I like 2+4 slightly better than 3+4, for what it’s worth.

It seems sensible to require at least two numbered list items.

How many situations are there in which someone actually wants to 
interrupt a paragraph with a single-item numbered list? Probably
1. Or maybe a couple more. Other than that, a numbered list would
be created by happenstance, without intending or expecting it.

See what I did there? :wink: Babelmark
