Maintaining sanity in lists with different kinds of spacing

skyzyx · September 8, 2014, 12:34am

A fork of the conversation from You guys failed to made the list sane · Issue #82 · commonmark/commonmark-spec · GitHub. I’ve attempted to pull the most salient bits into this thread, but being a new user, Discourse is being dumb. Links and usernames have been edited to be less linky.

[skyzyx]
As per the original Markdown description¹:

It looks nice if you indent every line of the subsequent paragraphs, but here again, Markdown will allow you to be lazy:
*   This is a list item with two paragraphs.

    This is the second paragraph in the list item. You're
only required to indent the first line. Lorem ipsum dolor
sit amet, consectetuer adipiscing elit.

*   Another item in the same list.

¹ Daring Fireball: Markdown Syntax Documentation

[bobef]
For example:

1. asd
2. qwe

3. zxc

It is completely insane to wrap the neighboring items in paragraphs because you put a paragraph in the second item.

[jumpwah]
Don’t think john (and many others, as skyzyx says) would accept removing the concept of loose lists to be honest. The spec already mentions two blank lines¹ to end all lists if you want two consecutive tight lists.

Many people also use loose lists for readability, since markdown is also about being pretty, as in visually easy to read. Tight lists might be too close together for longer types of lists.

¹ jgm.github.io/stmd/spec.html#191
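To make the two behaviors in this post concrete, here is a minimal sketch, assuming flat `- ` bullet lists only (no nesting, no continuation paragraphs). It is not the spec's algorithm. The function name `parse_lists` and the dict layout are made up for illustration. A single blank line between items marks the whole list as loose, and two consecutive blank lines end the list, as described above.

```python
def parse_lists(markdown):
    """Split flat '- ' bullet items into lists and flag each list as loose or tight."""
    lists = []       # each entry: {"items": [...], "loose": bool}
    current = None   # the list currently being filled
    blanks = 0       # consecutive blank lines seen since the last non-blank line
    for line in markdown.splitlines():
        if not line.strip():
            blanks += 1
            continue
        if line.startswith("- "):
            if current is None or blanks >= 2:
                # Two blank lines (or the very first item) start a new list.
                current = {"items": [], "loose": False}
                lists.append(current)
            elif blanks == 1:
                # One blank line between items makes the whole list loose.
                current["loose"] = True
            current["items"].append(line[2:])
        blanks = 0
    return lists


one_loose = parse_lists("- a\n- b\n\n- c")
two_tight = parse_lists("- a\n- b\n\n\n- c")
```

With one blank line the three items stay in a single loose list (so all of them get paragraphs, which is bobef's complaint); with two blank lines the result is two tight lists.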

[skyzyx]
Looking at Example 194¹ (which seems to be your specific case), here is the Markdown:

- a
- b

- c

Based on that Markdown input, I would expect to see the following HTML:

<ul>
    <li>a</li>
    <li>b</li>
    <li><p>c</p></li>
</ul>

For my brain, a line above an individual list item would imply a paragraph’d list item. But what about some other cases? How should they convert?

- a
- b

- c
- d
- e

- a
- b

- c

- d
- e

For me, the last Markdown example implies, clearly, that - c is paragraph’d. Not so sure about the second-to-last example, though.

Thoughts?

¹ jgm.github.io/stmd/spec.html#list-items
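A literal reading of the "blank line above an item means that item is paragraph'd" idea can be sketched like this. It assumes flat `- ` bullets and a single list. `render_per_item` is a hypothetical helper, not anything from the spec or an existing parser.

```python
def render_per_item(markdown):
    """Wrap an item in <p> only when a blank line directly precedes it."""
    html = ["<ul>"]
    blank_before = False
    for line in markdown.splitlines():
        if not line.strip():
            blank_before = True
            continue
        if line.startswith("- "):
            text = line[2:]
            if blank_before:
                html.append(f"    <li><p>{text}</p></li>")
            else:
                html.append(f"    <li>{text}</li>")
        blank_before = False
    html.append("</ul>")
    return "\n".join(html)


print(render_per_item("- a\n- b\n\n- c"))
```

For the first example this reproduces the HTML shown above. Read this literally and, in the last example, `- d` also ends up wrapped, because it too has a blank line above it; in the second-to-last example only `- c` is wrapped.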

[etchalon]
I think a casual user would expect:

1. One
2. Two

1. Three

to produce:

<ol>
    <li>One</li>
    <li>Two</li>
</ol>

<ol>
    <li>Three</li>
</ol>

They might expect you to be clever about:

1. This is an example

Of a something with lazy continuations.

2. And some nice breaks

3. In certain places.

…and produce:

<ol>
    <li><p>This is an example</p><p>Of a something with lazy continuations.</p></li>
    <li>And some nice breaks</li>
    <li>In certain places.</li>
</ol>

…and even more clever about:

1. This is an example

Of a something with lazy continuations.

2. And some nice breaks

1. In certain places.

…which would produce:

<ol>
    <li><p>This is an example</p><p>Of a something with lazy continuations.</p></li>
    <li>And some nice breaks</li>
</ol>
<ol>
    <li>In certain places.</li>
</ol>

Meanwhile:

- a
- b

- c

- d
- e

…would, for me, produce:

<ul>
    <li>a</li>
    <li>b</li>
</ul>
<ul>
    <li>c</li>
</ul>

<ul>
    <li>d</li>
    <li>e</li>
</ul>

…and this:

- a
- b

 Hello, dolly.

- c

- d
- e

Would produce:

<ul>
    <li>a</li>
    <li><p>b</p><p>Hello, dolly.</p></li>
</ul>
<ul>
    <li>c</li>
</ul>
<ul>
    <li>d</li>
    <li>e</li>
</ul>

I realize this is against the original spec, but I always felt the original spec was drunk when it wrote this part of itself.
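One possible reading of the bullet examples in this post, as a rough sketch: a blank line followed by a new bullet starts a new list, while a blank line followed by plain text adds a paragraph to the previous item. This covers unordered lists only, and the rule is inferred from the examples rather than stated anywhere. `split_bullet_lists` is a made-up name.

```python
def split_bullet_lists(markdown):
    """Return a list of lists; each item is a list of its paragraphs."""
    lists = []
    current = None       # the list currently being filled
    blank_seen = False   # was there a blank line since the last content line?
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            blank_seen = True
            continue
        if stripped.startswith("- "):
            if current is None or blank_seen:
                # Blank line before a bullet: start a new list.
                current = []
                lists.append(current)
            current.append([stripped[2:]])
        elif current:
            if blank_seen:
                # Blank line before plain text: new paragraph in the last item.
                current[-1].append(stripped)
            else:
                # No blank line: lazy continuation of the last paragraph.
                current[-1][-1] += " " + stripped
        blank_seen = False
    return lists


sample = "- a\n- b\n\n Hello, dolly.\n\n- c\n\n- d\n- e"
groups = split_bullet_lists(sample)
```

On the last example this gives three lists, with only `b` carrying a second paragraph. Under this reading there is no way left to write a loose list of single-paragraph items.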

[skyzyx]

etchalon:

I think a casual user would expect:

1. One
2. Two

1. Three

to produce:

<ol>
    <li>One</li>
    <li>Two</li>
</ol>

<ol>
    <li>Three</li>
</ol>

Perhaps. This section always felt a little funky. I’ve learned that double-spacing after a list item, going straight into a new list item implies that the list items will be wrapped in paragraphs. But when the spacing gets weird, which side of the line does it fall on?

I suppose that with ordered lists, it could possibly be clearer than with unordered lists, but with long ordered lists, where you’re still writing and changing the order frequently (or doing some programmatic sorting), I’ve learned to rely on this crutch¹:

If you instead wrote the list in Markdown like this:

1.  Bird
1.  McHale
1.  Parish

or even:

3. Bird
1. McHale
8. Parish

you’d get the exact same HTML output. The point is, if you want to, you can use ordinal numbers in your ordered Markdown lists, so that the numbers in your source match the numbers in your published HTML. But if you want to be lazy, you don’t have to.

If you do use lazy list numbering, however, you should still start the list with the number 1. At some point in the future, Markdown may support starting ordered lists at an arbitrary number.

Sometimes, my Markdown lists look like this:

1.  Bird
1.  McHale
1.  Parish

Because the reference implementation of Markdown was so loose with this kind of parsing, the laziness worked-ish. But seeing how widespread the adoption of Markdown has gotten in the last 10 years, maybe it’s time to tighten this up? Possible solutions:

The Markdown parser throws a warning when it comes across weirdness-inducing syntax.
We start taking list item numbering more seriously.
We introduce the concept of a strict list-termination control character.

These are just suggestions for how to deal with the weirdness discussed in this thread. We’d also need to weigh the consequences of possibly-unexpected additional strictness.

¹ Daring Fireball: Markdown Syntax Documentation

[bobef]
IMO, with regard to numbering, ordered lists should have two modes that produce identical output:

1. asd
2. qwe
3. zxc

# asd
# qwe
# zxc

Spec-wise I would require real numbering (1, 2, 3) and forbid 1, 1, 1. But implementation-wise I would ignore this and make 1, 1, 1 possible, because parsing with regexes is so much easier this way.
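A small sketch of that split between a lenient parser and a strict spec check, assuming the hypothetical `#` marker proposed here (note that `#` followed by a space already opens a heading in Markdown, so this is purely illustrative). The names `MARKER`, `parse_items`, and `is_strictly_numbered` are invented for the example.

```python
import re

# Lenient: accept any "<digits>." marker or the proposed "#" marker.
MARKER = re.compile(r"^(\d+\.|#)\s+(.*)$")


def parse_items(lines):
    """Return (marker, text) pairs for every line that looks like an ordered item."""
    items = []
    for line in lines:
        match = MARKER.match(line)
        if match:
            items.append((match.group(1), match.group(2)))
    return items


def is_strictly_numbered(items):
    """Strict check: explicit numbers must run 1, 2, 3, ... in order."""
    numbers = [int(marker[:-1]) for marker, _ in items if marker != "#"]
    return numbers == list(range(1, len(numbers) + 1))


numbered = parse_items(["1. asd", "2. qwe", "3. zxc"])
hashed = parse_items(["# asd", "# qwe", "# zxc"])
lazy = parse_items(["1. Bird", "1. McHale", "1. Parish"])
```

Both modes yield the same item texts, and the lenient regex happily accepts the 1, 1, 1 form; only the separate strict check rejects it.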

[ConnorKrammer]
In my opinion, a fairly sane compromise would be to start a new list whenever the previous list contained a number that was greater than 1, and the next list starts with a 1.

This means that the following cases are all valid (pay attention to the numbers):

1. first list
2. first list
3. first list

1. second list
2. second list

3. second list

1. third list
2. third list
1. third list

1. fourth list
1. fourth list
1. fourth list

1. fourth list

This is intuitive, because a user that doesn’t care about list numbering would be able to just repeat a number over and over again, while a user that does care obviously means to start a new list when typing a newline and beginning at one again.

And in CommonMark, the first number of a list already has significance. If you start a list with the number three, for example, then that list really will start with a three. It doesn’t seem so contrived to me for the first number to take on another significance, i.e. granting it the ability to start a new list by starting at one again.

This actually meshes fairly well with how changing the bullet types starts a new list:

- first list
- first list

+ second list
+ second list

… without having to type two newlines.
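The bullet-type behavior can be sketched as a simple grouping step, assuming flat items and ignoring blank lines entirely. `group_by_bullet` is a hypothetical helper, not how any particular parser is written.

```python
from itertools import groupby


def group_by_bullet(lines):
    """Group consecutive bullet items by marker character; a marker change starts a new list."""
    items = [(line[0], line[2:]) for line in lines if line[:2] in ("- ", "+ ", "* ")]
    return [
        [text for _, text in group]
        for _, group in groupby(items, key=lambda item: item[0])
    ]


lists = group_by_bullet(
    ["- first list", "- first list", "", "+ second list", "+ second list"]
)
# lists == [["first list", "first list"], ["second list", "second list"]]
```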

jgm any thoughts?

[rlidwka]
Why invent those complex rules if you can just put up two newlines?

[ConnorKrammer]
It’s certainly true that it’s more complex, but not by much. The only thing I’ve added in my proposition is that if a list contains a number higher than one, and the next newline is followed by a new item that starts with a one, then start a new list. It could probably be written using a single boolean and one if statement.

And why do it? Because it’s far more intuitive to the user. The whole deal with Markdown is user readability, in a format not unlike what would be used in an email. Which is more likely: that in an email a user would separate lists with one newline, and start again at one, or that they would put two newlines in, and possibly start at some arbitrary number?

[jumpwah]
My 2c:
If @etchalon’s proposals are accepted, then there is no way to do loose lists. That has backwards compatibility issues with original markdown, so I don’t think it’d be accepted.

Forbidding:

1.  Bird
1.  McHale
1.  Parish

in the spec also has backwards compatibility issues. And many already use this form because it’s easier to rearrange list items this way since the numbers don’t have to be readjusted. This is the same in html/latex where you use <li> tags (in an <ol> tag), or \item (in an enumerate environment), instead of the actual numbers. Although I do understand that markdown is about looks so having actual numbers does make more sense markdown-philosophy wise, having all 1.'s is more practical for many here.

I also don’t understand the proposed reason for forbidding such syntax? What’s wrong with simply using two blank lines as rlidwka says?

–

ConnorKrammer:

The only thing I’ve added in my proposition is that if a list contains a number higher than one, and the next newline is followed by a new item that starts with a one, then start a new list. It could probably be written using a single boolean and one if statement.

I believe this breaks loose ordered lists? This is simply fixed with two blank lines too.

[ConnorKrammer]
jumpwah I don’t believe that it would, since we’re only changing the behaviour in a very small subset of cases.

The following (using loose lists) should still work:

1. first
2. first

3. first

1. second
2. second

3. second

More formally:

Start a new list if:

A numbered list is being parsed which contains a number greater than one;
A newline is then encountered, followed by another numbered list item; and
That numbered list item itself starts with a one.
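The three conditions above can be sketched roughly as follows, with the "single boolean" being `seen_gt_one`. This assumes flat ordered items and reads "a newline" as a blank line between items. `split_ordered_lists` is a made-up name, and this is only an illustration of the proposal, not part of any implementation.

```python
import re

ITEM = re.compile(r"^(\d+)\.\s+(.*)$")


def split_ordered_lists(markdown):
    """Split ordered items into lists using the 'restart at 1 after a number > 1' rule."""
    lists = []
    current = None
    seen_gt_one = False   # has the current list used a number greater than one?
    blank_seen = False    # was there a blank line since the last item?
    for line in markdown.splitlines():
        stripped = line.strip()
        if not stripped:
            blank_seen = True
            continue
        match = ITEM.match(stripped)
        if match:
            number = int(match.group(1))
            if current is None or (blank_seen and seen_gt_one and number == 1):
                # All three conditions hold (or this is the first item): new list.
                current = []
                lists.append(current)
                seen_gt_one = False
            if number > 1:
                seen_gt_one = True
            current.append(match.group(2))
        blank_seen = False
    return lists


loose_pair = split_ordered_lists(
    "1. first\n2. first\n\n3. first\n\n1. second\n2. second\n\n3. second"
)
all_ones = split_ordered_lists("1. first\n1. second\n\n1. third\n\n1. fourth\n1. fifth")
```

The loose example splits into two lists of three items each, while a list that only ever uses `1.` stays a single list no matter where the blank lines fall.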

Of course, this is adding more complexity to a problem that’s already solved. As you’ve pointed out, everything I’ve described can be done equally well by simply typing two newlines. The real benefit is that in certain cases, such as my example, the HTML output would more closely match the Markdown input.

rwzy · September 8, 2014, 12:54am

Much thanks skyzyx for taking the effort to transport that thread here!

To ConnorKrammer from that thread:

I meant it would break:

1. first
1. second

1. third

1. fourth
1. fifth

right? So that would remove a very useful feature to have all ordered list items start with 1. from loose ordered lists… even though the solution already exists: by simply separating such consecutive different lists with two blank lines as already specified in the spec.

ConnorKrammer · September 8, 2014, 1:21am

I see the confusion – sorry for not being clearer. The answer is that it wouldn’t break that scenario, since the list never uses a number greater than one. Even if the rules I outline above were implemented, your example would still render as a single (loose) list.

jcracknell · September 18, 2014, 1:11am

So I am puttering away at my own (PEG-based) markdown implementation, and being very much in tune with the Markdown ideal of producing output which predictably resembles the input, I was all over this.

  * a
  * b
  
  * c

  * d
  * e

The above is very clearly three lists: a tight list containing items a and b, a loose list containing item c, and a tight list containing items d and e.

This behavior is easy enough to achieve - you make the rule for tight lists willing to “settle” - it will accept whatever items it can get, so essentially the space between the first and second item in the list determines its character. And that works great. Then you start looking at edge cases:

  * a
    
    b
  * c

So I guess this is a tight list followed by a code block and another list? Wait I know, we’ll allow the omission of the trailing blank line in a loose list item if it contains a continuation! Yeah!

  * a
  * b

    c
  * d

Yeah! And this is… uhhh… a tight list? With a single item? Followed by a loose list with two items - two lists!
No wait! Uhhh…

It is at this point you start to ask yourself important questions like:

When exactly are you going to need support for adjacent lists?
Who writes lists like this anyways?
Why am I trying to help them?

The current behavior will happily munge together anything even remotely resembling a list - it is basically as flexible as it can get. This has very real benefits in terms of the fragility of the list syntax. In terms of user expectations and in the context of real-time preview (as seen on this site), it is best if small edits to the input do not cause the structure of the document to vary wildly.

Given the very real questions surrounding the utility of a more rigid list syntax, I am currently mulling over whether or not such a change would actually be beneficial to the average user. The current behavior encodes the notion that adjacent lists with the same marker style are fundamentally ambiguous, which has a certain appeal.