Requiring a blank line between a paragraph and list

Burt_Harris · September 20, 2014, 6:06pm

Continuing the discussion from CommonMark Danger: Markdown Extensions w/o Flavors:

In initial discussions leading up to the spec, the issue of whether to require a blank line between a paragraph and a following list did come up. I favored requiring it, both to increase backwards compatibility and because of a symmetry argument. (Because of “laziness”, you need a blank line between a list and a following paragraph, so by symmetry it makes sense to require a blank line between a paragraph and a following list.) Others argued that users are likely to create lists without the intervening blank line; this is intuitive and would strike any reader as obviously a paragraph followed by a list:

    My shopping list:
    - eggs
    - butter
    - bread

There are arguments on both sides, and it’s definitely an issue that could be reconsidered.

Burt_Harris · September 20, 2014, 6:17pm

I think you were right to favor requiring it.

While I agree that the notation is intuitive, and that other lightweight languages (like YAML) read it that way, Babelmark 2 clearly shows that this interpretation is far from common among Markdown flavors.

It strikes me that punctuation used in combination is a clearer indication of intent than whitespace. So I note the colon you used in that example, and suggest that lines ending in a colon might be considered a special contextual marker that could alter the interpretation of subsequent lines. Of course a blank line should work too. Let’s explore that in the context of my earlier example:

## The Quadratic Equation ##
A __quadratic equation__ is any equation having the form:

>  *ax*<sup>2</sup>
>  + *bx*
>  + *c*
>  = 0

where *a, b,* and *c* represent numbers, 
such that *a* is not equal to 0.

What happens if I remove the first blank line?

IMHO pandoc 1.13.1 and RedCarpet 2.1.1 misread my intent. The colon on line two might give a processor a hint that the following blockquote should be recognized even without a blank line.
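The colon suggestion can be made concrete with a small line classifier. This is a minimal Python sketch of a hypothetical rule that no implementation in this thread is known to follow: a line starting with a bullet marker (`-`, `+` or `*` plus a space) or with `>` may begin a new block without a preceding blank line only if the previous line ends with a colon. The function `classify` and its labels are illustrative names, and the model ignores everything except paragraphs, bullets and quotes.

```python
import re

# Hypothetical "colon rule" (an illustration of the suggestion above, not the
# behaviour of any real Markdown implementation). A line that looks like the
# start of a bullet list item or block quote may interrupt a paragraph only
# when the preceding line ends with ":".
BLOCK_START = re.compile(r"^ {0,3}(?:[-+*] |>)")


def classify(lines: list) -> list:
    """Label each line "blank", "text" (paragraph text) or "block"."""
    labels = []
    prev = None  # previous non-blank line; None at the start or after a blank line
    for line in lines:
        if not line.strip():
            labels.append("blank")
            prev = None
            continue
        looks_like_block = bool(BLOCK_START.match(line))
        may_start = (
            prev is None                     # nothing to interrupt
            or labels[-1] == "block"         # previous line was already a block line
            or prev.rstrip().endswith(":")   # the colon hint
        )
        labels.append("block" if looks_like_block and may_start else "text")
        prev = line
    return labels


# The quadratic example with its first blank line removed.
example = [
    "A __quadratic equation__ is any equation having the form:",
    ">  *ax*<sup>2</sup>",
    ">  + *bx*",
    ">  + *c*",
    ">  = 0",
]
print(classify(example))  # ['text', 'block', 'block', 'block', 'block']
```

Under this rule a wrapped sentence whose second line merely happens to begin with `+`, as in the hexadecimal-entities excerpt quoted later in the thread, stays paragraph text because the line before it does not end with a colon. The cost is one more special case for writers and implementers to remember.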

P.S. This also seems to reveal a possible bug in Babelmark2, as the preview view seems broken.

codinghorror · September 22, 2014, 10:03am

It is “babel” as in

codinghorror · September 22, 2014, 10:08am

Considering the cited example, Babelmark says:

4 implementations interpret as para + list
19 implementations interpret as para only

I think the data says most Markdown implementations do in fact require the line between.

Burt_Harris · September 23, 2014, 3:37am

I’d be happy to, but frankly I wasn’t sure how or where I had misspelled it. I looked but didn’t see it; now I gather you mean s/Babble/Babel/g. I think I’ve fixed it now everywhere except in your quote of my statement.

codinghorror · September 23, 2014, 7:56am

Can you please edit your posts to spell “Babel” correctly? There’s a reason I posted the wikipedia article above.

ComplexPoint · September 23, 2014, 9:19am

Supply-side stats like those in the Babelmark table are, of course, easier to obtain, but the demand-side figures would be more useful in the design of high quality tools for users.

It would be interesting to know:

What proportion of edit events involve inserting a line for the first time?
And what proportion involve a subsequent edit to add a further line?

A graphical model weighted with that kind of data would allow more intelligent behaviour than a crude normative rule-base.

PeterNLewis · September 23, 2014, 9:41am

Speaking just as a user, and having posted this as a bug in Discourse, I can say it is really annoying to require a blank line in this context. It’s very clear what the user intends if they write:

These three things:
* one
* two
* three

Having that produce anything other than a list is a bug IMNVHO. Adding a blank line there is also substantially ugly in the text, and since the text may be displayed as plain text in other formats (e.g. plain-text email), this is definitely a problem.

The fact that nineteen implementations require it most likely says more about the testing of those implementations than any real intent.

mofosyne · September 23, 2014, 11:36am

Hmmmmmm… I wonder whether a colon at the end of a line always signifies a following paragraph or list. And if so, can there ever be a situation where a new paragraph, list, etc. starts without a colon at the end of the preceding line?

mcwumbly · September 23, 2014, 1:25pm

Yep, this is also the most common mistake I see from people who are new to Markdown.

I have never noticed it implemented any other way in practice, so I’ve gotten used to it.

The topic linked above has a number of legitimate reasons not to treat line breaks as new paragraphs by default in general, so it’s worth a read.

If lists could be special without any major pitfalls I think it would better meet user expectations.

Especially when the "treat line breaks as paragraphs" option is used,
like
this.

Burt_Harris · September 23, 2014, 4:48pm

Thanks Peter, I absolutely understand this, and value a user point of view. But in doing language design, I think it’s important we establish a clear priority between two potential goals for CommonMark, that may frequently be in conflict moving forward if we do not address them now:

The goal to be “as compatible as possible”
The goal to resolve minor annoyances and/or frequent mistakes.

I believe that both these goals can be accommodated, but only through formalization of the concepts of flavors and versions. These concepts may be a little fuzzy because they are not yet well defined in the CommonMark context, but I don’t think they will be onerous to implement or use.

But right now, without flavors and versions, I believe we should put design-for-compatibility above design-by-annoyance, or risk CommonMark becoming just another YAMF and making worse the real problem that led to CommonMark’s creation (the lack of user and document portability between platforms).

Today, CommonMark is not final, and getting from here to there may look a bit like making sausage. To people closer to the user role than the implementer role, some of the process needed to refine the specification may seem unnecessarily ugly, but I hope you’ll trust me when I say that “clear intent” becomes anything but clear when attempting to parse natural language in production-quality code.

ComplexPoint · September 23, 2014, 6:57pm

When the domain of discourse is this constrained (a few elements of document structure), and the set of tokens this small (newlines, prefixes, inline emphases etc), the challenge is really not great, especially if some data on user habits has been collected.

jgm · September 24, 2014, 4:25pm

It is true that most implementations require the blank line. (And, although the requirement is not in John Gruber’s syntax description, this issue was explicitly discussed; see the links here for the relevant history.)

There is a serious issue here about how heavily backwards compatibility should weigh. I must say, it would weigh much more heavily for me if there were a chance of avoiding fragmentation. But John Gruber has made it pretty clear that there is not. (I did ask him whether he could get behind this effort if we removed divergences like this, but there was no uptake.) If we are to be working on CommonMark instead of Markdown, then perhaps it is a good opportunity to rethink some things that may have been ill advised in the original.

The first observation to make is that, no matter which way we go on this, there will be inputs that will not behave as their writers intended. However, I think these will be much more common if we do require the blank line than if we do not. Burt_Harris’s original example is not as ordinary a piece of text as, for example:

TODO:
- fix bugs
- write documentation

Either way we go, there will be some inputs where manual escaping is necessary to get the intended interpretation.

Now, it seems to me that there is a strong reason for NOT requiring the blank lines. Even in Gruber’s Markdown, blank lines are not required inside a list item. So, this gives you a list item containing a paragraph followed by a bullet list:

1.  Foo
    - Bar
    - Baz

CommonMark uses the principle that the meaning of a block of text should not change when it is put into a list item. It follows, then, that this should be a paragraph followed by a bullet list:

Foo
- Bar
- Baz

(Indeed, the way the spec for list items is written requires that this property hold.) I think this is the least surprising behavior for users. Hence, I conclude, either we should require the blank line between a paragraph and a list everywhere – even in sublist contexts (which is what reStructuredText does) – or we should require it nowhere.
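The consistency argument can be illustrated with a deliberately tiny parser. This Python sketch is a toy that assumes only paragraphs and flat bullet lists exist; `parse_blocks`, its `require_blank` flag and the `item_contents` helper are hypothetical names, not part of stmd or any real implementation. It contrasts a mixed policy (blank line optional inside list items but required at top level, as described above for Gruber's Markdown) with one policy applied everywhere.

```python
import re

# Toy model for illustration only, NOT the CommonMark or Markdown.pl
# algorithm: it knows about paragraphs and flat bullet lists ("-", "+", "*"
# followed by a space) and nothing else.
BULLET = re.compile(r"^[-+*] +(.*)$")


def parse_blocks(text: str, require_blank: bool) -> list:
    """Split text into ("para", lines) and ("list", items) blocks.

    require_blank=True: a bullet line directly after paragraph text is just
    more paragraph text. require_blank=False: it starts a new list.
    """
    blocks = []
    current = None  # block that can still absorb lines; None after a blank line
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line closes the open block
            continue
        m = BULLET.match(line)
        may_start_list = current is None or current[0] == "list" or not require_blank
        if m and may_start_list:
            if current is None or current[0] != "list":
                current = ("list", [])
                blocks.append(current)
            current[1].append(m.group(1))
        elif current is not None and current[0] == "list":
            current[1][-1] += " " + line.strip()  # lazy continuation of the item
        elif current is not None:
            current[1].append(line.strip())  # paragraph continuation
        else:
            current = ("para", [line.strip()])
            blocks.append(current)
    return blocks


def item_contents(item_lines: list, width: int = 4) -> str:
    """Hypothetical helper: strip a list marker such as "1.  " from the first
    line and `width` columns of indentation from the remaining lines."""
    rest = [l[width:] if l.startswith(" " * width) else l for l in item_lines[1:]]
    return "\n".join([item_lines[0][width:]] + rest)


nested = ["1.  Foo", "    - Bar", "    - Baz"]
top_level = "Foo\n- Bar\n- Baz"

# Mixed policy: no blank line needed inside items, one needed at top level.
mixed_inside = parse_blocks(item_contents(nested), require_blank=False)
mixed_outside = parse_blocks(top_level, require_blank=True)
print(mixed_inside == mixed_outside)  # False: same text, two structures

# A single policy gives the same structure in both contexts.
for policy in (True, False):
    print(parse_blocks(item_contents(nested), policy) == parse_blocks(top_level, policy))
```

Only the uniform settings give the same result inside and outside a list item, which is the property the list-item rules are said to depend on. The sketch does not say which of the two uniform settings is preferable.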

codinghorror · September 24, 2014, 5:55pm

Based on this discussion, and the cited examples/data, I am persuaded that we should support requiring a blank line nowhere.

That is, I think

para
- foo
- bar

should work as is (i.e. the example above should produce a <ul> list).

Burt_Harris · September 25, 2014, 2:01am

I’m not sure I follow your logic here, @jgm. I’m also not sure you’ve followed my reasoning on the other thread. I’ll summarize it here:

Yet, even after the renaming, the web site continues to prominently display the former phrase, despite the fact that this is a clear violation of the third bulleted clause of the Markdown license. In my mind, it is quite possible that it is this issue, a lack of respect for the terms of the license, and not issues of compatibility or fragmentation, that has kept John Gruber from “getting behind” this effort, and continues to do so.

But in my mind, the chance of avoiding (further) fragmentation has nothing whatsoever to do with getting John Gruber’s support or agreement. Gruber has held the syntax of his Markdown.pl implementation stable for a long time, so he’s not the one causing fragmentation.

The fragmentation problem is also independent of naming. It is caused by people deciding to do things differently in their own implementations, or lacking the skill and discipline to get it right. I trust it’s not the latter with you.

Yes, I admit that. So let’s use your example or @codinghorror’s example and view them in Babelmark 2’s Preview view (rather than worrying about the exact HTML generated). This much simpler example generates only two visual representations:

Markdown.pl (the only Markdown implementation) and 17 other YAMF implementations generate one that looks like this:

TODO: - fix bugs - write documentation

Only 4 implementations display it as a bulleted list, including stmd 0.1, which I suggest shouldn’t be counted. So if you really like that 18:3 minority opinion, I’d propose renaming this whole effort YAMF, as others have suggested.

The problem with oversimplifying test cases

I understand why you simplified my test case, but I think you went too far in removing the non-blank line before the start of the list when you started talking about nested lists. What if you restore it…

Is the following a list or part of this paragraph?
1.  Foo
    - Bar
    - Baz

Plugging that into Babelmark 2, the range of variations is greater, but most of the implementations I would consider high-quality (including Markdown.pl and Pandoc) display results that don’t format any of it as list items (nested or not). stmd is (again) almost unique in its interpretation.

So now, trying to follow your logic about the “principle” of CommonMark, I remove the first four characters of all but the first line and get back to something that behaves just like the TODO list example.

jgm · September 25, 2014, 3:41am

About the test case, my point is that the behavior you find objectionable in stmd is reproduced in all Markdown implementations in list contexts. Try this in BabelMark2:

1.  *ax*<sup>2</sup>
    + *bx*
    + *c*
    = 0

stmd is more consistent, in that the content is interpreted the same way in a list context and outside of one. That seems to me an important feature, from the point of view of user surprise. It also permits a clean description of list item syntax.

Note also that Markdown.pl does allow block quotes to start without an intervening blank line. Why are blockquotes treated differently from lists? Try this in BabelMark2:

*ax*<sup>2</sup>
+ *bx*
+ *c*
> 0 - *ax*

It’s very similar to your original example, but you won’t find Markdown.pl doing what you intended here.

The fact is that cases like this, where ordinary text lines begin with + or - or >, are fairly rare. Making sure they behave the same way as with Markdown.pl is not, in my view, worth the cost in simplicity and coherence – not to mention the myriad realistic cases like the “TODO” example where stmd will render according to the writer’s intent and Markdown.pl will not.

And yes, Gruber has caused fragmentation. For a decade now, implementers have come to his markdown-discuss list asking for guidance on corner cases and bigger issues, and he has responded with silence. That is a major reason why there are divergences of the sort displayed by BabelMark2. If he had exhibited leadership, saying, in response to our queries, “here’s the precise rule for indented list nesting,” and clarifying his syntax description to make it clear, then there would have been no need for this effort.

mofosyne · September 25, 2014, 4:25am

Perhaps the philosophy of CommonMark should be less about what every other parser is doing, and more about what users would commonly expect when writing Markdown.

In other words, the aim should be a Lightweight Markup Language of No Surprises™.

This is because the biggest attraction that let the iPhone beat most of its more feature-rich competitors is that it ‘just works’ for the average user.

tl;dr: Be common in user expectations, not in the status quo.

Burt_Harris · September 25, 2014, 3:04pm

Well said. I wish that that could be true. Unfortunately, surprises have too much to do with context and previous experience. For example, with people saying things like:

For those whose context includes using (and accepting) Markdown, changing the choices that Gruber made is surprising. For others (perhaps newbies or those with a YAML background), Markdown’s interpretation is surprising. (Personally, I have no problem with Markdown’s interpretation of this, because I keep in mind that YAML Ain’t Markup Language.)

ComplexPoint · September 25, 2014, 4:25pm

Unless Gruber is the author or principal actor of Genesis 11, that seems a remarkably implausible hypothesis to me.

Divergence in requirements and applications is enough to account for the most significant divergences of idiom, and more social processes of distancing and identification explain much of the rest.

The three light markup applications which I use most would all be broken by the adoption of the current CommonMark. Not because the Ur-spec was too loose, but because they have been shaped by more specialised needs, like nested tables (MMD) and outlining (FoldingText).

Spanish is gaining demographic ground on English now in North America. Not, I suspect, because English grammar lacks a formal specification … (or do you think the tide could be turned by a better grammar book?)

Babel will always be with us. Technical specialisation, and in-group out-group identification and distancing are the perennial drivers of divergence. Throwing verbal stones towards Gruber (and deprecating features of his dialect) is the very essence of the Babel story at work …

Jack_Douglas · July 10, 2015, 6:53pm

A (now-fixed) bug report I just filed on the spec makes me wonder whether you’ve come down on the wrong side here. I think the choice of examples in this post is weighted towards the decision you made, but should this really be interpreted as a list inside a paragraph:

[Hexadecimal entities](@hexadecimal-entities)

consist of `&#` + either `X` or `x` + a string of 1-8 hexadecimal digits
+ `;`. They will also be parsed and turned into the corresponding
unicode codepoints in the AST.

Perhaps a special case needs to be made for ‘list item(s)’ inside a paragraph that don’t look like lists, e.g. they have no further indentation and aren’t followed by a blank line; on the other hand, perhaps that adds too much complexity. Either way, I think it may be worth reconsidering.