Block Quotes, Laziness, and Link Reference Definitions

jackdw · January 15, 2021, 4:03am

Okay guys, this one is bugging me, and I am hoping that I have not (yet again) misread something in the specification. The Markdown in question is:

> [foo]:
/url

Now, the “dingus” says this should be:

<blockquote>
</blockquote>

And I have been trying to figure out how. Mind you, some of it was during a power outage that we had here in the PNW, but I had the spec on my laptop, and just put my mind to figuring it out.

The Block Quote element I am in total agreement with. If the second line started with > or > I would also be in agreement with it. But in the opening part for Block Quotes:

Laziness. If a string of lines Ls constitute a block quote with contents Bs, then the result of deleting the initial block quote marker from one or more lines in which the next non-whitespace character after the block quote marker is paragraph continuation text is a block quote with Bs as its content. Paragraph continuation text is text that will be parsed as part of the content of a paragraph, but does not occur at the beginning of the paragraph.

Doesn’t this imply that this should only occur with the continuation of a Paragraph element? The way I read it, if a Paragraph element hasn’t been opened, it cannot be parsed as part of the paragraph.

Thanks,
Jack

mity · January 15, 2021, 1:31pm

I don’t know how Dingus implements this.

But many implementations right now (including Cmark, the reference one) handle link reference definitions specially, only after the normal block analysis. I.e. they initially parse them as part of paragraphs, and only later strip them from the beginnings of those paragraphs (paragraphs that become empty are discarded completely, so that <p></p> is not emitted).
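A rough sketch of such a post-pass, in Python for brevity. It is not taken from Cmark or any other implementation. The names `DEFINITION`, `strip_leading_definitions` and `render_paragraph` are made up for illustration, and the grammar is deliberately much narrower than the spec's (bare destinations and double-quoted titles only, no escapes, no HTML escaping of the output).

```python
import re

# Deliberately narrow grammar (an assumption of this sketch, not the spec's):
#   [label]: destination "optional title"
# with at most one line break between the parts.
DEFINITION = re.compile(
    r'''
    [ ]{0,3}\[(?P<label>[^\[\]]+)\]:           # [label]:
    [ \t]*\n?[ \t]*(?P<dest>[^\s<>]+)          # destination, possibly on the next line
    (?:[ \t]*\n?[ \t]*"(?P<title>[^"]*)")?     # optional title, may span lines
    [ \t]*(?:\n|$)                             # nothing else on the last line
    ''',
    re.VERBOSE,
)


def strip_leading_definitions(paragraph, refmap):
    """Remove definitions from the start of a paragraph's raw text.

    Records each one in refmap (the first definition of a label wins)
    and returns whatever text is left over.
    """
    pos = 0
    while (m := DEFINITION.match(paragraph, pos)) is not None:
        label = " ".join(m.group("label").split()).casefold()
        refmap.setdefault(label, (m.group("dest"), m.group("title")))
        pos = m.end()
    return paragraph[pos:]


def render_paragraph(paragraph, refmap):
    """Emit <p>...</p>, or nothing if only definitions were present."""
    rest = strip_leading_definitions(paragraph, refmap).strip()
    return f"<p>{rest}</p>\n" if rest else ""


refs = {}
empty = render_paragraph("[foo]:\n/url", refs)          # paragraph vanishes
kept = render_paragraph('[bar]: /x "T"\nhello', refs)   # only "hello" remains
```

In this sketch, a paragraph whose raw text is `[foo]:` followed by `/url` renders as nothing at all, which is consistent with an empty block quote in the output when that paragraph was the quote's only content.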

It makes the implementation much easier in several aspects:

It’s easier because you cannot decide whether you have encountered a link ref. def. just from its first line. The link ref. def. can occupy many consecutive lines, and look-ahead of arbitrary length in the block pass is dangerous if you want to avoid O(n^2) behavior.
You can share a lot of code with inline analysis (recognition of many link components like destination or title) because you already have the same data structures prepared and initialized. I.e. it leaves less room for other bugs (like inconsistency in recognition of link destinations or titles between inline links and the reference definitions).
It solves some complicated cases. Especially imagine multi-line link reference definitions inside a container block (a list item or a block quote). You don’t want the link reference definition parser to be confused by the markers or indentation that belong to the enclosing container syntax. The output of the block analysis does this for you for free.

For example:
```
>>> 1. [link. ref. def inside a list item, nested in a block quote]:
>>>     /with_a_destination
>>>     "And with a multiline
>>>     title"
```

Strictly speaking, I agree with you: as the spec is currently worded, this approach is likely incorrect with respect to some corner cases like the one you have come up with. From the point of view of the specification, the link ref. def. is just a normal block, and the spec never allows a lazy continuation line for anything but a paragraph.
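To make the two readings concrete, here is a toy Python model of the lazy-continuation check for a single, non-nested block quote. It is only a sketch under loud assumptions: `quote_extent` and its flag `definitions_open_paragraphs` are hypothetical names, the model ignores everything else that can interrupt a paragraph (headings, fences, thematic breaks, list items), and in the strict mode it only inspects the first line of a possible definition.

```python
import re

QUOTE_MARKER = re.compile(r" {0,3}> ?")
# First line of something that could be a link reference definition.
DEFINITION_START = re.compile(r" {0,3}\[[^\[\]]+\]:")


def quote_extent(lines, definitions_open_paragraphs):
    """Count how many leading lines belong to the block quote at lines[0].

    definitions_open_paragraphs=True models the paragraph-first strategy
    (a definition is initially just paragraph text); False models the
    strict reading, where a definition's first line opens no paragraph.
    """
    paragraph_open = False
    for count, line in enumerate(lines):
        marker = QUOTE_MARKER.match(line)
        if marker:
            content = line[marker.end():]
            if not content.strip():
                paragraph_open = False          # blank line closes the paragraph
            elif paragraph_open:
                pass                            # ordinary continuation line
            elif DEFINITION_START.match(content) and not definitions_open_paragraphs:
                paragraph_open = False          # strict: a definition, not a paragraph
            else:
                paragraph_open = True
        elif paragraph_open and line.strip():
            continue                            # lazy continuation line
        else:
            return count                        # the quote ends before this line
    return len(lines)


example = ["> [foo]:", "/url"]
paragraph_first = quote_extent(example, definitions_open_paragraphs=True)
strict = quote_extent(example, definitions_open_paragraphs=False)
```

With the paragraph-first flag, the model treats `/url` as a lazy continuation and keeps both lines inside the quote. With the strict flag, the quote ends after its first line and `/url` is left outside it.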

On the other hand, intuitively, imho it’s good that the lazy continuation can work here too. It makes good sense to me. Paragraphs may follow the definition without any blank line. And it seems odd to me to require that e.g. lazy continuation lines begin to work only later, so that you would need to write this:

> [foo]: 
> /destination
> "some multiline
> title"
> But now a paragraph following the link. ref. def. can
use a lazy continuation line. Strange.

@jgm But of course, the discrepancy between the specification and the implementation should be fixed.

jgm · January 15, 2021, 6:43pm

There’s some discussion of just this issue at

I think this is a case where the reference implementation
isn’t quite matching the spec due to a tempting optimization.

jackdw · January 16, 2021, 4:46am

Fair enough, and I can confirm that commonmark.js 0.29.2 exhibits that behavior for both list blocks and block quotes.

And I agree that it is a tempting optimization. Real easy. And it does make parsing simpler to a certain extent.

But something inside of my head screams “slippery slope”. Either the LRD is an element on its own, or it is not. If it is its own element, then as the specification stands, the reference implementation is non-compliant.

To fix that would require a change somewhere in the specification, but that has its own problems. Would some of the existing examples change due to the new wording?

Thoughts?

jgm · January 16, 2021, 7:40pm

I agree, the situation isn’t ideal. That’s why I’ve left that issue open. (You can comment further there if you like.) I’m not sure precisely what should be done, but as this sort of thing doesn’t come up very much in practice, it hasn’t been urgent to resolve it.