0.12 changes to links and images

jgm · November 10, 2014, 7:11pm

The newly released version 0.12 of the spec makes some changes to link and image syntax. I want to summarize them here.

1. Link text is now conceptually distinguished from link labels. We now use the terminology “link label” only for a reference-link label. So, in [foo][bar], the [foo] is the link text and the [bar] is the link label. In the forms [foo] and [foo][], the link label is derived from the link text.
2. For images, we use the term “image description” instead of “link text.” The spec does not enforce any particular way of dealing with the image description in a renderer (since the spec is really about parsing, not rendering), but it recommends rendering the plain text content of an image description as the contents of the alt attribute.
3. Link text cannot contain links (at any level of nesting), but can contain images. Image descriptions can contain both links and images.
4. The syntax of link labels is more restrictive than before. Previously you could have arbitrary nested pairs of brackets in link labels, and they could be any length. Now they must be less than 1000 characters, and cannot contain unescaped brackets. In addition, a link label ends with the first unescaped right bracket, so the following is a valid link label: [foo`]`.

(1) and (2) are terminological. (3) is motivated by the fact that links inside link text make no sense in any format we might render to (they are invalid in HTML). Links inside image descriptions (and even inline images) might make sense if the description is used as an image caption (as it is for images in paragraphs by themselves, in some Markdown extensions). When the description is used as the image’s alt text, a plain-text version of the embedded link or image can be substituted.
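
As a rough illustration of that last point, here is a minimal Python sketch of reducing an image description to plain text for the alt attribute. The `Node` class and its `kind` names are hypothetical, invented for this example; they are not the data structures of any CommonMark implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical inline node: kind is "text", "emph", "link", or "image".
    kind: str
    literal: str = ""
    children: List["Node"] = field(default_factory=list)


def plain_text(nodes: List[Node]) -> str:
    """Concatenate the text leaves of an inline tree, dropping all markup."""
    parts: List[str] = []
    stack = list(reversed(nodes))  # explicit stack, so deep nesting is safe
    while stack:
        node = stack.pop()
        if node.kind == "text":
            parts.append(node.literal)
        else:
            # Links, images, and emphasis contribute only their contents.
            stack.extend(reversed(node.children))
    return "".join(parts)


# Image description corresponding to: foo [bar](/url) *baz*
description = [
    Node("text", "foo "),
    Node("link", children=[Node("text", "bar")]),
    Node("text", " "),
    Node("emph", children=[Node("text", "baz")]),
]
print(plain_text(description))  # foo bar baz
```

A renderer that uses the description as a caption instead could keep the tree and render the embedded link normally.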

(4) is motivated partly by concerns about parsing efficiency, though it also brings CommonMark closer to the majority of existing implementations. Link labels are designed as tags, and should not be able to contain arbitrary markup. I do not think the 1000-character limit is a big restriction: it is hard to imagine anyone wanting a reference label 1000 characters long. This change does prevent you from doing

[[foo]]: url

[[foo]]

but you can still do

[foo]: url

[[foo]][foo]

or

[\[foo\]]

[\[foo\]]: url
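
The label rule in (4) can be sketched in a few lines of Python. This is only an illustration of the rule as summarized in this post, not the code of either parser. The function name `scan_link_label` and the constant `MAX_LABEL_CHARS` are made up for the example, and I am reading “less than 1000 characters” as applying to the text between the brackets.

```python
from typing import Optional

# Assumption: the limit counts the characters between the brackets.
MAX_LABEL_CHARS = 999


def scan_link_label(text: str, start: int = 0) -> Optional[str]:
    """Return the link label beginning at text[start] (brackets included), or None."""
    if start >= len(text) or text[start] != "[":
        return None
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            i += 2  # skip the escaped character, so \[ and \] are allowed
            continue
        if ch == "[":
            return None  # an unescaped opening bracket is not allowed
        if ch == "]":
            # The first unescaped right bracket always ends the label.
            if i - start - 1 > MAX_LABEL_CHARS:
                return None
            return text[start:i + 1]
        i += 1
    return None  # ran out of input without a closing bracket


print(scan_link_label("[[foo]]"))      # None
print(scan_link_label(r"[\[foo\]]"))   # [\[foo\]]
print(scan_link_label("[foo`]`"))      # [foo`]
```

Because the scan never looks inside backticks or other inline markup, it is a single left-to-right pass with no backtracking.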

The new C parser is much more robust than the old one, which could be made to overflow the stack with deeply nested structures (e.g. emphasis nested one million levels deep, or one million balanced square brackets). The new parser handles all of this in roughly linear time without recursion (and hence without blowing the stack). I have not rewritten the JS parser to avoid recursion, so it will still overflow the stack on deeply nested structures, but since the JS runtime handles this gracefully, it’s not such an issue.
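
To show the general idea of avoiding recursion (this is not the algorithm the C parser uses, just a minimal Python sketch), brackets can be paired with an explicit stack kept on the heap, so nesting depth does not consume call-stack frames. The function name `match_brackets` is hypothetical.

```python
from typing import List, Tuple


def match_brackets(text: str) -> List[Tuple[int, int]]:
    """Pair each unescaped '[' with its matching ']' in one linear pass."""
    openers: List[int] = []            # positions of brackets still open
    pairs: List[Tuple[int, int]] = []  # (open index, close index)
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2  # skip the escaped character
            continue
        if ch == "[":
            openers.append(i)
        elif ch == "]" and openers:
            pairs.append((openers.pop(), i))
        i += 1
    return pairs


# One million balanced square brackets, as in the stress case above.
depth = 1_000_000
pairs = match_brackets("[" * depth + "]" * depth)
print(len(pairs))  # 1000000
```

A recursive-descent version of the same matching would need one call frame per nesting level, which is what makes inputs like this a problem.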