The CommonMark spec uses a declarative style to describe the syntax of each construct, with additional details and corner cases explained through examples. This does not appear to be a good design for a Markdown spec.
A lot of the ambiguities in parsing Markdown are about how to handle interleaving of syntax constructs, and the spec is silent on many of these aspects, so the spec is ambiguous.
If a line that opens a fenced code block (or a link reference definition such as `[ref]: /url`) is followed by a setext underline, is that a header, or is it the start of a fenced code block (or a link reference definition)? Going by the setext header section, it should be interpreted as a setext header. Going by the fenced code block section (or the link reference definition section), it should be interpreted as the start of a fenced code block (or as a link reference definition).
Similarly, if a list is immediately followed by a setext underline, the spec is ambiguous about whether the last line of the list is a list item or the text of a setext header. Going by the list section, it should be interpreted as a list item; going by the setext header section, it should be interpreted as a setext header.
    * One
    * Two
    ---
Extending the previous example, if the setext underline is indented to match the list indentation, and if that indentation is less than 3 spaces, it still matches the description of a top-level setext header as given in the setext header section.
    * One
    * Two
      ---
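One way to see how an algorithm-based description would remove this kind of ambiguity is to write the checks down in a fixed order. The sketch below is only an illustration: the rule names, the regular expressions, and above all the precedence order are assumptions made for this example, not rules taken from the CommonMark spec, and the patterns are deliberately simplified.

```python
import re

# Simplified, hypothetical patterns (assumptions for illustration only).
LIST_ITEM = re.compile(r"^ {0,3}[*+-]\s+\S")
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)\s*$")
THEMATIC_BREAK = re.compile(r"^ {0,3}-{3,}\s*$")


def classify(prev_kind, line):
    """Classify one line, given how the previous line was classified.

    The order of the checks *is* the disambiguation rule: whichever
    check comes first wins when several patterns match.
    """
    if not line.strip():
        return "blank"
    if LIST_ITEM.match(line):
        return "list_item"
    # Assumed rule: an underline forms a header only directly after a
    # plain paragraph line, never after a list item.
    if prev_kind == "paragraph" and SETEXT_UNDERLINE.match(line):
        return "setext_underline"
    if THEMATIC_BREAK.match(line):
        return "thematic_break"
    return "paragraph"


def classify_lines(lines):
    """Classify a sequence of lines from top to bottom."""
    kinds, prev = [], None
    for line in lines:
        prev = classify(prev, line)
        kinds.append(prev)
    return kinds


list_then_rule = classify_lines(["* One", "* Two", "---"])
indented_rule = classify_lines(["* One", "* Two", "  ---"])
title_then_rule = classify_lines(["Title", "---"])
```

Under these assumed rules both list examples come out the same way (two list items, then a thematic break), while `Title` followed by `---` is a setext header. A different ordering would give a different answer; the point is only that a written-down order leaves no room for two readings.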
Going by what is said in the raw HTML tag and the code span sections, it appears that an HTML tag enclosed in backticks shouldn’t be interpreted as a code span, which is obviously wrong.
As a pedantic example, it is unclear how `_foo *bar_ baz*` should be parsed. Based on the descriptions in the spec, it could be parsed either as emphasizing `foo *bar` (pairing the two underscores) or as emphasizing the span between the two asterisks.
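The two readings can be made concrete with a few lines of code. This is a minimal sketch, assuming each delimiter character occurs exactly twice in the input; the function names are hypothetical and it is not an emphasis parser. It only shows that the two candidate spans cross rather than nest, which is exactly the case where a declarative description needs a tie-breaking rule.

```python
def delimiter_span(text, delimiter):
    """Return (start, end) indices of a delimiter that occurs exactly twice, else None."""
    positions = [i for i, ch in enumerate(text) if ch == delimiter]
    return tuple(positions) if len(positions) == 2 else None


def candidate_readings(text):
    """Map each delimiter to the text it would emphasize if paired on its own."""
    readings = {}
    for delimiter in "_*":
        span = delimiter_span(text, delimiter)
        if span is not None:
            start, end = span
            readings[delimiter] = text[start + 1:end]
    return readings


def spans_cross(a, b):
    """True when two spans overlap without one containing the other."""
    (a_start, a_end), (b_start, b_end) = sorted([a, b])
    return a_start < b_start < a_end < b_end


sample = "_foo *bar_ baz*"
readings = candidate_readings(sample)
crossing = spans_cross(delimiter_span(sample, "_"), delimiter_span(sample, "*"))
```

For the sample, `readings` holds one candidate per delimiter and `crossing` is true, so a parser cannot honour both pairings at once and has to pick one by some rule the description must state.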
The spec does resolve some of these interleaved constructs in examples, and the above examples can of course be similarly mentioned as examples in the spec, but there will always be potential corner cases like this, so we can never be sure that the spec is totally unambiguous.
So, just like John Gruber’s original Markdown spec, the CommonMark spec has ambiguities, and it looks unlikely to become totally unambiguous as long as it sticks to the declarative style.
Not the best fit for parser developers
A fallout of this example-based resolution of corner cases is that information on parsing a particular construct is not confined to one part of the spec but is spread across it. To understand how to handle a particular construct in a parser, a parser developer cannot restrict herself to reading just that section of the spec, and might have to look out for examples involving that construct that could be placed anywhere in the spec. Even if she spots all those examples, it is not always apparent what strategy should be followed so that the resulting parser’s behaviour is consistent with all the provided examples.
As things stand now, it appears that a parser developer who wants to write a compliant Markdown parser is better off using one of the CommonMark implementations as a reference point rather than the specification document. Note that this was precisely the situation earlier as well (many Markdown parsers are adaptations of an implementation to another programming language).
Not the best fit for document writers
It appears that one of the reasons for the declarative style of the spec (as opposed to an algorithm-based/state-machine style) was that the declarative style was “closer to the way a human reader or writer would think, as opposed to a computer”.
While the declarative style itself is a good fit for document writers, the multitude of examples used to resolve the corner cases can make it unnecessarily complicated for that audience. Making a readable specification for document-writers and making an unambiguous specification for parser-developers are opposing objectives. The document writer asks “What should I do to get a heading?”, while a parser developer asks “How should I interpret a line starting with a hash?”. So it’s better to have a different document explaining the syntax for document writers.
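The parser developer’s question, “How should I interpret a line starting with a hash?”, maps naturally onto a procedure. The following is a hedged sketch of what such a procedural answer could look like. The function name, the return shapes, and the simplified rules (up to 3 spaces of indentation, 1 to 6 hashes, then a space or end of line, no handling of closing hashes) are assumptions for illustration and should not be read as the spec’s exact wording.

```python
def interpret_hash_line(line):
    """Decide how to treat a line whose first non-space character is '#'.

    Returns a tuple whose first element names the block kind.
    Simplified, assumed rules; closing '#' sequences are not handled.
    """
    stripped = line.lstrip(" ")
    indent = len(line) - len(stripped)
    if indent > 3:
        # Four or more leading spaces: treated as indented code here.
        return ("indented_code", line[4:])
    level = len(stripped) - len(stripped.lstrip("#"))
    rest = stripped[level:]
    if 1 <= level <= 6 and (rest == "" or rest[0] == " "):
        return ("heading", level, rest.strip())
    # Too many hashes, or no space after them: ordinary text.
    return ("paragraph", stripped)


heading = interpret_hash_line("# Title")
no_space = interpret_hash_line("#Title")
too_deep = interpret_hash_line("####### seven")
code = interpret_hash_line("    # not a heading")
```

Each branch answers the parser developer’s question directly, whereas the document writer’s question (“What should I do to get a heading?”) only needs the first successful case, which is one reason the two audiences are served better by separate documents.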
The declarative style of the spec:
- makes it ambiguous (even though one of the goals of the project is an unambiguous spec)
- makes it hard for a parser-developer to use the spec as a reference (even though one of the goals of the project is to make Markdown easier to parse)
I suggest that:
- there be separate documents targeting (a) document writers, and (b) parser developers
- the spec meant for parser developers should not be in the declarative style (algorithm-based/state-machine style is good)