CommonMark Formal Grammar

gagern · January 19, 2016, 3:36pm

What’s the status here? Does anyone currently work on a formal grammar? Has anyone tried and given up? If so, where was the main problem? Is there a more recent thread on this?

@tin-pot mentioned (in Enumerated lists without explicit number, ATX headings with explicit number) some unpublished material related to a grammar. I’d like to hear more about that as well.

To me it looks as if the “match number of backticks resp. tildes, no matter how many there are” specification of a fenced code block would be particularly difficult to express. Looking at the Wikipedia category “Formal Languages” for likely candidates, I found indexed grammars to be the most promising. One could use the index stack to count various kinds of things, most notably the number of characters in a code fence, or the number of spaces required to line up a list item. Counting two things in parallel (i.e. spaces and fence characters) would already be tricky, I think, so the translation would be far from obvious.

So far I know very little about indexed grammars. In particular I know of no tools making use of these, and I know of no methods to decide whether a given index grammar is ambiguous. So this may need some more research. If someone found a way to make standard context free grammars or something close to that work for CommonMark, that would of course be great, but I have my doubts.

In a comment on Vanilla-flavored Markdown as basis for state machine spec, @riking conjectured that CommonMark would have to be Chomsky 0. If that were indeed the case, indexed grammars would be bound to fail as well, since they are only Chomsky 1. However, I don’t yet see the argument why this has to be Chomsky 0.

For the sake of completeness: Indian parallel grammars sounded like a good candidate for CommonMark as well. The parallel rewriting of non-terminals can be used to insert matching fences, or matching indentation at the beginning of lines. I can’t see a way to forbid longer runs of backticks inside a fenced code block using this formalism, though. I know of no approach which would combine such parallelism with conjunctions.