CommonMark Formal Grammar

elliottslaughter · September 5, 2014, 6:00pm

So admittedly, STOKE was never intended for parsing, but for compiler optimization (e.g. as an alternative to running GCC -O3). But the MCMC algorithm used in STOKE can be driven by basically any cost function, so it seems to me that if you made the cost function something like edit distance, then you could potentially find all ASTs which render to the same string (of which there should hopefully be exactly one).

You’re right though that walking the space of ASTs is not so very different from doing a random walk over a CFG (or other grammar), so perhaps this doesn’t provide any net benefit. Most likely, I have muddled my thinking. I guess my impression was that:

People here seem to be pessimistic about the feasibility of writing a formal grammar
On the other hand, walking the space of ASTs seems not so hard by comparison
And at any rate, the spec itself is phrased constructively (in terms of how to build blocks out of inlines and other blocks, etc.), so such an algorithm would have a convenient correspondence to the spec

I suspect the sticking point would probably be making sure that the transformation rules satisfy all the requirements for MCMC (e.g. ergodicity), which is easy enough with straight-line assembly and not so obvious with an arbitrary AST.

I will go read the Okhotin paper to try grok the rest of your posts, but I am hopeful that you will be successful.