It would be great if we could create an AST for Markdown. That way, parsers could parse the Markdown into this AST and then generate the HTML from the AST as a final step. This separation of concerns (parsing Markdown vs. outputting HTML) should make Markdown parsers easier to debug, inspect, and test.
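A minimal sketch of that two-stage shape, assuming a toy Markdown subset (`#` headings and paragraphs only) and a hypothetical `Node` class. The names are illustrative and not taken from any existing parser.

```python
from __future__ import annotations

import html
from dataclasses import dataclass, field


@dataclass
class Node:
    """One AST node: a kind, optional text, a heading level, and children."""
    kind: str
    text: str = ""
    level: int = 0
    children: list[Node] = field(default_factory=list)


def parse(markdown: str) -> Node:
    """Stage 1, Markdown -> AST. Toy subset: '#' headings and paragraphs only."""
    doc = Node("document")
    for block in markdown.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        # Count leading '#' characters; a heading needs 1-6 of them plus a space.
        hashes = len(block) - len(block.lstrip("#"))
        if 1 <= hashes <= 6 and block[hashes:hashes + 1] == " ":
            doc.children.append(Node("heading", block[hashes:].strip(), level=hashes))
        else:
            doc.children.append(Node("paragraph", block))
    return doc


def render(node: Node) -> str:
    """Stage 2, AST -> HTML. Never looks at Markdown syntax."""
    if node.kind == "document":
        return "".join(render(child) for child in node.children)
    if node.kind == "heading":
        return f"<h{node.level}>{html.escape(node.text)}</h{node.level}>\n"
    if node.kind == "paragraph":
        return f"<p>{html.escape(node.text)}</p>\n"
    raise ValueError(f"unknown node kind: {node.kind}")
```

Because `render` only ever sees `Node` objects, a test can assert on the tree returned by `parse` directly, separately from checking the HTML. Inline syntax such as `*text*` is simply left as literal text in this toy.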
So I decided to create an AST in XML. I spent about 4 hours on it.
XML is subjectively the devil, but it has pretty great support for validation. That is why I chose to write an XML DTD describing an XML document that contains the AST; the AST should carry all the information necessary to generate the parser's HTML output.
You should also know that I recognize XML is incredibly verbose. I do not actually expect parsers to write an XML file, just to have an internal representation that they can readily dump as this kind of XML file.
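To make "an internal representation they can readily dump" concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`document`, `heading`, `level`, ...) are made up for illustration and are not taken from the DTD.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                            # becomes the element name
    attrs: dict[str, str] = field(default_factory=dict)  # e.g. {"level": "1"}
    text: str = ""                                       # literal text content, if any
    children: list[Node] = field(default_factory=list)


def to_element(node: Node) -> ET.Element:
    """Recursively mirror the in-memory tree as ElementTree elements."""
    elem = ET.Element(node.kind, node.attrs)
    elem.text = node.text or None
    for child in node.children:
        elem.append(to_element(child))
    return elem


def dump_xml(root: Node) -> str:
    """Serialize the AST; ElementTree escapes &, < and quotes for us."""
    return ET.tostring(to_element(root), encoding="unicode")
```

The parser keeps working with plain objects, and the XML only exists when someone asks for a dump. Note that `xml.etree` does not validate against a DTD; a third-party library such as lxml (`lxml.etree.DTD`) can do that step.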
I submit this document under the CC-BY license, in addition to the CC-BY-NC-SA license implicit in content posted on the forum.
Here’s an example of use:
EDIT: All of these links are dead. I didn't notice that the pastes get removed after 30 days.
test1.xml: http://hastebin.com/veyenuketo.xml <-- abstract syntax tree
(by the way, that test1.md is pretty funny on Babelmark)
test2.md: the empty string
test2.html: the empty string
Ideally, we could get a tool that takes this AST and renders it as HTML. That doesn't seem like it should be too hard: link references would be the only part requiring anything other than a straight load-in, dump-out, because each reference has to be matched against a definition that may appear anywhere in the document.
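A minimal sketch of such an AST->HTML tool, focusing on the one non-trivial part. It assumes a hypothetical AST in which a reference link is a `ref_link` node carrying a label and a link definition is a `link_def` node carrying a URL; these node kinds are illustrative, not taken from the DTD. Since a definition may come after the link that uses it, rendering takes two passes: collect every definition first, then dump the tree straight out, looking up each reference as it is reached. Label matching approximates CommonMark's rules (case-fold, collapse whitespace, first definition wins).

```python
from __future__ import annotations

import html
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str           # "document", "paragraph", "text", "ref_link", "link_def"
    text: str = ""      # literal text, or the link text of a ref_link
    label: str = ""     # reference label for ref_link / link_def
    url: str = ""       # destination of a link_def
    children: list[Node] = field(default_factory=list)


def normalize_label(label: str) -> str:
    """Approximate CommonMark label matching: collapse whitespace, case-fold."""
    return " ".join(label.split()).casefold()


def collect_defs(node: Node, defs: dict[str, str]) -> None:
    """Pass 1: gather every link definition; the first one for a label wins."""
    if node.kind == "link_def":
        defs.setdefault(normalize_label(node.label), node.url)
    for child in node.children:
        collect_defs(child, defs)


def render(node: Node, defs: dict[str, str]) -> str:
    """Pass 2: plain load-in, dump-out, except that ref_link needs a lookup."""
    inner = "".join(render(child, defs) for child in node.children)
    if node.kind == "document":
        return inner
    if node.kind == "paragraph":
        return f"<p>{inner}</p>\n"
    if node.kind == "text":
        return html.escape(node.text)
    if node.kind == "link_def":
        return ""  # definitions produce no output of their own
    if node.kind == "ref_link":
        url = defs.get(normalize_label(node.label))
        if url is None:  # simplification: an unmatched reference stays literal
            return html.escape(f"[{node.text}][{node.label}]")
        return f'<a href="{html.escape(url)}">{html.escape(node.text)}</a>'
    raise ValueError(f"unknown node kind: {node.kind}")


def to_html(doc: Node) -> str:
    defs: dict[str, str] = {}
    collect_defs(doc, defs)
    return render(doc, defs)
```

Emitting an unmatched reference as literal text is a simplification; in CommonMark the parser already decides during parsing whether bracketed text is a link at all.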
The next step would be to create a tool that takes the AST and turns it into Markdown. Then we can feed that Markdown to the parsers, compare their HTML output with the output of the AST->HTML tool, and find bugs!
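A sketch of that differential-testing loop, assuming a tiny hypothetical AST (headings, paragraphs, emphasis, text) and treating each parser under test as an opaque function from Markdown to HTML. `to_markdown` generates the input, `to_html` produces the expected output straight from the AST, and `find_bug` reports any disagreement. Text is kept literal by backslash-escaping every ASCII punctuation character, which CommonMark always permits.

```python
from __future__ import annotations

import html
import string
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    kind: str           # "document", "heading", "paragraph", "emph", "text"
    text: str = ""
    level: int = 0
    children: list[Node] = field(default_factory=list)


def escape_md(text: str) -> str:
    """Backslash-escape every ASCII punctuation character so it stays literal."""
    return "".join("\\" + c if c in string.punctuation else c for c in text)


def to_markdown(node: Node) -> str:
    """AST -> Markdown: the input fed to each parser under test."""
    if node.kind == "document":
        return "\n\n".join(to_markdown(c) for c in node.children) + "\n"
    inner = "".join(to_markdown(c) for c in node.children)
    if node.kind == "heading":
        return "#" * node.level + " " + inner
    if node.kind == "paragraph":
        return inner
    if node.kind == "emph":
        return "*" + inner + "*"
    if node.kind == "text":
        return escape_md(node.text)
    raise ValueError(f"unknown node kind: {node.kind}")


def to_html(node: Node) -> str:
    """AST -> HTML: the reference output the parsers are compared against."""
    inner = "".join(to_html(c) for c in node.children)
    if node.kind == "document":
        return inner
    if node.kind == "heading":
        return f"<h{node.level}>{inner}</h{node.level}>\n"
    if node.kind == "paragraph":
        return f"<p>{inner}</p>\n"
    if node.kind == "emph":
        return f"<em>{inner}</em>"
    if node.kind == "text":
        return html.escape(node.text, quote=False)
    raise ValueError(f"unknown node kind: {node.kind}")


def find_bug(parse: Callable[[str], str], doc: Node) -> tuple[str, str, str] | None:
    """Return (markdown, expected, actual) if the parser disagrees, else None."""
    markdown = to_markdown(doc)
    expected = to_html(doc)
    actual = parse(markdown)
    return None if actual == expected else (markdown, expected, actual)
```

In practice `parse` would wrap a real parser (for example by running its command-line tool through `subprocess`), and both HTML strings would need normalizing before comparison, since parsers legitimately differ in whitespace and entity choices.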