Abstract Syntax Tree is more important than implementations

As suggested in the topic First steps towards an AST, it would help to have a defined abstract syntax tree (AST). I’d prefer JSON because it is better supported than XML in nearly every language (XSLT being the exception). In addition, pandoc already provides a JSON structure (though it is based on Pandoc Markdown rather than CommonMark). Given a CommonMark parser that emits an AST, you can avoid implementing another parser and simply read the AST as JSON, e.g. in Perl:

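use JSON::PP;  # core module that provides decode_json
open(my $fh, '-|', 'pandoc', '-t', 'json', 'file.md') or die "cannot run pandoc: $!";
my $doc = decode_json(do { local $/; <$fh> });  # slurp all of pandoc's output, then parse it
close $fh;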
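
Once the JSON is decoded, working with the document is ordinary tree traversal. A minimal sketch in Python of what that could look like, assuming node shapes like those in pandoc's JSON output (objects with a "t" type tag and optional "c" contents). The sample document is hand-written for illustration, and the helper names plain_text and headings are hypothetical; a CommonMark AST might use a different layout.

```python
import json

# Hand-written sample shaped like pandoc's JSON output (an assumption,
# not captured from a real pandoc run).
SAMPLE = """
{"pandoc-api-version": [1, 22], "meta": {},
 "blocks": [
   {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
   {"t": "Para", "c": [{"t": "Str", "c": "Hello"}, {"t": "Space"},
                       {"t": "Emph", "c": [{"t": "Str", "c": "world"}]}]}
 ]}
"""


def plain_text(node):
    """Recursively collect the text of Str/Space nodes beneath `node`."""
    if isinstance(node, list):
        return "".join(plain_text(child) for child in node)
    if isinstance(node, dict):
        kind = node.get("t")
        if kind == "Str":
            return node["c"]
        if kind in ("Space", "SoftBreak"):
            return " "
        # Any other node: descend into its contents, if it has any.
        return plain_text(node.get("c", []))
    return ""  # header levels, attribute strings, and similar non-node values


def headings(doc):
    """Return (level, text) pairs for the top-level Header blocks."""
    return [
        (block["c"][0], plain_text(block["c"][2]))
        for block in doc["blocks"]
        if block["t"] == "Header"
    ]


doc = json.loads(SAMPLE)
print(headings(doc))
print(plain_text(doc["blocks"][1]))
```

The sketch ignores nodes that carry their text differently (inline code, for example), so it is a starting point rather than a complete extractor.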

By the way Aaron Swartz wrote about this issue in 2004:

I’m not too worried about calling out to a separate process — it’s pretty cheap most of the time. If it actually does become a bottleneck, you can pretty easily switch to a server-client model. Having Markdown in your favorite language, while nice, is probably not the best use of time.

Maybe I am out of the loop here, but doesn’t the current reference implementation (in C) allow you to output an AST with the --ast flag? Please see @jgm’s reply to CommonMark Formal Grammar: “it will give you a representation of the syntax tree”.


Yes, the implementation already provides an AST. But unless it is defined in the specification, this AST may change without violating the specification. I guess it’s safe to start creating CommonMark processors that simply consume the AST instead of spending time on yet more CommonMark parsers. And if you insist on writing a parser in your favorite language, it should emit the AST rather than HTML.
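
To illustrate the point, here is a rough Python sketch of a processor that turns an already-parsed AST into HTML without knowing anything about Markdown syntax. It assumes pandoc-style nodes ("t" for the type, "c" for the contents) and handles only a handful of node types; the function names render_inlines and render_blocks are made up for this example, and an eventual CommonMark AST could look different.

```python
from html import escape

# Inline node types that map directly onto an HTML element.
INLINE_TAGS = {"Emph": "em", "Strong": "strong"}


def render_inlines(nodes):
    """Render a list of inline nodes to an HTML string."""
    out = []
    for node in nodes:
        kind = node["t"]
        if kind == "Str":
            out.append(escape(node["c"]))
        elif kind in ("Space", "SoftBreak"):
            out.append(" ")
        elif kind in INLINE_TAGS:
            tag = INLINE_TAGS[kind]
            out.append(f"<{tag}>{render_inlines(node['c'])}</{tag}>")
        else:
            raise ValueError(f"unsupported inline node: {kind}")
    return "".join(out)


def render_blocks(blocks):
    """Render a list of block nodes to HTML, one block per line."""
    out = []
    for block in blocks:
        kind = block["t"]
        if kind == "Para":
            out.append(f"<p>{render_inlines(block['c'])}</p>")
        elif kind == "Header":
            # Assumed layout: [level, attributes, inline nodes].
            level, _attr, inlines = block["c"]
            out.append(f"<h{level}>{render_inlines(inlines)}</h{level}>")
        else:
            raise ValueError(f"unsupported block node: {kind}")
    return "\n".join(out)


blocks = [
    {"t": "Header", "c": [1, ["intro", [], []], [{"t": "Str", "c": "Intro"}]]},
    {"t": "Para", "c": [{"t": "Str", "c": "a<b"}, {"t": "Space"},
                        {"t": "Strong", "c": [{"t": "Str", "c": "ok"}]}]},
]
print(render_blocks(blocks))
```

The same tree could just as well feed a LaTeX writer, a link checker, or a table-of-contents generator, which is the argument for having parsers emit the AST and leaving HTML to a separate step.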