As suggested in the topic "First steps towards an AST", it would help to have a defined abstract syntax tree (AST). I'd prefer JSON because it is better supported than XML in almost every language (XSLT being the exception). In addition, pandoc already provides a JSON structure (though based on Pandoc Markdown instead of CommonMark). Given a CommonMark parser that emits an AST, you don't have to reimplement the parser in another language; you can just read the AST as JSON, e.g. in Perl:
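use JSON::PP; open(my $fh, '-|', 'pandoc -t json < file.md') or die "cannot run pandoc: $!";
my $doc = decode_json(do { local $/; <$fh> });  # slurp all of pandoc's output, then decode it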
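Once the AST is decoded, working with it is plain data-structure traversal. Here is a minimal sketch that lists the headings of a document. It assumes the JSON layout of pandoc 1.18 and later (a top-level `blocks` array whose nodes carry a type `t` and content `c`); older pandoc versions and a future CommonMark AST may look different. The sample JSON and the helper names `inline_text` and `headings` are made up for illustration, and the JSON is inlined so the sketch runs without pandoc installed.

```perl
use strict;
use warnings;
use JSON::PP;

# Hand-written sample standing in for the output of `pandoc -t json`
# (assumed layout: pandoc 1.18 or later).
my $json = <<'JSON';
{"pandoc-api-version":[1,22],"meta":{},"blocks":[
  {"t":"Header","c":[1,["intro",[],[]],[{"t":"Str","c":"Intro"}]]},
  {"t":"Para","c":[{"t":"Str","c":"Hello"},{"t":"Space"},{"t":"Str","c":"world"}]},
  {"t":"Header","c":[2,["next-steps",[],[]],[{"t":"Str","c":"Next"},{"t":"Space"},{"t":"Str","c":"steps"}]]}
]}
JSON

# Flatten a list of inline nodes to plain text.
# Only Str and Space are handled; other inline types are skipped.
sub inline_text {
    my ($inlines) = @_;
    my $text = '';
    for my $node (@$inlines) {
        if    ($node->{t} eq 'Str')   { $text .= $node->{c} }
        elsif ($node->{t} eq 'Space') { $text .= ' ' }
    }
    return $text;
}

# Return a list of [level, text] pairs, one per top-level Header block.
sub headings {
    my ($doc) = @_;
    my @found;
    for my $block (@{ $doc->{blocks} }) {
        next unless $block->{t} eq 'Header';
        # Header content is [level, attributes, inlines].
        my ($level, $attr, $inlines) = @{ $block->{c} };
        push @found, [ $level, inline_text($inlines) ];
    }
    return @found;
}

my $doc = decode_json($json);
printf "%s %s\n", '#' x $_->[0], $_->[1] for headings($doc);
```

For the sample above this prints `# Intro` and `## Next steps`. To run it on a real file, replace the inlined `$json` with the output read from the pandoc pipe.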
By the way, Aaron Swartz wrote about this issue in 2004:
I’m not too worried about calling out to a separate process — it’s pretty cheap most of the time. If it actually does become a bottleneck, you can pretty easily switch to a server-client model. Having Markdown in your favorite language, while nice, is probably not the best use of time.