Can AST -> Markdown -> AST round-trip always reproduce the original?

I’m considering using Markdown as the serialization format for my note-taking app. Converting the abstract syntax tree (AST) of the editor (Slate) to Markdown is doable, and converting the AST produced by the CommonMark parser to a Slate AST is straightforward.

The question is: for all ASTs, is there an algorithm that will transform them to Markdown which, when run through the CommonMark reference implementation, will produce an AST identical to the original AST?

Doug Reeder via CommonMark Discussion
noreply@talk.commonmark.org writes:

The question is: for all ASTs, is there an algorithm that will transform them to Markdown which, when run through the CommonMark reference implementation, will produce an AST identical to the original AST?

For all ASTs… certainly not. (If the AST doesn’t make enough
distinctions, then it certainly won’t be possible.)

For some ASTs? Yes, if the AST can represent all the
distinctions that matter for CommonMark, then it should in
principle be possible to do a round trip. However, in practice
it’s not easy, because of the context-sensitivity of markdown
syntax. cmark’s commonmark renderer probably comes pretty close,
but I haven’t done extensive tests of round-trippability, and
there are a number of issues that I know about:

Thanks, yes, I left out the clause “For all ASTs that could be generated from Markdown…”.
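
To illustrate why that clause matters, here is a minimal Python sketch. The `Text` and `Paragraph` classes, `render`, and `parse` are all hypothetical toys, not Slate's or cmark's actual data model; `parse` only mimics the fact that a CommonMark parser yields a single text node for a run of plain prose. Under those assumptions, an AST with two adjacent text nodes serializes to the same Markdown as the merged one, so it can never come back unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    value: str


@dataclass(frozen=True)
class Paragraph:
    children: tuple


def render(paragraphs):
    """Serialize toy paragraphs: concatenate text, blank line between paragraphs."""
    return "\n\n".join("".join(t.value for t in p.children) for p in paragraphs)


def parse(markdown):
    """Toy stand-in for a parser: one Text node per plain-prose paragraph."""
    return [Paragraph((Text(block),)) for block in markdown.split("\n\n") if block]


merged = [Paragraph((Text("foobar"),))]
split = [Paragraph((Text("foo"), Text("bar")))]

# Both ASTs render to the string "foobar", so only one of them can survive.
print(parse(render(merged)) == merged)  # True
print(parse(render(split)) == split)    # False
```

Restricting the question to ASTs that a parser could actually produce rules out cases like `split`.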

My own tests suggest that you can usually round trip it, but it would be difficult to have confidence that all the “reasonable” cases are covered.

I’ll probably go back to serializing as an enforced subset of HTML. Is there a better serialization format for these ASTs?

cmark will generate an XML representation of the AST.
cmark -t xml
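
The XML output also gives one possible way to spot-check round trips. This is a rough Python sketch, assuming the `cmark` executable is installed and on the PATH; the helper names `run_cmark` and `round_trips` are made up for illustration. It parses Markdown to XML, re-renders the same Markdown with cmark's commonmark renderer (`-t commonmark`), parses that output again, and compares the two XML strings. It only covers ASTs that came from Markdown in the first place.

```python
import subprocess


def run_cmark(to_format, markdown):
    """Run the cmark CLI (assumed to be on PATH) and return its output as text."""
    result = subprocess.run(
        ["cmark", "-t", to_format],
        input=markdown,
        capture_output=True,
        text=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def round_trips(markdown, convert=run_cmark):
    """True if re-rendering the Markdown leaves the XML form of the AST unchanged."""
    original_ast = convert("xml", markdown)
    rerendered = convert("commonmark", markdown)
    return convert("xml", rerendered) == original_ast
```

Calling `round_trips(text)` on a corpus of real notes would flag inputs whose AST changes after re-rendering. The `convert` parameter exists only so the comparison logic can be exercised without cmark installed.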
