The AST output from Dingus is invalid XML according to xmllint

When I run the XML AST output through xmllint, it reports a validation error:

element document: validity error : root and DTD name do not match 'document' and 'CommonMark'

The problem seems to be the document type declaration in combination with the document’s root element:

<!DOCTYPE CommonMark SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">

Evidently, the DTD name CommonMark does not match the name of the root element <document>.

The relevant rule is stated in the W3C XML specification as follows:

The Name in the document type declaration MUST match the element type of the root element.
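This particular constraint can be checked without fetching the DTD: compare the name after `<!DOCTYPE` with the root element’s name as written. Below is a minimal sketch using Python’s standard-library expat bindings; `doctype_matches_root` is a hypothetical helper, and it checks only this one rule, so it is no substitute for full validation with xmllint.

```python
import xml.parsers.expat


def doctype_matches_root(xml_text: str) -> bool:
    """Check the rule: DOCTYPE name must equal the root element's name.

    Names are compared exactly as written (e.g. 'cm:commonmark'), since
    DTDs are not namespace-aware. The external DTD is never fetched, so
    this is not full validation.
    """
    doctype_name = None
    root_name = None

    def on_doctype(name, system_id, public_id, has_internal_subset):
        nonlocal doctype_name
        doctype_name = name

    def on_start_element(name, attrs):
        nonlocal root_name
        if root_name is None:  # only the first element is the root
            root_name = name

    # No namespace_separator: expat reports qualified names unchanged.
    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start_element
    parser.Parse(xml_text, True)

    # Without a DOCTYPE there is nothing to validate against.
    return doctype_name is not None and doctype_name == root_name


dingus_output = ('<!DOCTYPE CommonMark SYSTEM "CommonMark.dtd">\n'
                 '<document xmlns="http://commonmark.org/xml/1.0"/>')
fixed = dingus_output.replace("<!DOCTYPE CommonMark", "<!DOCTYPE document")
print(doctype_matches_root(dingus_output))  # False
print(doctype_matches_root(fixed))          # True
```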

Is this known, and are any changes planned (in particular to CommonMark.dtd, such as renaming the root element to match the DTD)?

The reason I ask is that in a project I am working on, ASTs are stored as XML files at some point, and I’d like to make sure these XML files are valid while also sticking to the conventions laid down by the official CommonMark materials.

No, that rule wasn’t known (by me anyway).

Which would be better: calling the root element CommonMark, or renaming the file to document.dtd?

Calling the root element CommonMark gets my vote. It’s more consistent with HTML using <html> and SVG using <svg>. It’s also unambiguous.

Sorry, I think I was confused. This doesn’t concern the filename of the DTD, it concerns the name in the DOCTYPE declaration, i.e. the first word after <!DOCTYPE.

In your example, this is CommonMark, but in cmark’s XML output, it is document (which does match the name of the root element).

Ah, I see. In commonmark.js, CommonMark is used instead of document. Well, that’s easily fixed (done in commit d171379431e958c45481f7df8420104d14650691).
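One way to keep the two names from drifting apart again is to derive the DOCTYPE name and the root element from a single value when serializing. A minimal Python sketch; `xml_prolog` is a hypothetical helper, not the actual commonmark.js or cmark renderer code, and the XML declaration line is included only as an assumption about typical output.

```python
def xml_prolog(root_name: str, namespace: str,
               dtd_file: str = "CommonMark.dtd") -> str:
    """Build the XML declaration, DOCTYPE and root start tag.

    The DOCTYPE name and the root element are both taken from root_name,
    so they cannot diverge. Assumes the arguments contain no quotes or
    markup characters (no escaping is done).
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<!DOCTYPE {root_name} SYSTEM "{dtd_file}">\n'
        f'<{root_name} xmlns="{namespace}">\n'
    )


print(xml_prolog("document", "http://commonmark.org/xml/1.0"), end="")
```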

  • Changing the root element to CommonMark or, better, commonmark.
  • Removing the version from the namespace IRI and adding a ‘version’ attribute to the document node instead (see below). This is considered good namespacing practice: changing any part of the namespace string does not create a new revision of the namespace, it creates an entirely different namespace, which breaks every existing client. A namespace should only be changed when one deliberately wants to break backwards compatibility.
  • Making cm (for CommonMark) the official namespace prefix, so as to keep md free for Markdown.
<cm:commonmark xmlns:cm="http://commonmark.org/ns/commonmark" version="1.0">
[...]
</cm:commonmark>
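To illustrate why the version belongs in an attribute rather than in the namespace IRI: a client can match the namespace exactly and decide separately which versions it understands. A minimal Python sketch with ElementTree, assuming the unversioned namespace `http://commonmark.org/ns/commonmark` proposed above (not an official IRI) and a hypothetical `accepts` helper that supports major version 1.

```python
import xml.etree.ElementTree as ET

# Proposed, unversioned namespace (an assumption, not an official IRI).
CM_NS = "http://commonmark.org/ns/commonmark"


def accepts(xml_text: str, supported_major: int = 1) -> bool:
    """Accept a document if it is in the CommonMark namespace and its
    'version' attribute has a major version this client supports."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced names as '{namespace-iri}localname',
    # so the prefix the author chose (cm, md, ...) does not matter.
    if root.tag != f"{{{CM_NS}}}commonmark":
        return False
    major = root.get("version", "").partition(".")[0]
    return major.isdigit() and int(major) == supported_major


v1_0 = ('<cm:commonmark xmlns:cm="http://commonmark.org/ns/commonmark" '
        'version="1.0"/>')
v1_1 = v1_0.replace('version="1.0"', 'version="1.1"')
print(accepts(v1_0), accepts(v1_1))  # True True

# With the version baked into the IRI, a minor bump yields a different
# namespace, and an exact match written for 1.0 no longer recognizes it.
old = ET.fromstring('<document xmlns="http://commonmark.org/xml/1.1"/>')
print(old.tag == "{http://commonmark.org/xml/1.0}document")  # False
```

The design choice here is that the namespace identifies the vocabulary and never changes, while the attribute carries the revision, so minor updates do not invalidate existing clients.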

Remember: Once version 1.0 is out, changing such elementary things will be nearly impossible.