Entity-ification of quotes and brackets missing from spec

All across the 0.29 spec, ", <, and > are converted to &quot;, &lt;, and &gt; respectively, when parsed as plain text. The Autolinks section is rife with examples. However, this is not spelled out anywhere in the spec (is it just these three?). The closest the spec comes to specifying this HTML-entity-ification is in example #298 under Backslash escapes, where a careful observer sees that three characters are not just unescaped from their backslashes, but are replaced with HTML entity references.

I would expect this behavior to be spelled out in perhaps one of the following sections: Backslash escapes, Entity and numeric character references, Raw HTML, or Textual content (the most likely). But it is not spec’ed in any of those sections.

This is outside the scope of the CommonMark spec, because the specification describes how to parse a Markdown document. The conversion to the entities you mention is the job of the HTML renderer: those characters have to be escaped so that they do not break the output HTML.
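To illustrate the split, here is a minimal sketch of the escaping step an HTML renderer might apply to the literal text the parser hands it. The function names `escape_html` and `render_paragraph_html` are made up for this example and are not taken from any CommonMark implementation; the set of four characters is what the spec's example output suggests, not something the spec mandates.

```python
def escape_html(text: str) -> str:
    """Escape characters that would otherwise be read as HTML markup."""
    # Replace "&" first so the entities produced below are not escaped again.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
    )


def render_paragraph_html(literal: str) -> str:
    """Hypothetical renderer step: wrap already-parsed literal text in <p>."""
    return f"<p>{escape_html(literal)}</p>"


# The parser's view of the text is the unescaped string; only the
# HTML output contains entities.
print(render_paragraph_html('a < b & "c"'))
```

The parser never sees or produces `&quot;`; it only appears once the literal string passes through the HTML-specific renderer.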

If you output to a different format, then the renderer of your choice may need to do completely different escaping, or whatever other text transformations that format expects.
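As a rough sketch of what "completely different escaping" can look like, a renderer targeting LaTeX would leave ", <, and > alone and escape a different set of characters instead. The table below is an illustrative subset that I am assuming for the example, not a complete or authoritative list, and `escape_latex` is a hypothetical helper name.

```python
# Illustrative subset of characters that are special in LaTeX text mode.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape one character at a time so replacements are never re-escaped."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


# Same parsed text as an HTML renderer would receive, different output rules.
print(escape_latex('50% of <a> & "b"'))
```

The same parsed document feeds both renderers; only the output-specific escaping differs.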

EDIT: The examples in the specification are full of these entities because, for testing purposes, HTML is the expected output format: an HTML renderer is likely the most widely implemented kind of renderer (and is very easy to implement).
