I really like Markdown. My understanding of its goal is to take plaintext that a human wrote and intended for other humans to be able to read directly and translate marking indicating emphasis/whatnot into HTML. User-provided text would never naturally have any HTML markup in it. So, the spec should have those pass through so that they are rendered as they appear in the plain text—they should be escaped.
One scenario I am thinking of is handling of stuff that looks like it could be HTML. Markdown loves to swallow constructs like Someone <else at whatever dot com>
in the name of HTML passthru. Instead, it should escape that <
so that the end reader sees it as a literal angle bracket. Another example might be a very simple math assertion with no spaces: 0.01<p
.
Instead of trying to make it so that commonmark can be used as a filter for any existing HTML document, it should be targeted at actual text. People here seem to be of the opinion that one could interlace plaintext and HTML blocks, run that through a commonmark system, and then have a document with rich text. I think this adds a lot of unnecessary complexity to commonmark. Supporting HTML passthru requires the <
character to be escaped in many situations which the human writing plain text could not anticipate. I would propose that instead of running a commonmark processor on the entire HTML document, the commonmark processor be run on the textContent of nodes in the document that opt into markdown using some proprietary means. If a document is already HTML, why would you be wanting to run markdown/commonmark on it in the first place?
Another reason I’d like this is to get rid of the “garbage in, garbage out” idea. When I have a plaintext document and pass it through markdown, I want to get “structurally valid XHTML (or HTML)” (Why did it take me so long to realize that markdown’s promise was never meant to be kept in the first place?). If we can just rip out support for HTML passthru, there no longer is a reason for commonmark to output invalid XML fragments. Outputting structurally valid XML fragments would mean no more dealing with tracking down an unclosed tag, etc.
I don’t have the need to have commonmark passthru HTML. Most use cases I can imagine, such as this very post, user comments in various places, etc., have no need for HTML passthru and would end up sanitizing such stuff out anyway. Sure, disabling passthrough would not make it safe by default because users could still specify link and image URIs, but it would be safer than it is now. And it could be guaranteed to output valid XML (or even XHTML?) fragments—no more need to go parsing something so ambiguous and quirky as HTML/SGML…
Sorry for the rant post.