JGM Cmark And Straight Quotes


#1

Good morning.

In Gruber’s original Markdown Perl script, straight quotes were passed through to the output as-is. But under JGM’s cmark, straight quotes are being converted to the “&quot;” entity.

I was just wondering if this is expected and why it is happening. I personally find the resulting HTML harder to read and I have yet to find a browser that doesn’t display the straight quotes correctly.


#2

The spec calls out characters being replaced by their proper entities. It doesn’t give the reasoning.

My guess: Markdown should output valid and well-formed HTML. Browsers have special rules in place to gracefully handle bad markup, but that doesn’t mean we should be using bad markup.

Why are you trying to read HTML raw? &copy; isn’t any easier to read in source.


#3

A literal QUOTATION MARK in character data inside an element’s content is perfectly valid and well-formed HTML (W3C 4.01, ISO 15445, HTML5) as well as SGML (ISO 8879 reference concrete syntax) and XML (W3C).
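
As a rough illustration of that point, here is a minimal sketch using Python's standard html.parser module (not anything cmark itself does). The TextCollector class and text_of helper are hypothetical names made up for this example. It checks that a literal quotation mark and the &quot; reference yield the same character data once parsed:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects the character data found between tags."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode references such as
        # &quot; before handing text to handle_data.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_of(markup):
    """Return the concatenated character data of a small HTML fragment."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


literal = text_of('<p>She said "hello".</p>')
escaped = text_of('<p>She said &quot;hello&quot;.</p>')
print(literal == escaped)
```

Both fragments carry the same text, so the choice between them only affects how the source reads.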

On the contrary, it is not uncommon to map (via short references) literal QUOTATION MARK characters in input text to start- and end-tags of some “inline quotation” element, e.g., the <Q> element of HTML. Obviously, the &quot; entity reference will not be mapped in this way: therefore, replacing all QUOTATION MARK characters with entity references in the output will make this convention effectively impossible to use.
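
A loose approximation of that convention, written as a Python post-processing step rather than real SGML short references (which are handled by the parser itself). The quotes_to_q function is a hypothetical name, and the sketch assumes it is only ever given plain text content, never tags or attribute values:

```python
import re

# Matches a pair of literal quotation marks and the text between them.
QUOTED = re.compile(r'"([^"]*)"')


def quotes_to_q(text):
    """Hypothetical helper: wrap each "..." pair in a <q> element.

    Assumes `text` is element content only (no tags, no attributes).
    """
    return QUOTED.sub(r"<q>\1</q>", text)


# Literal quotation marks are mapped to <q>...</q>.
print(quotes_to_q('He said "yes" twice.'))

# Text that was already escaped to &quot; is left untouched,
# so the mapping never fires.
print(quotes_to_q('He said &quot;yes&quot; twice.'))
```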


#4

Thank you for the replies and the link to the spec.


#5

The cmark spec says nothing about converting quotation marks to HTML entities. In fact, that would make it impossible to render quotes in non-HTML formats. What you’re seeing is the cmark HTML renderer converting quotation marks. It’s easier to have a single code path to escape text and attribute content, but I agree that leaving quotes untouched outside of attributes would be nicer. Feel free to open an issue on GitHub.
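
To make the trade-off concrete, here is a minimal sketch in Python. It is not cmark's actual C code; the escape_html function and its in_attribute flag are hypothetical, chosen only to show what splitting the single code path would look like:

```python
def escape_html(s, in_attribute=False):
    """Hypothetical escaper: quotes are only escaped inside attribute values."""
    # These three must always be escaped; "&" goes first so the
    # ampersands introduced by the later replacements are not re-escaped.
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if in_attribute:
        # Inside a double-quoted attribute value, a literal quotation
        # mark would end the value early, so it has to be escaped.
        s = s.replace('"', "&quot;")
    return s


# Element content: quotation marks pass through unchanged.
print(escape_html('say "hi" & <go>'))

# Attribute value: quotation marks become &quot;.
print('<a title="' + escape_html('say "hi"', in_attribute=True) + '">')
```

A single code path amounts to always calling this with in_attribute=True, which is safe everywhere but produces the &quot; noise in ordinary text.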