Two html comments on same line breaks if second comment is multiline -- second close tag is escaped

ruffin · March 31, 2018, 8:14pm

If I try this:

test
<!-- hi --><!--
comment
-->
see me?

This is what’s rendered:

<p>test</p>
<!-- hi --><!--
<p>comment
--&gt;
see me?</p>

That second html comment ending tag should not have been escaped.

I would’ve expected this:

<p>test</p>
<!-- hi --><!--
comment
-->
<p>see me?</p>

There are many variations on the theme here, but the bottom line seems to be that making the second comment multiline borks things.

jgm · April 1, 2018, 12:04am

This behavior, while unexpected, does accord with the spec.

The reason is that both the start and the end condition
for type 2 HTML blocks are satisfied by the line with ‘hi’:

Start condition: line begins with the string <!--.
End condition: line contains the string -->.

Furthermore, --> can’t start an HTML block, so the lines after the first are parsed as an ordinary paragraph and the > gets escaped.
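
The rule can be sketched in a few lines of Python. This is a simplified illustration, not the reference parser: it only knows about type 2 HTML blocks and paragraphs, the name `split_blocks` is made up for the example, and it ignores details such as the three-space indentation limit.

```python
def split_blocks(lines):
    """Group lines into ('html', [...]) and ('para', [...]) blocks.

    Simplified sketch: only type 2 HTML blocks (comments) and
    paragraphs are recognised.
    """
    blocks = []        # list of (kind, [lines])
    in_html = False    # True while a type 2 block has not met its end condition
    para_open = False  # True while the last block is an open paragraph
    for line in lines:
        if in_html:
            # Inside an open HTML block: keep the line, then test the end condition.
            blocks[-1][1].append(line)
            in_html = "-->" not in line
        elif line.lstrip(" ").startswith("<!--"):
            # Start condition: the line begins with <!--.
            blocks.append(("html", [line]))
            # End condition is tested on the same line: it contains -->.
            in_html = "-->" not in line
            para_open = False
        elif not line.strip():
            # A blank line closes the current paragraph.
            para_open = False
        elif para_open:
            blocks[-1][1].append(line)
        else:
            blocks.append(("para", [line]))
            para_open = True
    return blocks


sample = ["test", "<!-- hi --><!--", "comment", "-->", "see me?"]
for kind, body in split_blocks(sample):
    print(kind, body)
# para ['test']
# html ['<!-- hi --><!--']
# para ['comment', '-->', 'see me?']
```

The second line opens and closes the block in one go, so the trailing `<!--` is never seen as a new start and the remaining three lines fall into a paragraph.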

If you want to suggest concrete (and hopefully minimal)
revisions to the spec here, feel free. Note, however,
that the spec is designed so that block structure can
be discerned without indefinite lookahead, and also to
allow the writer flexibility in including commonmark
content inside HTML tags. Because of these goals,
there are inevitably going to be some odd results
here and there, but they can all be worked around
once you understand the rules.

ruffin · April 1, 2018, 2:05am

Sure. I think it’s as easy as adding a few sentences after this portion in section 4.6 to introduce the concept of “compound HTML blocks”:

If the first line meets both the start condition and the end condition, the block may contain just that line.

An end condition may be followed on the same line by optional non-line-ending whitespace and then a subsequent start condition for any HTML block type. If so, the HTML block will be continued as a “compound HTML block” until a proper end condition for the new block start condition is met, itself in turn able to be compounded by a start condition trailing its end condition as described above. The extended compound block can then be treated as a single block of raw HTML.

Any single line may have up to one leading end condition iff an HTML block is ongoing and zero to one trailing start conditions. Any single line may also have multiple start and end condition pairs between its zero to one leading end conditions and zero to one trailing start conditions. Each start/end condition pair will also be treated as part of the compound HTML block.

Your lookahead here remains limited to the end of the line, which isn’t, by one measure, any worse than what you were doing before.
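
A rough sketch of how the proposed wording might behave for comments, in Python. The names `still_open` and `split_blocks_compound` are hypothetical, only type 2 blocks are handled, and none of this comes from an existing parser. The scan never looks past the current line.

```python
def still_open(line, was_open):
    """Return True if a comment block is still open at the end of `line`.

    `was_open` says whether a comment block was already open when the
    line began. Only the current line is examined.
    """
    rest = line if was_open else line.lstrip(" ")
    open_now = was_open
    while True:
        if open_now:
            end = rest.find("-->")
            if end == -1:
                return True  # no end condition on this line: block continues
            # Skip the end condition plus optional whitespace after it.
            rest = rest[end + 3:].lstrip(" \t")
            open_now = False
        elif rest.startswith("<!--"):
            open_now = True  # a new start condition compounds the block
        else:
            return False     # anything else: the block ends with this line


def split_blocks_compound(lines):
    """Group lines into blocks, allowing compound comment blocks."""
    blocks = []
    in_html = False
    para_open = False
    for line in lines:
        if in_html:
            blocks[-1][1].append(line)
            in_html = still_open(line, True)
        elif line.lstrip(" ").startswith("<!--"):
            blocks.append(("html", [line]))
            in_html = still_open(line, False)
            para_open = False
        elif not line.strip():
            para_open = False
        elif para_open:
            blocks[-1][1].append(line)
        else:
            blocks.append(("para", [line]))
            para_open = True
    return blocks


sample = ["test", "<!-- hi --><!--", "comment", "-->", "see me?"]
for kind, body in split_blocks_compound(sample):
    print(kind, body)
# para ['test']
# html ['<!-- hi --><!--', 'comment', '-->']
# para ['see me?']
```

With this change the original example groups the way the first post expected: both comments form one raw HTML block and "see me?" becomes its own paragraph.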

I’ll take another look at the text for precision (you might argue, e.g., that, as worded, it does not deal cleanly with the blank line end condition for types 6 & 7), but I think that covers the gist of what would be needed. It is not especially heinous for parsers, and preserves the desire that “For any markup that is not covered by Markdown’s syntax, you simply use HTML itself.” (Luckily, John didn’t mention HTML comments in his list of block types.)