Drawing a distinction between HTML block elements and non-element tags

The CommonMark spec has a good definition of HTML block tag in section 4.6. This includes a specific list of tags considered to be blocks, which is great. The list of block tags is critical in CommonMark because it takes a blank line to end an HTML block. For non-block HTML tags, we don’t want HTML block processing, so that for example, the 2nd and 3rd lines below are treated as CommonMark list items, and not part of the HTML.

<u>Underlined Heading</u>
* List Item
* List Item

The way CommonMark handles the above is pure goodness. But in the next paragraph the spec lumps this well-thought-out concept together with four other non-element tags: HTML comments, declarations, processing instructions, and CDATA sections.

While the angle brackets make these all look visually similar, I wonder if we shouldn’t do a better job of preventing the HTML block mechanism from kicking in inappropriately for these other SGML-inspired non-element tags.

For example, let’s start with comments:

<!-- I'm a comment, short and sweet. -->
# Introduction #

I propose the right HTML output for this should be

<!-- I'm a comment, short and sweet. -->
<h1>Introduction</h1>

But instead, the spec currently considers the Introduction heading to be part of an HTML block, and fails to convert it to a heading, producing exactly what was input:

<!-- I'm a comment, short and sweet. -->
# Introduction #

This same problem occurs for declarations, processing instructions, and CDATA sections, unless a blank line follows them. But that seems unnecessary, because there is an important distinction between block HTML and these earlier SGML notations: finding the end of one of these notations is easy, while finding the end of an HTML block might be difficult.

I don’t remember ever seeing a need for nesting these non-element tags, so I think we could just define them with the following terminators, and allow the use of backslash escaping to resolve any conflicts.

`-->` for a comment.
`>`   for a declaration
`?>`  for a processing instruction
`]]>` for CDATA
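
As a rough sketch of how these terminators could be applied, the Python below splits a non-element tag off the front of some input so that whatever follows can go through normal CommonMark processing. The helper name `split_non_element` and the `TERMINATORS` table are hypothetical, made up for illustration; this is not how any existing CommonMark implementation works, and backslash escaping is left out.

```python
# Hypothetical helper illustrating the proposed terminators.
# Order matters: "<!--" and "<![CDATA[" must be tested before the
# generic "<!" declaration opener, which is a prefix of both.
TERMINATORS = [
    ("<!--", "-->"),       # comment
    ("<![CDATA[", "]]>"),  # CDATA section
    ("<?", "?>"),          # processing instruction
    ("<!", ">"),           # declaration
]


def split_non_element(text):
    """Return (construct, rest) if text opens with a non-element tag that
    is closed by its terminator; return None otherwise."""
    for opener, closer in TERMINATORS:
        if text.startswith(opener):
            end = text.find(closer, len(opener))
            if end == -1:
                return None  # opener without a terminator
            end += len(closer)
            return text[:end], text[end:]
    return None  # ordinary element tag or plain text


source = "<!-- I'm a comment, short and sweet. -->\n# Introduction #\n"
construct, rest = split_non_element(source)
print(repr(construct))  # the comment alone
print(repr(rest))       # the heading line, left for Markdown processing
```

With a split like this, the `# Introduction #` line is no longer swallowed by the comment and could be parsed as a heading.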

Making this work right would also help with introducing a <!CommonMark> declaration.


Another counter-intuitive case:

<!-- comment --> What about **this**.

generates HTML containing literal stars, while

What <!-- comment --> about **this**.

generates emphasis on the word this.

Any thoughts about how to deal with the former?


Totally agree, just got surprised by such a snippet:

<!-- 2.1 -->
### Communications Director
The Director of Communications is responsible for public communications,
online and offline.

Just for fun, I copy-pasted this exact markup right here, and added blockquote marks:

Communications Director

The Director of Communications is responsible for public communications,
online and offline.

(I use the comments for Vim-diff to keep track of position across translated documents)