Possible issue with spec definition of HTML tags?

Hi, I’m a bit confused why the following text is being interpreted as an HTML block and not a paragraph.

I’m starting a line with </script>, which should not be considered the start of any of the 7 HTML blocks, yet it looks like it’s being interpreted as type 6 (or 7) by both the JS implementation and cmark.

Am I interpreting something wrong? My understanding is:

  • This is not a type 1 block because a closing tag cannot start this type of block.
  • This is not a type 6 block because script is not listed.
  • This is not a type 7 block because script is specifically excluded.
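The three checks above can be sketched roughly as follows. This is only an illustration of the reading of the spec described here, not how cmark or the JS implementation actually work: the function name `html_block_type` is hypothetical, the type 6 tag list is abbreviated, and the attribute grammar for type 7 is heavily simplified.

```python
import re

# Assumption: abbreviated subset of the type 6 tag names; the spec lists many more.
TYPE6_NAMES = {"div", "table", "p", "ul", "ol", "li", "blockquote", "section"}
# Tag names that type 7 explicitly excludes.
EXCLUDED_FROM_TYPE7 = {"script", "pre", "style"}

# Type 1: an *opening* <script, <pre, or <style followed by whitespace, ">", or end of line.
TYPE1_RE = re.compile(r"<(script|pre|style)(\s|>|$)", re.IGNORECASE)
# Type 6: "<" or "</", a tag name, then whitespace, ">", "/>", or end of line.
TYPE6_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9-]*)(\s|/?>|$)")
# Type 7: a complete open or closing tag alone on the line.
# Assumption: attributes are approximated as "anything except < and >".
OPEN_TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)(?:\s[^<>]*)?/?>\s*$")
CLOSE_TAG_RE = re.compile(r"</([A-Za-z][A-Za-z0-9-]*)\s*>\s*$")


def html_block_type(line):
    """Return 1, 6, or 7 if the line starts that kind of HTML block, else None."""
    s = line.lstrip(" ")
    if len(line) - len(s) > 3:
        return None  # four or more spaces of indentation is not an HTML block start
    if TYPE1_RE.match(s):
        return 1
    m = TYPE6_RE.match(s)
    if m and m.group(1).lower() in TYPE6_NAMES:
        return 6
    m = OPEN_TAG_RE.match(s) or CLOSE_TAG_RE.match(s)
    if m and m.group(1).lower() not in EXCLUDED_FROM_TYPE7:
        return 7
    return None


print(html_block_type("</script>"))  # falls through all three checks under this reading
print(html_block_type("</table/>"))  # matches the type 6 pattern because of the "/>" ending
```

Under this reading, `</script>` matches none of the three start conditions and would be left as paragraph text.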

Is my understanding correct?

You’re right. The spec was written on the assumption that </script> (and similarly </style> and </pre>) would only occur after a matching opening tag. It’s a good question how this should be treated if it occurs by itself at the beginning of a line, but this may be a bit of a “don’t care.” Why did this come up in your case, and how do you think the tag should be handled?

I’m writing an implementation, and I was augmenting the spec tests with a case that makes sure I exclude </script> from matching as type 7 (a good test to have in the spec, I think, if the exclusion is being called out explicitly).

The way I’m handling it is just to consider it paragraph text. I’d be happy to make a PR with a spec test for this if it’s alright with you.

Thinking about this a little more, I think it would stand to reason that closing tags shouldn’t be allowed to start an HTML block. What is the reasoning behind allowing them?

The reasoning is that we want to allow opening and closing HTML tags around a section that is to be interpreted as Markdown. E.g.

<div class="foo">

*This is Markdown.*

</div>

Here we want the first and last lines to be interpreted as raw HTML blocks, and the middle as a Markdown paragraph.
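To make the role of the blank lines concrete, here is a minimal sketch of how the example splits into blocks. It is not taken from any real parser: the function `split_blocks` is hypothetical, the tag list is abbreviated, and it assumes that a block simply runs until the next blank line (which is the end condition for type 6 blocks, but ignores everything else a real implementation handles).

```python
import re

# Assumption: abbreviated subset of the type 6 block-level tag names.
BLOCK_TAG_RE = re.compile(
    r"</?(div|table|p|ul|ol|li|blockquote|section)(\s|/?>|$)", re.IGNORECASE
)


def split_blocks(text):
    """Split text into (kind, lines) blocks separated by blank lines.

    A block is "html" if its first line looks like a type 6 start,
    otherwise it is a "paragraph".
    """
    blocks = []
    kind, current = None, []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends whatever block is open.
            if current:
                blocks.append((kind, current))
            kind, current = None, []
            continue
        if kind is None:
            kind = "html" if BLOCK_TAG_RE.match(line) else "paragraph"
        current.append(line)
    if current:
        blocks.append((kind, current))
    return blocks


doc = '<div class="foo">\n\n*This is Markdown.*\n\n</div>'
for kind, lines in split_blocks(doc):
    print(kind, lines)
```

With the blank lines in place, the closing `</div>` has to start an HTML block of its own; if closing tags could not start one, that last line would be treated as paragraph text. Without the blank lines, the whole example would be a single raw HTML block and the middle line would not be parsed as Markdown.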

Ah OK, fair enough. But as far as </script>, </style>, and </pre> go – would it be reasonable to interpret them as text rather than type 6/7 then?

Also, technically it’s possible to do something like </table/> and have it match as a type 6 block. I realize that Markdown is “loose” but shouldn’t something like that match as text instead of HTML?

I think </table/> is a “don’t care.” I don’t think it hurts too much to treat it as raw HTML; as things stand, there are already no guarantees that raw HTML blocks will be valid HTML.