Start condition for type-1 HTML blocks

Not sure whether I’ve missed something entirely, but I noticed an inconsistency between the spec and the reference implementation for type-1 HTML blocks. Here’s what the spec (v0.29) says:

Start condition: line begins with the string <script, <pre, or <style (case-insensitive), followed by whitespace, the string >, or the end of the line.
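Read literally, the condition is a check on how the line begins: one of the three tag names, then whitespace, >, or the end of the line. A minimal Python sketch of that reading, assuming "whitespace" means the spec's whitespace characters (space, tab, newline, line tabulation, form feed, carriage return); the names TYPE1_START and starts_type1_block are hypothetical, and this is not the reference implementation's code:

```python
import re

# Spec whitespace characters: space, tab, newline, line tabulation (\x0b),
# form feed (\x0c), carriage return.
# re.ASCII keeps the case-insensitive match to plain ASCII letters.
TYPE1_START = re.compile(
    r"<(?:script|pre|style)(?:[ \t\n\x0b\x0c\r]|>|\Z)",
    re.IGNORECASE | re.ASCII,
)


def starts_type1_block(line: str) -> bool:
    """Hypothetical helper: does `line` meet the v0.29 type-1 start condition?

    re.match anchors the pattern at the start of the line, and only the
    position right after the tag name is inspected. (The spec also allows
    up to three spaces of indentation before the tag; ignored here.)
    """
    return TYPE1_START.match(line) is not None


if __name__ == "__main__":
    print(starts_type1_block("<pre>"))                    # True: ">" follows
    print(starts_type1_block("<PRE"))                     # True: end of line
    print(starts_type1_block('<style type="text/css">'))  # True: whitespace follows
    print(starts_type1_block("<prefix>"))                 # False: "f" follows "<pre"
```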

Because no characters other than whitespace, >, or the end of the line can follow these tags, I expect that the following snippet does not satisfy the start condition of type-1 HTML blocks:

<pre trailing-chars

*foo*

</pre>

Since none of the lines satisfies any other HTML block start condition, everything should be treated as ordinary Markdown and rendered as such:

<p>
&lt;pre trailing-chars
</p>

<p>
<em>foo</em>
</p>

<p>
&lt;/pre&gt;
</p>

But the dingus still parses everything as an HTML block, and renders it as:

<pre trailing-chars

*foo*

</pre>
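This rendering is what the type-1 end condition predicts once such a block has started: the block runs until a line containing </script>, </pre>, or </style> (case-insensitive), so blank lines do not close it and *foo* is left as raw text. A minimal sketch of that scan, assuming top-level lines only (no blockquotes or list items); type1_block_end is a hypothetical name:

```python
import re
from typing import List

# Type-1 end condition (v0.29): the line contains </script>, </pre>, or
# </style>, case-insensitive; it need not match the tag that opened the block.
TYPE1_END = re.compile(r"</(?:script|pre|style)>", re.IGNORECASE | re.ASCII)


def type1_block_end(lines: List[str], start: int) -> int:
    """Hypothetical helper: index of the last line of a type-1 block that
    opened at lines[start].

    Blank lines do not close the block, and the start line itself may
    contain the end tag. With no end tag, the block runs to the last line
    (this sketch assumes top-level lines, outside any container block).
    """
    for i in range(start, len(lines)):
        if TYPE1_END.search(lines[i]):
            return i
    return len(lines) - 1


if __name__ == "__main__":
    doc = ["<pre trailing-chars", "", "*foo*", "", "</pre>"]
    print(type1_block_end(doc, 0))  # 4: the blank lines and *foo* stay raw
```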

I believe the dingus interpretation, which allows for trailing characters, makes more sense here, since people might write something like:

<script defer
  type="text/javascript">

console.log("not *emphasis*");

</script>

To avoid collisions with type-7 blocks, which end at a blank line, we should require at least one whitespace character between <script, <pre, or <style and any trailing characters, so that these special tag names stay recognizable. For instance:

<scriptBlock>

This is a paragraph *with emphasis*.

</scriptBlock>

<script>

... but this one is *not*.

</script>
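The difference between these two examples comes down to what follows the tag name. A rough Python sketch that uses the v0.29 type-1 condition, under which <scriptBlock> already fails because "B" follows "<script". It assumes a deliberately simplified type-7 check that only recognizes attribute-free tags such as <scriptBlock> and </scriptBlock> (the real condition accepts any complete open tag) and skips types 2 to 6 entirely; html_block_type and the pattern names are hypothetical:

```python
import re
from typing import Optional

# Type-1 start condition: <script, <pre, or <style, then whitespace, ">",
# or the end of the line (case-insensitive).
TYPE1_START = re.compile(
    r"<(?:script|pre|style)(?:[ \t\n\x0b\x0c\r]|>|\Z)",
    re.IGNORECASE | re.ASCII,
)

# Simplified type 7: an attribute-free open or closing tag followed only
# by spaces or tabs to the end of the line. Real type-7 tags may carry
# attributes.
SIMPLE_TYPE7 = re.compile(r"</?([A-Za-z][A-Za-z0-9-]*)[ \t]*>[ \t]*\Z")

# Type 7 excludes these tag names.
TYPE7_EXCLUDED = {"script", "style", "pre"}


def html_block_type(line: str) -> Optional[int]:
    """Hypothetical helper: 1, 7, or None under the simplified rules above."""
    if TYPE1_START.match(line):
        return 1
    m = SIMPLE_TYPE7.match(line)
    if m and m.group(1).lower() not in TYPE7_EXCLUDED:
        return 7
    return None


if __name__ == "__main__":
    for line in ["<scriptBlock>", "</scriptBlock>", "<script>", "</script>"]:
        print(line, html_block_type(line))
    # <scriptBlock> 7   ("B" follows "<script", so not type 1)
    # </scriptBlock> 7
    # <script> 1
    # </script> None    (closes a type-1 block; opens nothing by itself)
```

Because type 7 ends at the next blank line while type 1 runs to the closing tag, the emphasis inside the first example is parsed as Markdown and the one inside <script> is not.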

So the spec could say something like:

Start condition: line begins with the string <script, <pre, or <style (case-insensitive), followed by at least one whitespace character.

I think the intended interpretation is that “a line that begins with <pre followed by whitespace opens a type 1 HTML block”. That is, the whitespace is part of what the line needs to start with to begin that type of HTML block.

Note that I think the same holds for “the string >”: the following also starts a type 1 HTML block.

<pre> aaa

bbb

</pre>

The wording does feel ambiguous, though, and it also feels kinda weird that it mentions “the end of the line”, which I’m not sure counts as part of the line itself (so it’s odd to say that a line needs to start with a string containing it).

That’s right. The string <pre here is followed by whitespace (there is other stuff after the whitespace, but that does not change the fact that it is followed by whitespace).
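In other words, “followed by” constrains only the position right after the tag name, and “the end of the line” is not a character in the line but the case where nothing is left at that position. A minimal sketch of that reading as a plain prefix test, assuming the spec's six whitespace characters; meets_type1_start is a hypothetical name:

```python
# Spec whitespace characters: space, tab, newline, line tabulation,
# form feed, carriage return.
WHITESPACE = " \t\n\x0b\x0c\r"
TAG_PREFIXES = ("<script", "<pre", "<style")


def meets_type1_start(line: str) -> bool:
    """Hypothetical helper: the type-1 start condition as a prefix test."""
    for tag in TAG_PREFIXES:
        # Case-insensitive comparison of just the tag-name prefix.
        if line[:len(tag)].lower() != tag:
            continue
        rest = line[len(tag):]
        # "The end of the line": nothing at all follows the tag name.
        if rest == "":
            return True
        # Otherwise only the next character counts; anything after it,
        # such as "trailing-chars", is never examined.
        if rest[0] in WHITESPACE or rest[0] == ">":
            return True
    return False


if __name__ == "__main__":
    print(meets_type1_start("<pre trailing-chars"))  # True
    print(meets_type1_start("<pre> aaa"))            # True
    print(meets_type1_start("<pre"))                 # True
    print(meets_type1_start("<prefix>"))             # False
```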


Ah, I see. Indeed, “the end of the line” was throwing me off. Thanks, all!