List Block and Html Block Interaction Help

So, full disclosure, I did find a bug in my parser, where I wasn’t handling this case properly:

```
- some text
some other text
```

and I was starting a list. Fixed that, but along the way, I have hit an interesting issue:

- <script>
- some text
some other text
</script>

So, based on that input, the Dingus for the CommonMark reference implementation gives:

<ul>
<li>
<script>
</li>
<li>some text
some other text</li>
</ul>
</script>

and my implementation currently gives:

<ul>
<li>
<script>
</li>
<li>some text
some other text
</script></li>
</ul>

Now, granted, the difference is small, but as I am trying to match the reference implementation exactly, any difference is important to me. In this case, my implementation gets to the last line and at first considers that line to be an HTML Type 7 block. But because Type 7 blocks cannot interrupt a paragraph and one is already in progress, it makes itself ineligible, effectively becoming normal text. Then, during the inline phase of my processing, it becomes Raw HTML, but inside of the paragraph.
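For readers following along, the inline-phase step described here can be sketched roughly in Python. This is only an illustration, not code from any of the parsers discussed: `split_inline` and `CLOSING_TAG` are hypothetical names, and the pattern is a simplified approximation of a closing tag that ignores every other kind of raw HTML.

```python
import re

# Simplified closing-tag pattern (an assumption for illustration only):
# "</", a tag name, optional whitespace, then ">".
CLOSING_TAG = re.compile(r"</[A-Za-z][A-Za-z0-9-]*[ \t\n]*>")


def split_inline(text):
    """Split paragraph text into ("text", ...) and ("raw_html", ...) parts."""
    parts = []
    pos = 0
    for match in CLOSING_TAG.finditer(text):
        if match.start() > pos:
            parts.append(("text", text[pos:match.start()]))
        parts.append(("raw_html", match.group()))
        pos = match.end()
    if pos < len(text):
        parts.append(("text", text[pos:]))
    return parts


# The paragraph content of the second list item once the last line has
# been absorbed as ordinary text.
paragraph = "some text\nsome other text\n</script>"
for kind, value in split_inline(paragraph):
    print(kind, repr(value))
```

Under these assumptions the closing tag ends up as raw HTML inside the paragraph, which is why it appears before `</li>` in the second output above.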

Looking at the output for the reference implementation, my guess is that something is happening that evaluates to a non-paragraph state, causing the list to be terminated, and then for that same text (now that the paragraph is closed) to become a Type 7 HTML block. But I am not sure.

Can I get some assistance on figuring this out?

Huh, actually I’d guess no one has really tried, tested, or cared about this so far, and it’s quite possible that many implementations (including the reference one) are buggy or simply behave erratically with respect to this.

It seems to me like a kind of loophole in the specification, and we’re simply in a gray, unspecified area here.

As a side note, I just tested with my MD4C, and it generates even more nonsensical output, so I will need to take a look at it too and fix it so that it at least does not generate complete nonsense.

The specification aside, intuitively, I would expect this output:

<ul>
<li><script>
- some text
some other text
</script>
</li>
</ul>

But thinking more about it, I’m even more perplexed by this example:

> <script>
> foo bar
baz
</script>

I surely know what I would expect from

> <script>
> foo bar
> baz
> </script>

(simple script inside a block quote)

or also from

<script>
> foo bar
> baz
</script>

(no blockquote, the > marks are just part of the script)

but the mixed case above leaves me completely lost.

Forget my whole previous post. The specification does not offer many examples to test this, but it says explicitly:

If there is no matching end tag, the block will end at the end of the document (or the enclosing block quote or list item).
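A minimal Python sketch of that rule, under stated assumptions: `html_block_extent` and `END_TYPE1` are hypothetical names, the container is modelled as a plain list of its content lines (markers already stripped), and only the `script`, `pre`, and `style` end tags are considered.

```python
import re

# End condition assumed here: the line contains one of these end tags.
END_TYPE1 = re.compile(r"</(script|pre|style)>", re.IGNORECASE)


def html_block_extent(container_lines, start=0):
    """Return the lines that belong to an HTML block opened at `start`.

    The block runs through the first line containing a matching end tag.
    If no such line exists, it runs to the end of the enclosing container.
    """
    for index in range(start, len(container_lines)):
        if END_TYPE1.search(container_lines[index]):
            return container_lines[start:index + 1]
    return container_lines[start:]


# The first list item of the example contains only "<script>", so the
# block ends where the item ends, without ever seeing an end tag.
print(html_block_extent(["<script>"]))

# With an end tag inside the same container, the block stops there.
print(html_block_extent(["<script>", "foo bar", "</script>", "after"]))
```

Read this way, the `<script>` block in the first list item cannot reach the `</script>` on the last line, because the list item closes first.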

In this light, the output given by your implementation seems correct to me.

EDITED: After more checking, I mean your implementation, not the reference ones (the Dingus and cmark). Sorry, it’s not my good day today. This looks like a bug in cmark to me at the moment.

Confused… @mitty

Given

- <script>
- some text
some other text
</script>

and the reference output:

<ul>
<li>
<script>
</li>
<li>some text
some other text</li>
</ul>
</script>

My understanding of how the first three lines are processed:

  • line 1: list block and HTML block start, generating: <ul>\n<li>\n<script>
  • line 2: list item changes, ending the HTML block, generating: </li>\n<li>some text
  • line 3: paragraph continuation, generating: \nsome other text

So, for me, the real question is what happens on line 4. At the time line 4 starts, there is an active list block and an active paragraph.

Option 1: (my implementation) The text on line 4 is tested for eligibility as an HTML block and fails, as type 7 blocks cannot interrupt an existing paragraph. It then becomes normal text and is parsed inline as Raw HTML. Then the end of the document is hit, and both the paragraph and the list block close. Generates: \n</script></li>\n</ul>

Option 2: The text on line 4 causes the paragraph to close. This triggers the list to close as well, since paragraph continuation is no longer in place. Then the processing of line 4 as a type 7 HTML block can proceed, as it is not breaking a paragraph.

Option 3: The text on line 4 causes the list to close. This forces the paragraph closed. Then the processing of line 4 as a type 7 HTML block can proceed as it is not breaking a paragraph.

I am not sure if those are the only 3 options, but they are the only ones I can think of.
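To make Option 1 easier to compare against the other two, here is a toy Python trace of the block phase over the four input lines. It is a sketch under heavy assumptions, not how cmark or any real parser is structured: `trace_option1` and `TYPE7_CANDIDATE` are hypothetical names, only `- ` list markers and tags without attributes are recognised, and a bare tag on its own line is treated as a type 7 candidate, as in the discussion above.

```python
import re

# Simplified test for "a complete tag alone on the line" (assumption).
TYPE7_CANDIDATE = re.compile(r"^</?[A-Za-z][A-Za-z0-9-]*\s*>\s*$")


def trace_option1(lines):
    """Return the block-level events produced under Option 1."""
    events = []
    in_list = False
    paragraph_open = False
    for line in lines:
        if line.startswith("- "):
            # A new list marker ends whatever the previous item left open,
            # including an unterminated HTML block.
            if paragraph_open:
                events.append("close paragraph")
                paragraph_open = False
            events.append("next list item" if in_list else "open list")
            in_list = True
            if line[2:].lower().startswith("<script"):
                events.append("HTML block (type 1)")
            else:
                events.append("open paragraph")
                paragraph_open = True
        elif paragraph_open:
            # Unindented text after a paragraph is a continuation line.
            # A type 7 start may not interrupt it, so it stays text.
            if TYPE7_CANDIDATE.match(line):
                events.append("type 7 rejected, paragraph continues")
            else:
                events.append("paragraph continues")
        else:
            if in_list:
                events.append("close list")
                in_list = False
            if TYPE7_CANDIDATE.match(line):
                events.append("HTML block (type 7)")
            else:
                events.append("open paragraph")
                paragraph_open = True
    if paragraph_open:
        events.append("close paragraph")
    if in_list:
        events.append("close list")
    return events


source = ["- <script>", "- some text", "some other text", "</script>"]
for event in trace_option1(source):
    print(event)
```

In this model the last line never leaves the paragraph. Options 2 and 3 would instead need the `paragraph_open` branch to close the paragraph or the list before testing for an HTML block.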

The reason this level of clarity is important to me is that to fix this issue properly, I feel that I need to understand why either Option 2 or Option 3 (or Option N where N > 3) is being used to form a correct response.

Just trying to understand, thanks for your patience!

I corrected myself later in my post. Sorry for the confusion.
I think your parser is right here.

I’ve updated cmark and commonmark.js to give the right results
here. Thanks for noticing the issue and submitting the bug
report!
