I was recently running pandoc to convert some HTML, which contained something like this:
<ul>
<li>foo
<li><hr>
</ul>
Pandoc produced this:
- foo
-
* * * * *
There is fairly little agreement among implementations on how to interpret this example. Pandoc and several others interpret it the same way, matching the original HTML that I had. Others interpret it as a setext heading followed by a code block, and still others as a list with an empty second element, followed by a code block. I can see how each of those interpretations would come about, based on the precedence of different rules.
The commonmark 0.26.0 interpretation seems surprising to me: it interprets it as two separate lists, the first with an empty second element, the second containing a single element that holds a horizontal rule. I’m having a hard time figuring out how you could get this result. If the first list ended, I would expect the indented line containing asterisks to just be interpreted as a code block. If the first list did not end, I would expect the horizontal rule to be contained in the second element of the first list.
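To make those two expected outcomes concrete, here is a rough Python sketch. It is not taken from any real parser: `classify` and `container_indent` are hypothetical names, and I am assuming the asterisk line is indented by four spaces. It only encodes two rules from the spec (a thematic break may be indented by at most three spaces, and four or more spaces start an indented code block) and shows how the result depends on whether the line is still inside the list item.

```python
import re

# Thematic break per the CommonMark spec: up to three spaces of indentation,
# then three or more matching -, _ or * characters, optionally separated by
# spaces or tabs.
THEMATIC_BREAK = re.compile(r"^ {0,3}([-_*])[ \t]*(?:\1[ \t]*){2,}$")


def classify(line: str, container_indent: int = 0) -> str:
    """Classify one line relative to a container whose content starts at
    column `container_indent` (0 means top level). Hypothetical helper."""
    leading = len(line) - len(line.lstrip(" "))
    if leading < container_indent:
        return "outside container"
    # Strip the indentation that belongs to the enclosing list item.
    inner = line[container_indent:]
    if not inner.strip():
        return "blank"
    if THEMATIC_BREAK.match(inner):
        return "thematic break"
    if inner.startswith("    "):
        return "indented code"
    return "other"


line = "    * * * * *"  # assumed: four spaces of indentation

# The list has ended, so the line is read at top level.
print(classify(line, container_indent=0))  # -> indented code

# The list item is still open and its content starts at column 4.
print(classify(line, container_indent=4))  # -> thematic break
```

Neither branch yields a second list, which is why the 0.26.0 result puzzles me. The sketch deliberately ignores list markers, lazy continuation, and the rules for when an empty list item ends, so it says nothing about which reading is correct.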
I think, according to my reading of the spec, that the pandoc interpretation is the correct one, but I wanted to make sure.
This example came about when I was trying to run some test cases extracted from a web page, using this command:
curl http://words.steveklabnik.com/structure-literals-vs-constructors-in-rust | pandoc -f html -t markdown > /tmp/test.md; rustdoc --test /tmp/test.md
rustdoc was interpreting the horizontal rule as a code block, and thus producing a spurious test failure, but before filing a ticket against rustdoc or its Markdown parser, I wanted to check what the interpretation of this line should be.