The original Markdown was designed as a sort-of superset of HTML:
“Markdown is not a replacement for HTML, or even close to it. Its syntax is very small, corresponding only to a very small subset of HTML tags. The idea is not to create a syntax that makes it easier to insert HTML tags. In my opinion, HTML tags are already easy to insert. The idea for Markdown is to make it easy to read, write, and edit prose.
“For any markup that is not covered by Markdown’s syntax, you simply use HTML itself.”
This makes easy things easy, and hard things possible.
However, Markdown.pl makes no attempt to parse Markdown inside of HTML, so if you want to create a mildly complicated container for some prose, you’re out of luck. But it doesn’t break the HTML.
To extend this, many Markdown parsers, and the CommonMark spec, do treat the body of HTML blocks as Markdown. But all of them break in some way when the HTML is indented, so none of these parsers is very useful as an HTML superset. Some examples:
<div class="foo">
    <div class="bar">Bar body</div>

    <div class="baz">Baz body</div>
</div>
The baz div is parsed as a code block:
<div class="foo">
<div class="bar">Bar body</div>
<pre><code><div class="baz">Baz body</div>
</code></pre>
</div>
A case where we’re actually trying to use Markdown inside inline HTML:
<div class="foo">
    <div class="bar">
        <div class="baz">
            <input type="text" name="quux" length="20"
                required pattern="(?:\d\.\s)?\w+">

First para.

- List item
- List item

Second para with *emph*.
        </div>
    </div>
</div>
This is the only input I found that works on even a single markdown parser, md4c. It requires:
- Inner markdown body not indented at all
- Blank line between first para and HTML
- No blank line between second para and HTML
If there is a blank line between the second para and the closing HTML tags, then the indented closing tags get parsed as an indented code block, which breaks the HTML. If there isn’t (as above), then most CommonMark parsers still produce broken HTML:
<div class="foo">
<div class="bar">
<div class="baz">
<input type="text" name="quux" length="20" required pattern="(?:\d\.\s)?\w+">
<p>First para.</p>
<ul>
<li>List item</li>
<li>List item</li>
</ul>
<p>Second para with <em>emph</em>.</div>
</div>
</p>
</div>
I gather from reading the spec why this behaviour occurs: per section 4.6, the end condition for HTML blocks of types 6 and 7 is a blank line; the closing tags are just parsed as inline HTML belonging to the second para; and tags with no blank lines between them are considered one big “HTML block” chunk, since there’s no blank line to end them. What I’m unclear on is the intended way of actually writing inline HTML in CommonMark, other than simply using no indentation at all. It seems to me like the answer is simply “turn off parsing markdown inside HTML”, which feels unsatisfactory.
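To make my reading of those rules concrete, here is a minimal Python sketch. It is not taken from any real parser. The name `html_block_spans` and the short `BLOCK_TAGS` list are made up for illustration, and it only models the type 6 start condition (at most three spaces of indentation, then `<` or `</` and a block-level tag name) and the blank-line end condition.

```python
import re

# Small, hypothetical subset of the tag names that can start a type 6 HTML block.
BLOCK_TAGS = ("div", "p", "ul", "ol", "li", "table", "section")
START = re.compile(
    r"^ {0,3}</?(?:%s)(?:[ \t]|/?>|$)" % "|".join(BLOCK_TAGS), re.IGNORECASE
)


def html_block_spans(lines):
    """Return (first, last) line-index pairs of HTML blocks (simplified type 6)."""
    spans = []
    start = None
    for i, line in enumerate(lines):
        if start is None:
            # A block can only start with at most three spaces of indentation.
            if START.match(line):
                start = i
        elif not line.strip():
            # A blank line is the end condition: close the current block.
            spans.append((start, i - 1))
            start = None
    if start is not None:
        spans.append((start, len(lines) - 1))
    return spans


doc = [
    '<div class="foo">',
    '    <div class="bar">',
    '        <div class="baz">',
    '',
    'First para.',
    '',
    'Second para with *emph*.',
    '        </div>',
    '    </div>',
    '</div>',
]
print(html_block_spans(doc))  # [(0, 2), (9, 9)]
```

The three opening tags form one block because nothing blank separates them. The two indented closing tags start no block of their own (they are indented by more than three spaces), which is why they end up as text of the preceding paragraph. Only the unindented `</div>` starts a new HTML block.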
Is it possible for the spec to say “if you’re inside inline HTML, a line starting with spaces followed by an HTML tag is treated as more inline HTML rather than as a code block”? The indented-code rule is the only thing causing unavoidable breakage, and with this rule it’s still possible to use fenced code blocks for the case where one wants a code block inside inline HTML.
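A rough sketch of what I mean, in Python. This is a toy line classifier, not how any real parser is structured. `classify`, `prefer_html` and `CONTAINER_TAGS` are names I made up, only a few container tags are tracked, and lists, fenced code and everything else are ignored. With `prefer_html=False` it mimics the current behaviour; with `prefer_html=True` an indented line that starts with a tag counts as HTML while a container tag is still open.

```python
import re

# Hypothetical subset of container tags, for illustration only.
CONTAINER_TAGS = {"div", "section", "article"}
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)")


def classify(lines, prefer_html=False):
    """Label each line 'html', 'code', 'para' or 'blank' (heavily simplified)."""
    labels = []
    depth = 0        # number of container tags currently open
    prev = "blank"   # label of the previous line
    for line in lines:
        text = line.lstrip(" ")
        indent = len(line) - len(text)
        m = TAG.match(text)
        is_tag = m is not None and m.group(2).lower() in CONTAINER_TAGS
        if not text.strip():
            label = "blank"
        elif prev == "html" or (is_tag and indent <= 3):
            # HTML blocks run until a blank line.
            label = "html"
        elif indent >= 4 and prev != "para":
            # Indented code cannot interrupt a paragraph.
            # Proposed rule: inside open HTML, a tag wins over indented code.
            label = "html" if prefer_html and depth > 0 and is_tag else "code"
        else:
            label = "para"
        if label == "html":
            # Track how many container tags are still open.
            for slash, name in TAG.findall(line):
                if name.lower() in CONTAINER_TAGS:
                    depth += -1 if slash else 1
        labels.append(label)
        prev = label
    return labels


example = [
    '<div class="foo">',
    '    <div class="bar">Bar body</div>',
    '',
    '    <div class="baz">Baz body</div>',
    '</div>',
]
print(classify(example))
# ['html', 'html', 'blank', 'code', 'html']
print(classify(example, prefer_html=True))
# ['html', 'html', 'blank', 'html', 'html']
```

Under the proposed rule the baz div stays HTML, and an indented line that does not start with a tag would still become a code block.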