Type 3 HTML blocks

I came across an interesting issue with Type 3 HTML blocks. Consider this input:

> and the test:
>
> <?php
> ini_set("pcntl.async_signals", "1");

We use 2/3 vote for "a feature affecting the language itself".

According to spec version 0.25, that quoted PHP code gets interpreted as a Type 3 HTML block. As a user, I do not expect this behavior within a blockquote.
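Here is a rough sketch of why this happens, based on a simplified reading of the spec. The blockquote marker is stripped first. The remaining text is then checked against the Type 3 start condition: the line begins with <? after at most three spaces of indentation. The block runs until a line containing ?> or until the enclosing container ends. The function names are made up for illustration, and tabs and lazy continuation lines are ignored.

```python
import re

# Type 3 start condition: "<?" after at most three spaces of indentation
# (four or more spaces would make the line indented code instead).
TYPE3_START = re.compile(r"^ {0,3}<\?")


def strip_blockquote_marker(line: str) -> str:
    """Remove one '>' blockquote marker and one optional following space (simplified)."""
    m = re.match(r"^ {0,3}> ?", line)
    return line[m.end():] if m else line


def starts_type3(line: str) -> bool:
    return TYPE3_START.match(line) is not None


def ends_type3(line: str) -> bool:
    # Type 3 end condition: the line contains "?>".
    return "?>" in line


quoted = ["> and the test:", ">", "> <?php", '> ini_set("pcntl.async_signals", "1");']
inner = [strip_blockquote_marker(line) for line in quoted]
print([starts_type3(line) for line in inner])  # [False, False, True, False]
print(any(ends_type3(line) for line in inner))  # False
```

No line contains ?>, so the HTML block started by <?php continues to the end of the blockquote.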

This does not happen when quoting other scripting languages’ tags. For example, I believe that both ASP.NET and Ruby use <%, which is not interpreted this way.

IMO, PHP script tags should behave the same way as other languages’ tags. Is it possible to narrow the scope of Type 3 blocks somehow? For example, if the goal is to handle XML prologs, perhaps Type 3 could be revised to match <?xml instead of just the first two characters?
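The suggested narrowing could be compared against the current rule with something like the following. This is a sketch only. PROPOSED_TYPE3 is a hypothetical rule that does a literal prefix match on <?xml, exactly as suggested. It is not anything in the spec.

```python
import re

# Type 3 start condition as currently specified: "<?" after at most 3 spaces.
CURRENT_TYPE3 = re.compile(r"^ {0,3}<\?")
# Hypothetical narrower rule from this proposal: only lines starting "<?xml".
PROPOSED_TYPE3 = re.compile(r"^ {0,3}<\?xml")

for line in ["<?php", '<?xml version="1.0"?>', "<% x %>"]:
    print(line, bool(CURRENT_TYPE3.match(line)), bool(PROPOSED_TYPE3.match(line)))
```

The results are as follows. <?php starts a block under the current rule but not under the proposed one. <?xml version="1.0"?> starts one under both. <% x %> starts one under neither.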

Angle brackets should be escaped, or placed within code blocks.
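Here is a minimal sketch of the escaping option. It relies on the fact that \< and \> are valid CommonMark backslash escapes, since both characters are ASCII punctuation. The helper name is hypothetical. Escapes are not processed inside code spans or code blocks, so this only makes sense for ordinary text. The quoted lines also render as one paragraph, which loses the code formatting.

```python
def escape_angle_brackets(text: str) -> str:
    """Backslash-escape < and > so CommonMark renders them literally.

    Only for ordinary text: backslash escapes are not processed inside
    code spans or code blocks.
    """
    return text.replace("<", "\\<").replace(">", "\\>")


code = ["<?php", 'ini_set("pcntl.async_signals", "1");']
# The escaped line starts with "\<?" rather than "<?", so no Type 3 block.
print("\n".join("> " + escape_angle_brackets(line) for line in code))
```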

I would expect that to be expressed as

> and the test:
>
>     <?php
>     ini_set("pcntl.async_signals", "1");

(or a fenced code block, your choice)
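Producing either form can be automated. The following is a small sketch with a hypothetical helper, quote_code, that nests code lines in a blockquote. Indented code needs four spaces after the "> " marker. The first print reproduces the indented version shown in this reply.

```python
from __future__ import annotations

FENCE = "`" * 3  # three backticks open and close a fenced code block


def quote_code(lines: list[str], fenced: bool = False, info: str = "") -> str:
    """Return `lines` as a code block nested inside a blockquote."""
    if fenced:
        body = [FENCE + info, *lines, FENCE]
    else:
        # Indented code: four spaces of indentation after the "> " marker.
        body = ["    " + line for line in lines]
    return "\n".join("> " + line for line in body)


code = ["<?php", 'ini_set("pcntl.async_signals", "1");']
print("> and the test:\n>\n" + quote_code(code))
print(quote_code(code, fenced=True, info="php"))
```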

Code should be expressed as code, particularly code that contains HTML or HTML-like entities.

While I do agree that it should ideally be expressed as code, I guess I’m wondering whether the current definition of Type 3 blocks is too loose regardless.

Babelmark 2 shows that 24% of Markdown parsers treat <?php as HTML; all the others escape the < as &lt;. That figure drops to 10% if you exclude CommonMark-compliant parsers.

As an end user, I would not expect <?php to be interpreted as HTML because it is not HTML.

Is there a use case for matching <? outside of XML prologs? If not, then I’d think that modifying Type 3 to match <?xml would produce more consistent (and expected) results.

Hi! I don’t have much to add to the conversation, except that as a user I found this very confusing; it is not the behavior I expected at all. I expected the < to be escaped.

> As an end user, I would not expect <?php to be interpreted as HTML because it is not HTML.

We don’t limit ourselves to HTML as strictly defined. You can enter DocBook tags if you like, for example; they will be passed through too, which is great if you have a DocBook renderer for CommonMark.

It would be a bad idea not to allow <?php...?>, since this is certainly something one might want to pass through literally, even when producing an HTML document. Indeed, custom processing instructions could be used for many purposes in a CommonMark tool chain.
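As a rough illustration of the tool-chain idea, suppose the renderer passes processing instructions through verbatim, as CommonMark's raw HTML rules do. A later step could then expand the instructions it recognizes and leave the rest alone. Everything here is hypothetical, including the <?version?> instruction and the expand_pis helper. The regex also assumes that ?> never appears inside the instruction's data.

```python
import re
from typing import Callable, Dict

# A processing instruction: "<?", a target name, optional data, then "?>".
# Simplification: assumes "?>" never occurs inside the data.
PI_PATTERN = re.compile(r"<\?([A-Za-z_][\w.-]*)(?:\s+(.*?))?\s*\?>", re.DOTALL)


def expand_pis(html: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Replace processing instructions whose target has a handler; keep the rest."""

    def replace(match: re.Match) -> str:
        target, data = match.group(1), match.group(2) or ""
        handler = handlers.get(target)
        return handler(data) if handler else match.group(0)

    return PI_PATTERN.sub(replace, html)


page = '<p>Release <?version?></p>\n<?php echo "hi"; ?>'
# Only the hypothetical "version" target is handled; <?php ... ?> passes through.
print(expand_pis(page, {"version": lambda _data: "1.2.3"}))
```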

As @codinghorror observes, the <?php... part of your document is meant as a sample of code, so it should be treated the same way you’d treat other code, by putting it in a code span or code block.


Technically, it is legal XHTML. PHP chose to model its syntax on XML processing instructions, to help avoid confusing basic HTML editors.
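This point can be checked with any XML parser. The sketch below uses Python's xml.dom.minidom as a stand-in, and first_child_kind is a made-up helper. The parser reads <?php ... ?> as a processing instruction with target php. It rejects <% ... %> as not well-formed, which matches the asymmetry with ASP.NET/Ruby tags raised earlier in the thread.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError


def first_child_kind(xml_text: str) -> str:
    """Parse xml_text and describe the root element's first child."""
    try:
        root = minidom.parseString(xml_text).documentElement
    except ExpatError:
        return "not well-formed"
    child = root.firstChild
    if child is not None and child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        return "processing instruction, target=" + child.target
    return "other"


print(first_child_kind('<p><?php echo "hi"; ?></p>'))
print(first_child_kind("<p><% x %></p>"))
```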