Raw HTML blocks proposals -- comments wanted

Yes, this is true even on the hybrid proposal I suggest above, because iframe, like del, can be either “flow content” (block level) or “phrase content” (span level). On the other hand, the fact that iframe can be phrase content means that there are legitimate uses for an iframe inside a paragraph, and possibly even at the beginning of a paragraph. So, I think, we shouldn’t make it impossible for authors to write these things. The current proposal allows you to use an iframe either inside a paragraph (even at the beginning) or as an independent block. The author can choose: it’s just a matter of whether you put the opening tag on a line by itself.

In general, I’m against creating “blind spots” – things that are impossible to express in CommonMark.

Yes, and it has been solved in a variety of ad hoc ways (or in some cases not solved). The hope was that we could do better, and have a principled solution.

As @vitaly notes, HTML5 already allows custom tags, and there have been requests from users to deal with them better.

In addition, we might consider the possibility that someone would target, say, DocBook or another XML format. (Creating a DocBook renderer for cmark or commonmark.js would be quite easy.) In that case they might like the ability to use raw DocBook on occasion. The present proposal supports this kind of thing.

Let’s provide something to mark verbatim areas so it can be used in a generic way.

+++ lu_zero [Mar 04 15 00:44 ]:

By “verbatim areas” I assume you mean stuff that should pass through to the output format unchanged, rather than things that should be escaped and put into pre or code tags. Assuming that’s right, I’m really sympathetic. I think that this would have been a better design for Markdown – instead of allowing HTML to be plonked down anywhere, have some kind of explicit marker for it, at least for verbatim HTML blocks.

The problem is backwards compatibility. Unless we want to break with original Markdown in a more thoroughgoing way, we need to support HTML blocks without explicit markers.

I understand it is a hard problem with no easy answers @jgm but the “must place block tag on a line by itself” rule smacks of the “use two spaces at the end of a line for a line break” decision in original Markdown – it is impossible to discover without knowing the secret in advance. I just worry that it is very hard to figure this out and we are adding more secret handshakes.

It is a tiny bit better in that, unlike spaces, the tag is at least not invisible in the raw markup, I suppose.

And, to be fair, all the options seem bad in different ways here so there is no way to pick a best solution; all have tradeoffs.

+++ codinghorror [Mar 04 15 05:48 ]:

With the hybrid proposal, this would only matter for those few tags that can occur either as “flow content” or as “phrase content.” With, say, a <div> tag you wouldn’t need a single tag on a line. Perhaps that just makes it more confusing, since you need to know which tags behave this way. But that is already the case in, e.g. PHP Markdown, Markdown.pl, and showdown, where <del> behaves differently if you put it on a line by itself. http://johnmacfarlane.net/babelmark2/?normalize=1&text=<del> hi <%2Fdel> <del>hi<%2Fdel>

As long as it mirrors the current behaviors in some way, I am for it.

I just don’t want to add any more “mystery meat” than we currently have in Markdown by design™

What about namespaced attributes for Markdown directives? For example:

<div md:ignore=true>
    this will *not* have markdown parsed
</div>
<div md:ignore=false>
    this **will** have markdown parsed
</div>

could turn into

<div>
    this will *not* have markdown parsed
</div>
<div>
    <p>this <strong>will</strong> have markdown parsed</p>
</div>

Doing something like that would allow the Markdown spec to provide reasonable defaults, while letting users override them for individual elements on an as-needed basis.

+++ zzzzBov [Mar 13 15 19:31 ]:

This is what some implementations (e.g. PHP Markdown Extra) currently do. (The default is not to interpret the contents of block-level elements as Markdown, but you can override this by setting the attribute markdown=1. There are some ugly complexities about whether the attribute should propagate down to child elements.)

It isn’t needed with the current CommonMark spec, since you can get the same distinction by doing either:

<div>
this will *not* have markdown parsed
</div>

or

<div>

this *will* have markdown parsed.

</div>

Any hope to see progress on this topic?

+++ vitaly [May 30 15 00:52 ]:

Yes, this is a high priority for this summer. Soon, I hope.

Summary of the proposal I think is best, after discussion:

A partial HTML tag is any initial portion of a full HTML tag, as defined in the spec, split at a word boundary.

An HTML block tag is an HTML tag whose tag name is not in the following list: a abbr area audio b bdi bdo br button canvas cite code command datalist del dfn em embed i iframe img input ins kbd keygen label map mark math meter noscript object output progress q ruby s samp script select small span strong sub sup svg textarea time u var video wbr text. (These are tags that can be used in “phrasing content,” according to the HTML5 spec. Some of them can also be used for block-level content.)

An HTML block starts with either (a) any HTML tag or partial tag on a line by itself, or (b) a complete HTML block tag followed by other content on the same line.

If the block opens with <script, <style, <pre, or <!--, the HTML block includes lines up to and including the first line containing the matching end tag (for <!--, the closing -->), or to the end of the document, if no such line is encountered.

Otherwise the block includes lines up to (and not including) the first blank line, or to the end of the document, if no blank line is encountered.
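
As a rough illustration only (not part of the proposal, and not how cmark or commonmark.js implement it), the start and end conditions above might be sketched in Python as follows. The names `starts_html_block` and `html_block_end` are hypothetical, and the tag regex is a deliberate simplification: it treats everything up to the first ">" as attributes and accepts any unclosed tag prefix as a "partial tag" instead of checking word boundaries.

```python
import re

# Tag names from the summary's list: tags allowed in "phrasing content".
PHRASING_TAGS = frozenset("""
a abbr area audio b bdi bdo br button canvas cite code command datalist del
dfn em embed i iframe img input ins kbd keygen label map mark math meter
noscript object output progress q ruby s samp script select small span strong
sub sup svg textarea time u var video wbr text
""".split())

# Simplified tag matcher (assumption): attributes are anything up to ">".
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9-]*)([^<>]*)(>?)")

# Openers whose block runs to a matching end tag rather than to a blank line.
VERBATIM_END = {"script": "</script>", "style": "</style>", "pre": "</pre>"}


def starts_html_block(line):
    """Return True if `line` would open an HTML block under the summary's rule."""
    text = line.strip()
    if text.startswith("<!--"):
        return True
    m = TAG.match(text)
    if m is None:
        return False
    name, closed, rest = m.group(1).lower(), m.group(3), text[m.end():]
    if rest == "":
        return True  # (a) a complete or partial tag on a line by itself
    # (b) a complete HTML block tag followed by other content
    return closed == ">" and name not in PHRASING_TAGS


def html_block_end(lines, start):
    """Return the index one past the last line of the block opening at `start`.

    Assumes starts_html_block(lines[start]) is True.
    """
    opener = lines[start].strip().lower()
    if opener.startswith("<!--"):
        end_marker = "-->"
    else:
        end_marker = VERBATIM_END.get(TAG.match(opener).group(1))
    for i in range(start, len(lines)):
        if end_marker is not None:
            if end_marker in lines[i].lower():
                return i + 1  # the line with the end marker is included
        elif lines[i].strip() == "":
            return i  # the blank line is not part of the block
    return len(lines)  # no terminator: the block runs to the end of the document
```

With this sketch, `starts_html_block("<del>x</del>")` is false because del is in the phrasing list and is followed by other content, while a `<del>` on a line by itself opens a block that runs to the next blank line.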

Example:

<del>x</del>

becomes

<p><del>x</del></p>

Why? del can be phrasing content, so in order to start an HTML block it needs to be on a line by itself. So,

<del>
x
</del>

becomes

<del>
x
</del>

If we leave a blank line after the <del> and before and after </del>, the contents will be interpreted as Markdown:

<del>

x

</del>

becomes

<del>
<p>x</p>
</del>

When we have tags that can’t be phrasing content, we can include several on one line, which should help with backwards compatibility:

<table><tr><td>
x
</td></tr></table>

is parsed as a raw HTML block.

Can some custom tags be considered phrasing content? Since these tags are not in the white list, can they be used as part of a paragraph?

E.g.

<my-custom-element>My custom element text</my-custom-element> followed by other paragraph text.

Would this output

<p><my-custom-element>My custom element text</my-custom-element> followed by other paragraph text.</p>

or this?

<my-custom-element>My custom element text</my-custom-element> followed by other paragraph text.

@chrisalley: quite right, we should have a whitelist of tags that can only be block-level (= all the defined tags minus the ones in my list). That will allow custom tags to be used either block or inline.

Block tags =
address
article
aside
base
basefont
blockquote
body
caption
center
col
colgroup
dd
details
dialog
dir
div
dl
dt
fieldset
figcaption
figure
footer
form
frame
frameset
h1
head
header
hr
html
legend
li
link
main
menu
menuitem
meta
nav
noframes
ol
optgroup
option
p
param
pre
section
source
summary
table
tbody
td
tfoot
th
thead
title
tr
track
ul
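
One possible reading of this revised rule, sketched in Python: a tag on a line by itself always opens an HTML block, but a tag followed by other content opens one only if its name is in the block-only list above. The function name `classify_line` and the simplified tag regex are assumptions made for illustration; they are not taken from the spec or from any implementation.

```python
import re

# Block-only tag names, copied from the list above.
BLOCK_TAGS = frozenset("""
address article aside base basefont blockquote body caption center col
colgroup dd details dialog dir div dl dt fieldset figcaption figure footer
form frame frameset h1 head header hr html legend li link main menu menuitem
meta nav noframes ol optgroup option p param pre section source title summary
table tbody td tfoot th thead tr track ul
""".split())

# Simplified tag matcher (assumption): attributes are anything up to ">".
TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9-]*)([^<>]*)(>?)")


def classify_line(line):
    """Return "html_block" or "paragraph" for the first line of a block."""
    text = line.strip()
    m = TAG.match(text)
    if m is None:
        return "paragraph"
    name = m.group(1).lower()
    complete = m.group(3) == ">"
    rest = text[m.end():]
    if rest == "":
        return "html_block"  # any tag (or partial tag) alone on its line
    if complete and name in BLOCK_TAGS:
        return "html_block"  # block-only tag followed by other content
    return "paragraph"       # phrasing or custom tag used inline
```

On this reading, the `<my-custom-element>… followed by other paragraph text.` line from the question is classified as a paragraph, while `<my-custom-element>` on a line by itself would start an HTML block.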

Testing CommonMark on a bunch of Stack Exchange questions and answers, I’m seeing quite a few instances of this:

Foo bar

<hr>
Quux [plonk](http://example.org)

where the HTML tag (under both the current and the proposed rules) prevents the subsequent Markdown from being interpreted.

Can we exclude the void elements from ever opening an HTML block?

If we just excluded void elements from opening an HTML block, then we’d still get the wrong parse, since if <hr> is interpreted as inline HTML, we’ll get everything in a paragraph:

<p><hr> Quux etc.</p>

One solution would be: if the line starts with a void element, then treat just this one line as an HTML block.

But of course, this would break on

<hr><div>
My *html* block
</div>

I’m not terribly worried about this, because this doesn’t satisfy Gruber’s own official criterion for an HTML block, and the proposed fix would produce reasonable results here (the word “html” would be emphasized, but the tags would be okay).
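
To make the trade-off concrete, here is a minimal Python sketch of the "void element gets a one-line HTML block" idea. It models only that one rule and leaves every other line for the Markdown parser. The `segment` function is hypothetical, and the void element list is taken from the HTML5 spec rather than from this thread.

```python
import re

# HTML5 void elements (assumption: list from the HTML5 spec, not this thread).
VOID_TAGS = frozenset(
    "area base br col embed hr img input keygen link meta param source track wbr".split()
)

# A complete tag at the start of the line (simplified attribute handling).
VOID_START = re.compile(r"<([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def segment(lines):
    """Split lines into ("html", [...]) and ("markdown", [...]) chunks."""
    chunks = []
    for line in lines:
        m = VOID_START.match(line.strip())
        if m and m.group(1).lower() in VOID_TAGS:
            chunks.append(("html", [line]))  # just this one line is raw HTML
        elif chunks and chunks[-1][0] == "markdown":
            chunks[-1][1].append(line)       # continue the current Markdown run
        else:
            chunks.append(("markdown", [line]))
    return chunks
```

For the Stack Exchange example, `<hr>` becomes its own HTML chunk and the `Quux [plonk](http://example.org)` line is handed back to Markdown. For the `<hr><div>` example, only the first line is raw HTML, so `My *html* block` is parsed as Markdown, which is the breakage described above.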

That’s not very useful. It will work only if the void element is not followed by another HTML tag, so it will not solve all possible cases. Partial solutions are evil.

IMHO, it’s simpler to patch such rare patterns in the database.

Yes, that’s what I meant.

Everything about HTML tag handling in CommonMark (or Markdown in general) is a partial solution that does not solve all possible cases.

The spec solves the problem of how to allow HTML without needing a full-weight HTML parser. I see no reason to make the spec more complex than needed. The example with <hr> looks strange, because hr is already covered by Markdown syntax. Maybe there are better examples or usage statistics, but I haven’t seen them.

@jgm do we have any pending problems with the latest HTML proposal? I don’t see any:

  • handling empty lines in multiline comments - OK
  • html5 custom tags (with dashes) - seems OK

It would be nice to see this update in the spec. In any case, it’s much better than the existing rules.

@vitaly I’m planning to do this in the next few weeks.
