Language attribute overspecified

Most of the spec concentrates on parsing the document, not rendering it. One notable exception, in my opinion, is the part about the language class for fenced code blocks. Why the prefix “language-” and not something else, if that would agree better with existing CSS? What about language normalization, i.e. mapping “sh” to “shell” and so on? What about server-side code markup? I think what’s currently in the spec there should be formulated as a suggestion rather than a strict requirement. The spec should specify the extent and content of the info string, and the extent and content of its first word, but leave the rendering of these up to the renderer.
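To make the proposed split concrete, here is a minimal Python sketch. All names (`FencedCodeBlock`, `render_html`, `LANGUAGE_ALIASES`) are hypothetical and come from no real library, and backslash escapes and entities inside the info string are ignored for brevity. The parser's job ends at the info string and its first word; the class prefix and any “sh” to “shell” normalization become renderer options. The default output follows the spec's current `class="language-…"` convention.

```python
import html
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical normalization table: a renderer option, not something the parser decides.
LANGUAGE_ALIASES = {"sh": "shell"}


@dataclass
class FencedCodeBlock:
    """What the parser hands over: the raw info string and the literal content."""
    info: str
    content: str

    @property
    def language(self) -> Optional[str]:
        # The parser's responsibility ends here: the first word of the info string.
        words = self.info.split()
        return words[0] if words else None


def render_html(
    block: FencedCodeBlock,
    class_prefix: str = "language-",
    aliases: Optional[Mapping[str, str]] = None,
) -> str:
    """Renderer-side policy: prefix and normalization are options, not parsing rules."""
    lang = block.language
    if lang is not None and aliases:
        lang = aliases.get(lang, lang)
    attr = f' class="{html.escape(class_prefix + lang)}"' if lang else ""
    return f"<pre><code{attr}>{html.escape(block.content)}</code></pre>\n"


print(render_html(FencedCodeBlock("sh", "echo hi\n")))
print(render_html(FencedCodeBlock("sh", "echo hi\n"), aliases=LANGUAGE_ALIASES))
```

With this split, a renderer that prefers `lang-sh` or `language-shell` changes only its own options; the parsed tree stays the same.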

Yes, given that CommonMark is supposed to be a (lightweight, human-oriented) markup language, and that parsing the document is pretty much all that a markup language is about, this seems reasonable.

One notable exception in my opinion is the part about the language class for fenced code blocks. Why prefix “language-” and not something else, if that would agree better with existing css?

Good question. (I had no part in writing the spec.) To your questions I would add: why specify that the (modified) info string ends up as a class= attribute on an HTML <code> element? Not that this is a bad choice, but it is, IMO, none of the specification’s business.

I think what’s currently in the spec there should be formulated as a suggestion, but not a strict spec requirement. The spec should specify the extent and content of the info string, and the extent and content of the first word thereof, but leave rendering of these up to the renderer.

I absolutely agree: how this info string (and similar information, such as the number of the first item in an ordered list) is used in rendering or constructing an output document is outside the scope of the specification. Still, a “suggestion” or “recommendation” could help to curb arbitrary variation in practice.

However, the specification text is not entirely consistent about its relationship to HTML rendering. At times it talks about custom tags (and even, say, DocBook tags), and in section 1.3, “About this document”, it even says explicitly:

Since this document describes how Markdown is to be parsed into an abstract syntax tree, it would have made sense to use an abstract representation of the syntax tree instead of HTML.

Yet the “HTML bias” is actually there, and the info string requirement is one example of it.

Another example, a striking one IMO, is the requirement that every processor know all the named character entities of [one specific version of] HTML, and only those, while on the other hand non-HTML tags (that is, custom element type names, called “tag names” in the spec) must be usable in markup.
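A minimal sketch of what the entity requirement means for an implementation, using Python's standard-library `html.entities.html5` table as a stand-in for “the HTML list” (the function name `decode_named_entities` is hypothetical, and numeric references are left out). The set of recognized names is exactly whatever HTML table the processor ships, and any other name, such as `&foo;`, stays literal.

```python
import re
from html.entities import html5  # stdlib copy of the HTML5 named-character table; keys like "amp;"

# Candidate entity reference: '&', a name, ';'. Whether it is "real" is decided by the table lookup.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]{0,31});")


def decode_named_entities(text: str) -> str:
    """Replace named references found in the HTML5 table; leave all others untouched."""
    def replace(match: re.Match) -> str:
        key = match.group(1) + ";"
        return html5.get(key, match.group(0))

    return ENTITY_RE.sub(replace, text)


print(decode_named_entities("caf&eacute; &foo; &amp;"))
```

Swapping in a different markup target would not change this table, which is the asymmetry the point above is about.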

I agree. Does anyone have any objections to reformulating this as a suggestion for renderers?
