Fenced code blocks should add the class to `code` rather than `pre`, matching HTML best practice

The spec currently says:

> An info string can be provided after the opening code fence. Opening and closing spaces will be stripped, and the first word is used here to populate the class attribute of the enclosing pre tag.

I propose to change this as follows:

> An info string can be provided after the opening code fence. Opening and closing spaces will be stripped, and the first word, prefixed with `language-`, is used as the value for the class attribute of the `code` element within the enclosing `pre` element.

This means that

```javascript
var x = 42;
```

would no longer be translated to:

`<pre class="javascript"><code>var x = 42;</code></pre>`

but instead to:

`<pre><code class="language-javascript">var x = 42;</code></pre>`
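For implementers, the proposed rule is small. Here is a minimal JavaScript sketch of it, assuming a hypothetical `renderFencedCode(infoString, content)` helper that receives the raw info string and the block's content. It is not taken from any real implementation and ignores details such as trailing newlines and entity handling in the info string.

```javascript
// Minimal sketch of the proposed rule (hypothetical helper, not from any
// actual Markdown implementation): strip the info string, take its first
// word, prefix it with "language-" and put it on the <code> element.

function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so later entities aren't re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderFencedCode(infoString, content) {
  // Opening and closing spaces are stripped; the first word is the language.
  const language = infoString.trim().split(/\s+/)[0];
  const classAttr = language ? ` class="language-${escapeHtml(language)}"` : "";
  return `<pre><code${classAttr}>${escapeHtml(content)}</code></pre>`;
}

console.log(renderFencedCode(" javascript ", "var x = 42;"));
// <pre><code class="language-javascript">var x = 42;</code></pre>
console.log(renderFencedCode("", "plain text"));
// <pre><code>plain text</code></pre>
```

Note that the `pre` element carries no class at all; everything a highlighter needs sits on `code`.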

Reason: this matches the convention set out in the HTML standard itself. Look for the example in its definition of the `code` element, which looks something like this:

    <pre><code class="language-javascript">
      …
    </code></pre>

If that is not convincing enough, note that this convention is supported by many syntax highlighting scripts written in JavaScript, e.g. highlight.js and google-code-prettify. Requiring Markdown implementations to output markup that these scripts recognize out of the box is a major win.
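To illustrate why the prefix matters to such scripts, here is a minimal sketch of how a highlighter might read the language from the `class` attribute of a `code` element. `detectLanguage` is a hypothetical function, not the actual code of highlight.js or google-code-prettify.

```javascript
// Hypothetical helper: given the value of a <code> element's class
// attribute, return the language named by a "language-" class, or null.
// This only mirrors the convention; it is not any library's real code.
const PREFIX = "language-";

function detectLanguage(classAttribute) {
  for (const cls of classAttribute.split(/\s+/)) {
    if (cls.startsWith(PREFIX) && cls.length > PREFIX.length) {
      return cls.slice(PREFIX.length);
    }
  }
  return null; // no language class found
}

console.log(detectLanguage("language-javascript")); // javascript
console.log(detectLanguage("foo language-css bar")); // css
console.log(detectLanguage("javascript")); // null
```

With the current output (`<pre class="javascript">`), a lookup like this finds nothing, both because the class sits on `pre` instead of `code` and because it lacks the prefix.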


There is a similar discussion over at Ghost about using the info string of fenced code blocks to generate a class prefixed with `language-`, as described in the HTML5 spec for the `code` element.

It would make a lot of sense to use what is already supported by many syntax highlighters.

Since everyone seems to agree, here’s the spec patch: https://github.com/jgm/stmd/pull/71


I agree, but I’d like to point out what might be considered an edge case. If #272 gets included, then a ‘code’ block without any language declaration should probably get only a `pre` tag and no `code` tag (pandoc Markdown currently does include one), because it is presumably not computer code but just preformatted text in general.

So this would mean that:

``` {.class}
verbatim text
```

should output:

`<pre class="class">verbatim text</pre>`

instead of:

`<pre><code class="class">verbatim text</code></pre>`

Without the curly braces, of course, it should be rendered as normal (mathias’ proposal).

Is this the right interpretation? Have I missed another case where a block without a language declaration might still be computer code and need a `code` tag?
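The rule described above could look roughly like this in JavaScript. This is a hypothetical sketch: it assumes the pandoc-style `{.class}` attribute syntax from the #272 proposal and handles only a single `.class` inside the braces. For an empty info string it falls back to the normal proposal, which is one reading of the comment above.

```javascript
// Hypothetical sketch of the edge case: "{.class}" attributes without a
// language produce a bare <pre>; anything else follows the normal
// "language-" proposal. Only a single ".class" in braces is handled.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderBlock(infoString, content) {
  const info = infoString.trim();
  const body = escapeHtml(content);
  const attr = info.match(/^\{\s*\.([\w-]+)\s*\}$/);
  if (attr) {
    // Attribute block with no language: plain preformatted text, no <code>.
    return `<pre class="${attr[1]}">${body}</pre>`;
  }
  if (info) {
    const language = escapeHtml(info.split(/\s+/)[0]);
    return `<pre><code class="language-${language}">${body}</code></pre>`;
  }
  return `<pre><code>${body}</code></pre>`; // no info string at all
}

console.log(renderBlock(" {.class}", "verbatim text"));
// <pre class="class">verbatim text</pre>
```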

I’d like to see a `data-language` attribute in addition to (or possibly instead of) the class.

I doubt anyone applies CSS rules based on the language without first running the contents through a highlighter, formatter, or similar, so a data attribute feels like a better place to store this information.
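A sketch of the "in addition to" variant, emitting both attributes. `renderWithDataLanguage` is a hypothetical helper; neither the spec patch nor the HTML standard defines a `data-language` attribute for this purpose.

```javascript
// Hypothetical variant: emit the language both as a "language-" class
// (the HTML-standard convention) and as a data-language attribute.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderWithDataLanguage(infoString, content) {
  const language = infoString.trim().split(/\s+/)[0];
  const body = escapeHtml(content);
  if (!language) return `<pre><code>${body}</code></pre>`;
  const lang = escapeHtml(language);
  return `<pre><code class="language-${lang}" data-language="${lang}">${body}</code></pre>`;
}

console.log(renderWithDataLanguage("javascript", "var x = 42;"));
// <pre><code class="language-javascript" data-language="javascript">var x = 42;</code></pre>
```

In a browser, a script could then read the value via `codeElement.dataset.language` instead of parsing the class list.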

Both the HTML specification and existing syntax highlighting implementations disagree with you.

Actually, I’ve concluded that it might be easier to just use a standard `<pre>` tag in this case, since it’s a block element anyway. (So disregard my earlier comment.)