A problem with backtick code fences

release-1.0

#1

First, a bit of history. Way back before GitHub even existed, Michel Fortin (of PHP Markdown Extra) and I discussed the idea of a “fenced” or “delimited” code block syntax. We eventually decided on tildes as delimiters, and tildes as code fences are still supported in pandoc, PHP Markdown Extra, and many implementations. However, backtick fences, which were originally introduced by GitHub, are now much more popular. (CommonMark currently supports both.)

I believe that backtick fences were conceived by analogy with Python multi-line strings, which are introduced by """. The idea, roughly, is that if inline code quoting is done with `, then multiline code quoting should be done with ```.

But this is a false analogy. While Python (single-line) strings are always delimited by a single " or ' character, in Markdown any number of backticks can be used to open and close inline code. If your code contains a single backtick, you can use double backticks to quote it. If it contains a double backtick too, you can use triple backticks. And so on. (It’s a bit more complex than this: you can use single backticks to quote text containing a double backtick, as long as it doesn’t contain a single backtick; the rule, which I notice this forum’s software does not follow, is that strings of exactly N contiguous backticks can quote code not containing any strings of exactly N contiguous backticks.)
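A minimal sketch of this rule, in Python: given the code to quote, find the shortest backtick delimiter that can enclose it verbatim. The function names are hypothetical, and the padding space that CommonMark needs when the code itself begins or ends with a backtick is left out.

```python
import re


def backtick_run_lengths(text: str) -> set[int]:
    """Lengths of every maximal run of consecutive backticks in `text`."""
    return {len(m.group()) for m in re.finditer(r"`+", text)}


def inline_delimiter(code: str) -> str:
    """Shortest backtick string that can quote `code` verbatim as a code span.

    A run of exactly N backticks works as long as `code` contains no run of
    exactly N backticks; runs of any other length inside the code are harmless.
    """
    runs = backtick_run_lengths(code)
    n = 1
    while n in runs:
        n += 1
    return "`" * n
```

For content that contains both a single and a double backtick run, the function returns three backticks: exactly the delimiter that collides with fence syntax when the span starts a line.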

The idea is really kind of ingenious: it avoids any need to escape characters in the quoted content. However, backtick code fences interfere with this system. Their presence makes it impossible to use a sequence of three or more backticks as an inline code delimiter when (a) the inline code starts at the beginning of a line and (b) it does not end on the same line. I think this is an ugly rough edge. Consider how unexpected the difference between these examples is:

` hello
this is inline code
`

`` hello
this is inline code with a backtick `
``

``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

If you have a paragraph beginning with inline code that contains backtick runs of lengths 1 and 2, and it doesn’t fit on one line, then you’re completely out of luck; there is no way to write this in CommonMark. Of course, you could skip the hard wrap and put everything on one line, but that is ugly and usually shouldn’t have to be done.

Solutions?

  1. Remove backtick code fences and just have tildes. This is just a stylistic difference, but I imagine the backtick style is so deeply entrenched that this would not be a popular option.

  2. Put more constraints on the info string (e.g. require curly braces, as pandoc does, if you have anything more than a single word).

  3. Others?
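Option 2 can be stated as a predicate on the info string. The sketch below is hypothetical: the function name and the exact acceptance rule (empty, a single word, or a `{...}` attribute block, loosely modeled on pandoc’s braces) are assumptions for illustration, not anything in the spec.

```python
def info_string_allowed(info: str) -> bool:
    """Hypothetical stricter info-string rule for option 2 (not current CommonMark).

    Accept the info string only if it is empty, a single word, or wrapped in
    curly braces. Backticks stay forbidden, as CommonMark already requires
    for backtick fences.
    """
    info = info.strip()
    if "`" in info:
        return False
    if info == "":
        return True
    if info.startswith("{") and info.endswith("}"):
        return True
    return len(info.split()) == 1
```

Note that `hello` is still accepted, so a line such as ``` hello would still open a fence under this rule; it only rescues inline code whose first line contains more than one word.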

This was first brought to my attention by


Issues we MUST resolve before 1.0 release [8 remaining]
#2

Personally, I never really liked the info string without attribute braces. As such, I’m in favour of (2).

I do see the backwards-compatibility problem, but it shouldn’t be too disruptive as long as an info string of a single word is still allowed. The really bad case is:

``` hello world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

#3

I’d generally prefer option #1, as it is the cleanest and simplifies the rules.

However, given the widespread usage of backticks, this would likely confuse users. Then again, that’s probably worth it since we can simplify things.

Are there any rules regarding the breaking of backwards compatibility in cases like these? How does CommonMark generally handle those things?


#4

The way I see it, if your “inline code” contains newlines, then it’s not really inline anymore. On the other hand though, I understand the desire to use newlines for manual line wrapping to improve readability in tools which don’t support automatic line wrapping.

Maybe the simplest solution is to just require empty lines to separate code blocks from other text. Otherwise, consider it an inline code block.


#5

Actually, it looks like that behavior exists already.

```This `code` is line; some_statement(asdf, fwe, foo); and more stuff.... ``double backticks``;
this is on a new line
```. The preceding code is inline with this text.

Renders as:

```This code is line; some_statement(asdf, fwe, foo); and more stuff… double backticks;
this is on a new line

Honestly, I’d prefer it render as:

This `code` is line; some_statement(asdf, fwe, foo); and more stuff.... ``double backticks``; this is on a new line. The preceding code is inline with this text.

But that’s a separate issue from the one described here.

My question is, why would you want newlines in your inline code? If your code is inline, I don’t think it should contain newlines. Conversely, if your code needs newlines then it should probably be in a code block, not inline with other text.

Or in other words, the original post says:

But in reality that’s not true. There also needs to be no text following the inline code. And if there is no text following the inline code or preceding it, then why shouldn’t it render as a code block?


#6

I disagree that it is “just” a stylistic difference. Perhaps it is mostly a stylistic difference, but I feel that one of the things that makes Markdown so easy is its consistency. (Yes, that may be ironic for something that has so many varied implementations, but bear with me …) When I want to emphasize text, I surround it with *asterisks*. If I want more emphasis, I add more **asterisks** and still more ***asterisks***. When I want a header I start a line with a # hash. If I want more (sub-)headers in the same section, I just add ## more hashes. Basically, a theme is established by a single character or pair of surrounding characters … and variations on that theme (whatever makes sense for that theme) are created by adding more of that same character or more surrounding pairs. Maybe I’m not looking hard enough, but I don’t see any other prior art of a variation on a theme being the SHIFTed version of the original character. (And yes, I’m aware there are some international keyboards where the backtick and the tilde are not on the same key.)

Whereas we do have prior art of the backslash being treated as an escape character, for instance when I want a hash to start a line without creating a header:

\# start a new line instead of creating a header.

So another option, one that I feel would increase consistency, would be to use backslashes to escape backticks within inline code spans and leave triple backticks for code blocks.

Though I freely admit that I could just be being selfish because I find the triple backticks easier to type than triple tildes. And since they’re smaller characters in a proportional font they feel less intrusive. But I realize that’s just my personal preference.


#7

+++ Lee Dohm [Feb 19 15 17:15 ]:

So another option, one that I feel would increase consistency would be
to use backslashes to escape backticks within inline code spans and
leave triple backticks for code blocks.

You raise a good point about consistency, but this isn’t a good solution. It completely breaks backwards compatibility with original Markdown. And I think this is a nice feature of Markdown: you can just copy the code you want to quote verbatim, without having to go through and see what needs backslash-escaping.


#8

+++ Andrew Meyer [Feb 19 15 15:27 ]:

My question is, why would you want newlines in your inline code?

You might want to hard-wrap your text to a certain width. (Because this is really the use case, CommonMark treats the newlines in a code span as spaces.)
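To illustrate the newline handling, here is a minimal sketch of how the raw text between a code span’s delimiters could be normalized. It follows the wording of later versions of the CommonMark spec (line endings become spaces, then one space is stripped from each side when both ends are spaces and the content is not all spaces); which spec version you target is an assumption, and the function name is made up.

```python
import re


def code_span_text(raw: str) -> str:
    """Normalize the raw characters found between a code span's backtick strings."""
    # Hard-wrapped lines: every line ending becomes a single space.
    text = re.sub(r"\r\n|\r|\n", " ", raw)
    # Strip one padding space from each side, but only if both are present
    # and the content is not made up entirely of spaces.
    if len(text) >= 2 and text[0] == " " and text[-1] == " " and text.strip(" "):
        text = text[1:-1]
    return text
```

Applied to the second example in the original post, the wrapped span comes out as a single line of code with the newlines turned into spaces.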

But in reality that’s not true. There also needs to be no text
following the inline code.

If you look carefully at the rules for backtick code fences in the spec, you’ll see that this isn’t true. The code block, once opened, continues until a closing fence or end of document is reached. (In this way we avoid the need for potentially unlimited backtracking.)
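A sketch of that forward-only rule: once a fence opens, scan ahead for the first valid closing fence (same character, at least as long, indented at most three spaces, followed only by spaces), and if there is none, let the block run to the end of the document. The function names are hypothetical, and container blocks such as lists and block quotes, whose end can also terminate a fenced block, are ignored here.

```python
import re


def closes_fence(line: str, fence_char: str, fence_len: int) -> bool:
    """True if `line` can close a fence opened by `fence_len` copies of `fence_char`."""
    pattern = r" {0,3}(" + re.escape(fence_char) + r"+) *"
    m = re.fullmatch(pattern, line)
    return m is not None and len(m.group(1)) >= fence_len


def fenced_block_end(lines: list[str], start: int, fence_char: str, fence_len: int) -> int:
    """Index just past the last line of the code block opened at lines[start].

    The scan only moves forward: the block ends at the first closing fence,
    or, if there is none, at the end of the document.
    """
    for i in range(start + 1, len(lines)):
        if closes_fence(lines[i], fence_char, fence_len):
            return i + 1
    return len(lines)
```

Because an unclosed fence simply runs to the end, the parser never has to go back and re-read the opening line as paragraph text, which is the backtracking this rule avoids.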

And if there is no text following the inline
code or preceding it, then why shouldn’t it render as a code block?

One might want to end a paragraph with some quoted code, without putting it in a code block. And, if one hard-wraps, one might want to put a line break in that quoted code.


#9

+++ Andrew Meyer [Feb 19 15 13:57 ]:

Maybe the simplest solution is to just require empty lines to separate
code blocks from other text. Otherwise, consider it an inline code
block.

Yes, this is another solution that could be considered. However, I think it’s extremely common for people not to put in this extra blank line, and I know it’s convenient not to have to do it.


#10

Ah, sorry. I keep confusing Discourse’s implementation with the CommonMark spec. It’s really convenient to try out markdown in the Discourse live preview, and Discourse does implement it that way.

I am a bit curious about that bit on unlimited backtracking though. Can you give an example of a case where unlimited backtracking occurs with Discourse’s current implementation of Markdown?


#11

+++ Andrew Meyer [Feb 19 15 21:49 ]:

I am a bit curious about that bit on unlimited backtracking though. Can
you give an example of a case where unlimited backtracking occurs with
Discourse’s current implementation of Markdown?

If you have an open fence but no close fence, the parser would, I take it, have to parse to the end of the document and then backtrack.


#12

If the intent is inline code (semantically a single line) that was too long for one line, it’s rather weird to put the closing ``` on a separate line. Most people wouldn’t write:

**Emphasis
importantly emphatic
**

This is what I would instinctively write:

``` hello
is this NOT inline code?
doesn't look to me to be a code block!```

and as long as I don’t leave a space before the closing backticks, even a hard-wrapping editor won’t accidentally put them at the start of a line.

This parses as inline code in most implementations, including Markdown.pl (because it had no concept of fenced blocks):
http://johnmacfarlane.net/babelmark2/?normalize=1&text=```+hello is+this+NOT+inline+code%3F doesn't+look+to+me+to+be+a+code+block!``` more+text

  • marked parses it as a code block that ends at the trailing ```. That’s as graceful a degradation as I could ask for.
  • Unfortunately, other implementations that parse it as a code block don’t recognize the trailing ``` as the end of the block, so the following lines become part of a “runaway” code block: commonmark, cheapskate, Parsedown, cebe/markdown, markdown-it.
    This makes this style of writing very unportable :frowning:.
    That doesn’t necessarily preclude CommonMark from specifying that this is how the case should be parsed, but it does suggest there should be another way to support it.

To me, the CommonMark spec reads as ambiguous: it’s worded in a way that seems to assume a code fence must occur at the start of a line (perhaps indented), but doesn’t say so explicitly. And there are no examples with ``` following text.


#13

If you try your first example followed by another paragraph:
http://johnmacfarlane.net/babelmark2/?normalize=1&text=```This+`code`+is+line%3B+some_statement(asdf%2C+fwe%2C+foo)%3B+and+more+stuff…+%60%60double+backticks%60%60%3B%0Athis+is+on+a+new+line%0A%60%60%60.+The+preceding+code+is+inline+with+this+text.%0A%0Anext+paragragh

it fails very confusingly in several implementations, including commonmark: the first backticks aren’t recognized as code at all, and the following innocent paragraph is turned into a code block.
That’s because “This `code` is...” isn’t a valid language (info string).
Adding a space “fixes” that in all implementations except commonmark and markdown-it:
http://johnmacfarlane.net/babelmark2/?normalize=1&text=```+This+`code`+is+line%3B+some_statement(asdf%2C+fwe%2C+foo)%3B+and+more+stuff…+%60%60double+backticks%60%60%3B%0Athis+is+on+a+new+line%0A%60%60%60.+The+preceding+code+is+inline+with+this+text.%0A%0Anext+paragragh
This, I think, is a clear bug: shouldn’t a line starting with 3 backticks parse either as inline code or as a code block?!

Even then, a few implementations don’t recognize the closing backticks, so the following paragraphs become part of a runaway code block.


#14

This is what I would instinctively write:

``` hello
is this NOT inline code?
doesn't look to me to be a code block!```

Agreed. The example I gave where the triple backticks were wrapped to the following line was just to demonstrate an edge case. In reality, I’d probably write it that way too.


I guess my point was that there’s probably never going to be a case where you need to write “inline code” in a section by itself. If it is isolated from other text, then there’s really no reason not to use a code block. So in the exact cases cited in the original post:

` hello
this is inline code
`

`` hello
this is inline code with a backtick `
``

``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

Those should all be code blocks anyway.

The real problems start when you actually are using these inline in paragraphs. Assuming you’re using long code spans with hard-wrapped lines, in the worst case you could end up with something like this:

This is a paragraph with normal text, and below is
```a code span which happens to be wrapped such
that the beginning and end of it are on their own
lines and it contains ``backticks`` and other stuff```
Now the paragraph of normal text continues. This
should be normal text here.

Which, IMO, renders quite well in Discourse (minus the fact that the hard wraps are rendered as newlines, but that’s a separate issue, and not part of the markdown standard anyway):

This is a paragraph with normal text, and below is

that the beginning and end of it are on their own
lines and it contains ``backticks`` and other stuff```
Now the paragraph of normal text continues. This
should be normal text here.

Many other implementations handle this case well too. Some of those that don’t, though, treat all text following the opening backticks as one giant code block. From the user’s perspective, this can be worked around by making sure the opening backticks aren’t at the start of a line.

When the paragraph begins with code, we get very similar behavior from each implementation.


Personally, I’m in favor of specifying that triple backticks (or whatever other delimiter is being used) not on their own line will end the code as a code span, whereas putting them on their own line will end the code as a code block. That’s consistent with the behavior of many existing implementations, and IMO works best from the user’s perspective.
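The proposed rule can be sketched as a classifier for a run of three or more backticks at the start of a line. This is a hypothetical sketch of the proposal, not CommonMark behavior; the function name, the assumption that the opening run is unindented, and the fallback for an unclosed run (a block to the end of the document, as CommonMark does today) are all choices made for illustration.

```python
import re

BACKTICK_RUN = re.compile(r"`+")
# A line holding only a backtick run, indented at most three spaces.
FENCE_ONLY = re.compile(r" {0,3}(`{3,}) *")


def classify_opening_run(lines: list[str], start: int) -> tuple[str, int]:
    """Hypothetical rule proposed in this thread (not CommonMark).

    Returns ("block", i) if line i consists only of a long-enough backtick run,
    or ("span", i) if a run of exactly the opening length appears inside the
    text of line i. With no closer at all, falls back to a block that runs to
    the end of the document.
    """
    opener = BACKTICK_RUN.match(lines[start])
    if opener is None or len(opener.group()) < 3:
        raise ValueError("line must start with three or more backticks")
    n = len(opener.group())
    for i in range(start, len(lines)):
        # On the opening line, only the text after the opening run counts.
        text = lines[i][n:] if i == start else lines[i]
        if i > start:
            fence = FENCE_ONLY.fullmatch(text)
            if fence and len(fence.group(1)) >= n:
                return "block", i
        if any(len(m.group()) == n for m in BACKTICK_RUN.finditer(text)):
            return "span", i
    return "block", len(lines)
```

With this rule, the hard-wrapped example ending in `stuff```` closes as a code span, while the original post’s third example, whose closing backticks sit on their own line, still becomes a code block.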


#15

+++ cben [Feb 20 15 09:57 ]:

The CommonMark spec to me reads ambigous: it’s worded in a way that
seems to assume a code fence must occur at start of line (perhaps
indented) but doesn’t say it explicitly. And there are no examples with ``` following text.

Where is the ambiguity you see?

“A fenced code block begins with a code fence, indented no
more than three spaces.”

“The closing code fence may be indented up to three spaces,
and may be followed only by spaces, which are ignored.”

That seems pretty explicit to me.
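The quoted rules, together with the spec’s restriction that the info string after a backtick fence may not contain backticks, can be sketched as a single predicate. The name is made up; the sketch follows later spec versions, in which the backtick restriction does not apply to tilde fences, and it ignores tabs.

```python
import re

# Up to three spaces of indentation, a run of 3+ backticks or 3+ tildes,
# then the rest of the line (the raw info string).
OPENER = re.compile(r" {0,3}(`{3,}|~{3,})(.*)")


def opens_fence(line: str) -> bool:
    """True if `line` could open a fenced code block."""
    m = OPENER.fullmatch(line)
    if m is None:
        return False
    fence, info = m.group(1), m.group(2)
    # After a backtick fence, the info string may not contain backticks.
    if fence[0] == "`" and "`" in info:
        return False
    return True
```

Under this predicate a line like ```. The preceding code is inline with this text. opens a fence, while a first line whose info string contains backticks does not.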


#16

It’s a confusing example, but it does accord with the spec, so it’s not a bug. First, remember that as the spec states explicitly, markers of block-structure take precedence over markers of inline structure. The line

```. The preceding code is inline with this text.

meets the spec’s criterion for opening a fenced code block. (The first line does not, because info strings are not allowed to contain backticks.) Because this line opens a code block, it isn’t parsed as part of the preceding paragraph, and so its triple backticks can’t close the triple backticks in the preceding paragraph, just as they wouldn’t in this case:

``` some `code`

```

(Recall that fenced code blocks can interrupt paragraphs and don’t need a preceding blank line. Recall also that a fenced code block in commonmark does not need a closing fence; if no closing fence is encountered, it extends to the end of the document.)


#17

Sorry, by “bug” I meant “loose handwaving to the effect it should change, perhaps in the spec”.
Thanks for your clear explanation; I don’t see any simple way to change this case.
I should actually read the spec start-to-finish before further comments…


#18

I would personally vote for #1, because backtick code fences bring an annoying issue for me and my French Canadian fellows:

The ` character is a composing (dead-key) character for building è on the French Canadian keyboard in OS X.

This means I always need to type a space to break out of the composing state.


#19

+++ pothibo [Feb 25 15 16:07 ]:

I would personally vote for #1 as the backtick code fences brings an
annoying issue for me and my french canadien fellows:

The ` character is a composing character for building è on french
canadian keyboard in OS X

This means I always need to type space to break out of the composing
state.

The tilde style fences are already supported. So you can already use them instead of backtick fences, if that’s easier for you. Option #1 is not to support tilde fences (we have them already), but to make them the only option for fenced code.


#20

I know that tildes are already available, but by default most sites show the backtick as the way to create a code block. At first I had to dig to figure out that I could use tildes for the same effect.

I was just saying that backticks are somewhat broken with a French Canadian layout, as the backtick key starts a composition.

