A problem with backtick code fences

release-1.0

#11

+++ Andrew Meyer [Feb 19 15 21:49 ]:

I am a bit curious about that bit on unlimited backtracking though. Can
you give an example of a case where unlimited backtracking occurs with
Discourse’s current implementation of Markdown?

If you have an open fence but no close fence, the parser would, I take it, have to parse to the end of the document and then backtrack.


#12

If the intent is inline code (semantically a single line) that was simply too long for one line, it’s rather weird to put the closing ``` on a separate line. Most people wouldn’t write:

**Emphasis
importantly empathic
**

This is what I would instinctively write:

``` hello
is this NOT inline code?
doesn't look to me to be a code block!```

and as long as I don’t leave a space before the closing backtick, even a hard-wrapping editor won’t accidentally put it at start of line.

This parses as inline code in most implementations, including Markdown.pl (because it had no concept of fenced blocks):
http://johnmacfarlane.net/babelmark2/?normalize=1&text=```+hello is+this+NOT+inline+code%3F doesn't+look+to+me+to+be+a+code+block!``` more+text

  • marked parses it as a code block that ends at the trailing ```. That degrades as gracefully as I could ask for.
  • Unfortunately, the other implementations that parse it as a code block don’t recognize the trailing ``` as the end of the block, so the following lines become part of a “runaway” code block: commonmark, cheapskate, Parsedown, cebe/markdown, markdown-it.
    This makes this style of writing very unportable :frowning:.
    That doesn’t necessarily preclude CommonMark from specifying that this is how the case should be parsed, but it does suggest there should be another way to support it.

To me, the CommonMark spec reads as ambiguous: it’s worded in a way that seems to assume a code fence must occur at the start of a line (perhaps indented), but it doesn’t say so explicitly. And there are no examples with ``` following text.


#13

If you try your first example followed by another paragraph:
http://johnmacfarlane.net/babelmark2/?normalize=1&text=```This+`code`+is+line%3B+some_statement(asdf%2C+fwe%2C+foo)%3B+and+more+stuff…+%60%60double+backticks%60%60%3B%0Athis+is+on+a+new+line%0A%60%60%60.+The+preceding+code+is+inline+with+this+text.%0A%0Anext+paragragh

it fails very confusingly in several implementations, including commonmark: the first backticks are not recognized as code at all, and the innocent paragraph that follows is turned into a code block.
That’s because “This `code` is…” isn’t a valid language.
Adding a space “fixes” that in all implementations except commonmark and markdown-it:
http://johnmacfarlane.net/babelmark2/?normalize=1&text=```+This+`code`+is+line%3B+some_statement(asdf%2C+fwe%2C+foo)%3B+and+more+stuff…+%60%60double+backticks%60%60%3B%0Athis+is+on+a+new+line%0A%60%60%60.+The+preceding+code+is+inline+with+this+text.%0A%0Anext+paragragh
This, I think, is a clear bug: shouldn’t a line starting with 3 backticks parse either as inline code or as a code block?!

Even then, a few implementations don’t recognize the closing backticks, so the following paragraphs become part of a runaway code block.


#14

This is what I would instinctively write:

``` hello
is this NOT inline code?
doesn't look to me to be a code block!```

Agreed. The example I gave where the triple backticks were wrapped to the following line was just to demonstrate an edge case. In reality, I’d probably write it that way too.


I guess my point was that there’s probably never going to be a case where you need to write “inline code” in a section by itself. If it is isolated from other text, then there’s really no reason not to use a code block. So, in the exact cases cited in the original post:

` hello
this is inline code
`

`` hello
this is inline code with a backtick `
``

``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

Those should all be code blocks anyway.

The real problems start when you actually are using these inline in paragraphs. Assuming you’re using long code spans with hard-wrapped lines, in the worst case you could end up with something like this:

This is a paragraph with normal text, and below is
```a code span which happens to be wrapped such
that the beginning and end of it are on their own
lines and it contains ``backticks`` and other stuff```
Now the paragraph of normal text continues. This
should be normal text here.

Which, IMO, renders quite well in Discourse (minus the fact that the hard wraps are rendered as line breaks, but that’s a separate issue, and not part of the Markdown standard anyway):

This is a paragraph with normal text, and below is

that the beginning and end of it are on their own
lines and it contains ``backticks`` and other stuff```
Now the paragraph of normal text continues. This
should be normal text here.

Many other implementations handle this case well too. Some of those that don’t, though, treat all text following the opening backticks as one giant code block. Users can work around this by making sure the opening backticks aren’t at the start of a line.

When the paragraph begins with code, we get very similar behavior from each implementation.


Personally, I’m in favor of specifying that triple backticks (or whatever other delimiter is being used) that are not on their own line end the code as a code span, whereas putting them on their own line ends the code as a code block. That’s consistent with the behavior of many existing implementations, and IMO works best from the user’s perspective.
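To make the rule proposed above concrete, here is a rough Python sketch. It is not how CommonMark parses fences today; `classify_by_closer` is a hypothetical helper, and it assumes the text being classified starts directly with the opening backtick run (indentation and surrounding paragraph context are ignored).

```python
import re


def classify_by_closer(lines: list[str], fence_len: int = 3) -> str:
    """Classify text that starts with a run of `fence_len` backticks.

    Hypothetical rule: the first later backtick run of the same length
    closes the code; if that closer sits alone on its line, the result is a
    code block, otherwise it is a code span.
    """
    text = "\n".join(lines)
    # Opening run: exactly fence_len backticks at the very start.
    opener = re.match(r"`{%d}(?!`)" % fence_len, text)
    if opener is None:
        return "not fenced"
    # Closing run: exactly fence_len backticks, not part of a longer run.
    closer = re.compile(r"(?<!`)`{%d}(?!`)" % fence_len).search(text, opener.end())
    if closer is None:
        return "unclosed"
    line_start = text.rfind("\n", 0, closer.start()) + 1
    line_end = text.find("\n", closer.end())
    if line_end == -1:
        line_end = len(text)
    before = text[line_start:closer.start()]
    after = text[closer.end():line_end]
    # Only spaces around the closer means it is "on its own line".
    if before.strip(" ") == "" and after.strip(" ") == "":
        return "code block"
    return "code span"


wrapped = [
    "```a code span which happens to be wrapped such",
    "that the beginning and end of it are on their own",
    "lines and it contains ``backticks`` and other stuff```",
    "Now the paragraph of normal text continues.",
]
print(classify_by_closer(wrapped))                      # code span
print(classify_by_closer(["``` hello", "code", "```"]))  # code block
```

Note that the double-backtick runs inside the span are skipped because the closer must be a run of exactly the opening length.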


#15

+++ cben [Feb 20 15 09:57 ]:

The CommonMark spec to me reads ambiguous: it’s worded in a way that
seems to assume a code fence must occur at start of line (perhaps
indented) but doesn’t say it explicitly. And there are no examples with
``` following text.

Where is the ambiguity you see?

“A fenced code block begins with a code fence, indented no
more than three spaces.”

“The closing code fence may be indented up to three spaces,
and may be followed only by spaces, which are ignored.”

That seems pretty explicit to me.
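The two sentences quoted above translate fairly directly into a line test. Here is a minimal Python sketch; the helper name `is_closing_fence` is made up for illustration. It also applies the spec’s further requirement, not quoted here, that the closing fence use the same character as the opening fence and be at least as long. It follows the quoted wording (spaces only) and ignores tabs.

```python
import re


def is_closing_fence(line: str, fence_char: str, opening_len: int) -> bool:
    """Return True if `line` can close a fence opened with `opening_len` x `fence_char`."""
    # Up to three spaces of indentation, a run of the same character that is at
    # least as long as the opening fence, then nothing but spaces.
    pattern = r" {0,3}%s{%d,} *" % (re.escape(fence_char), opening_len)
    return re.fullmatch(pattern, line) is not None


print(is_closing_fence("```", "`", 3))                                        # True
print(is_closing_fence("doesn't look to me to be a code block!```", "`", 3))  # False
```

The second call shows why the trailing ``` in the #12 example cannot close a block: text precedes it on the line.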


#16

It’s a confusing example, but it does accord with the spec, so it’s not a bug. First, remember that as the spec states explicitly, markers of block-structure take precedence over markers of inline structure. The line

```. The preceding code is inline with this text.

meets the spec’s criterion for opening a fenced code block. (The first line does not, because info strings are not allowed to contain backticks.) Because this line opens a code block, it isn’t parsed as part of the preceding paragraph, and so its triple backticks can’t close the triple backticks in the preceding paragraph, just as they wouldn’t in this case:

``` some `code`

```

(Recall that fenced code blocks can interrupt paragraphs and don’t need a preceding blank line. Recall also that a fenced code block in commonmark does not need a closing fence; if no closing fence is encountered, it extends to the end of the document.)
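Here is a rough Python sketch of the rules this explanation relies on, applied to the example from #13. An opening fence may be indented up to three spaces, a backtick fence’s info string may not contain backticks, and a block with no closing fence runs to the end of the document. It is a simplification (no tabs, no container blocks), and the helper names are invented for illustration.

```python
import re
from typing import Optional

# Up to three spaces of indentation, then 3+ backticks or 3+ tildes, then the info string.
OPENING_FENCE = re.compile(r" {0,3}(`{3,}|~{3,})(.*)")


def opening_fence(line: str) -> Optional[str]:
    """Return the fence string if `line` opens a fenced code block, else None."""
    m = OPENING_FENCE.fullmatch(line)
    if m is None:
        return None
    fence, info = m.group(1), m.group(2)
    if fence[0] == "`" and "`" in info:
        return None  # backtick fences may not have backticks in the info string
    return fence


def block_end(lines: list[str], start: int) -> int:
    """Index just past the fenced block opened at lines[start]."""
    fence = opening_fence(lines[start])
    closing = re.compile(r" {0,3}%s{%d,} *" % (re.escape(fence[0]), len(fence)))
    for i in range(start + 1, len(lines)):
        if closing.fullmatch(lines[i]):
            return i + 1
    return len(lines)  # no closing fence: the block runs to the end of the document


doc = [
    "```This `code` is line; some_statement(asdf, fwe, foo); and more stuff… ``double backticks``;",
    "this is on a new line",
    "```. The preceding code is inline with this text.",
    "",
    "next paragraph",
]
print(opening_fence(doc[0]))          # None: the info string contains backticks
print(opening_fence(doc[2]))          # ```
print(block_end(doc, 2) == len(doc))  # True: a "runaway" block
```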


#17

Sorry, by “bug” I meant “loose handwaving to the effect it should change, perhaps in the spec”.
Thanks for your clear explanation; I don’t see any simple way to change this case.
I should actually read the spec start-to-finish before further comments…


#18

I would personally vote for #1, because backtick code fences cause an annoying issue for me and my French Canadian fellows:

The ` character is a composing (dead) key for typing è on the French Canadian keyboard layout in OS X.

This means I always need to type space to break out of the composing state.


#19

+++ pothibo [Feb 25 15 16:07 ]:


I would personally vote for #1 as the backtick code fences brings an
annoying issue for me and my french canadien fellows:

The ` character is a composing character for building è on french
canadian keyboard in OS X

This means I always need to type space to break out of the composing
state.

The tilde style fences are already supported. So you can already use them instead of backtick fences, if that’s easier for you. Option #1 is not to support tilde fences (we have them already), but to make them the only option for fenced code.


#20

I know that tildes are already available, but by default most sites show the backtick as the way to create a code block. At first I had to dig to figure out that I could replace it with tildes for the same effect.

I was just saying that backticks are somewhat broken with a French Canadian layout, since the backtick key acts as a composing key.


#21

If you look at the GitHub code fences, they specify language without a space:

```ruby
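require 'redcarpet'
# Redcarpet 2.x/3.x API: a Markdown parser wrapping the HTML renderer
markdown = Redcarpet::Markdown.new(Redcarpet::Render::HTML)
puts markdown.render("Hello World!")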
```

I think a multi-word restriction here, or even a space restriction, is probably fine.

Option #1, removing backtick code fences, is just not an option.
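For comparison with the GitHub example above: CommonMark leaves the format of the info string open, but the spec notes that its first word is typically used to specify the language, and the reference renderer emits it as a `language-…` class on the `<code>` element. Here is a minimal Python sketch of that convention. The `render_fenced` helper is hypothetical, and its HTML escaping only approximates the reference renderer’s.

```python
import html
from typing import Optional


def render_fenced(info: str, code: str) -> str:
    """Render a fenced code block, using the first word of the info string as the language."""
    words = info.split()
    language: Optional[str] = words[0] if words else None
    cls = f' class="language-{html.escape(language)}"' if language else ""
    # quote=False leaves quotes alone in the body; the reference renderer also
    # escapes double quotes, so output can differ slightly.
    return f"<pre><code{cls}>{html.escape(code, quote=False)}</code></pre>"


print(render_fenced("ruby", "puts 1 < 2\n"))
```

Under this convention, ```ruby and ``` ruby produce the same class, because surrounding whitespace is dropped before the first word is taken.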


#22

I agree that requiring the info string to follow immediately, without intervening space, is probably the best solution: if a code span is indeed intended, one can always insert a SPACE after the backtick string, and this SPACE will then be trimmed off the code span’s content anyway according to the current syntax rules.

And even if the backtick string of such a code span happens to end up at the beginning of a line, say through re-flowing lines, any text formatter would preserve or restore the following SPACE, so even in this case all is well.

And an opening code fence without an info string can’t occur “by accident” in a paragraph either, I’d say.

So apart from resolving an ambiguity, this rule would also make code spans “robust” in the face of re-flowing the lines of a paragraph, as far as I can tell. Which is nice!


#23

I’ve always used a space after the backticks, and every Markdown converter I’ve ever interacted with has accepted it. So if we didn’t allow a space before the info string, we’d risk breaking a lot of existing content. That’s a big strike against your proposal. And I don’t see how it really addresses the underlying problem: a space isn’t required in inline code, so the ambiguity would still need to be addressed.


#24

I’ve always used a space after the backticks, and every Markdown converter I’ve ever interacted with has accepted it.

I have to admit that this went through my mind, but then I conveniently kept silent about the issue, which is nonetheless real; you’re right.

And I don’t see how it really addresses the underlying problem.

I may be wrong, but assume the info string (if any) in an opening code fence were required to follow immediately after the backtick string. Now consider a case where you write a code span, but it happens to look like a code fence, say:

Here is a *code span*, which starts with a *backtick string* and crosses lines:
```sample 1 2 3 4 test
5 2 7 8
```
and here we're back in the paragraph.

Because here “sample” is meant to be part of the code span, and is not meant to be an info string (by assumption!), simply inserting a SPACE would disambiguate (what a word!) the situation:

Here is a *code span*, which starts with a *backtick string* and crosses lines:
```␣sample 1 2 3 4 test
5 2 7 8
```
and here we're back in the paragraph.

Because of the SPACE (and the assumed syntax rule change), it can’t mean a code block any more, and because of the “trim one SPACE”-rule for code spans, the “meaning” or content of the—now unambiguous—code span hasn’t changed.

Does that make sense now?
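Here is a small Python sketch of this proposal, assuming `␣` above stands for an ordinary space. `opens_block_under_proposal` encodes the hypothetical rule under discussion, not current CommonMark. `code_span_content` follows the code span normalization in current CommonMark (0.29 and later), which postdates this thread. Line endings become spaces, then one space is stripped from each end if both ends have one and the content is not all spaces.

```python
import re


def opens_block_under_proposal(line: str) -> bool:
    """Hypothetical rule: an info string must follow the backticks immediately."""
    m = re.fullmatch(r"(`{3,})(.*)", line)
    if m is None:
        return False
    info = m.group(2)
    # A space right after the backticks would mean "code span", not "code block".
    return not info.startswith(" ") and "`" not in info


def code_span_content(raw: str) -> str:
    """Normalize code span content the way current CommonMark does."""
    content = raw.replace("\n", " ")  # line endings become spaces
    if content.startswith(" ") and content.endswith(" ") and content.strip(" "):
        content = content[1:-1]  # strip exactly one space from each end
    return content


print(opens_block_under_proposal("```sample 1 2 3 4 test"))   # True
print(opens_block_under_proposal("``` sample 1 2 3 4 test"))  # False
print(repr(code_span_content(" sample 1 2 3 4 test\n5 2 7 8\n")))  # 'sample 1 2 3 4 test 5 2 7 8'
```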


#25

If there are multiple words after the opening code fence – with or without a space – then I would argue the author probably intended those words to be the first line of code, and not an info string at all.

```multiple words here
more content here
```

```␣multiple words here
more content here
```

A few options in that case:

  1. Be flexible and treat those multiple words as the first line of a code block.
  2. Be rigid and break the code block, turning the entire first line into literal text with three visible backticks. (GitHub does this.)
  3. Be selective and use only the first word on that line as the info string, discarding the rest of the line.
  4. Be graceful and preserve all the text between the two code fences, but have it degrade to inline code.

I would choose option 4, because of the principle of least surprise. (“Why did the entire first line of my code block disappear?” vs. “Looks like my code is displaying inline. I’d better double-check my formatting.”)

It also solves the problem that began this thread:

…then you can put that code between triple-backticks, with the first 2+ words of code on the same line as the opening triple backticks.


TL;DR If the opening three-backtick code fence is followed by

  • Zero words, it starts a code block.
  • One word, it starts a code block and includes a one-word info string.
  • Two or more words, it starts an inline code span.

Then, since the info string is limited to one word, I think there should be no problem allowing a space before it.
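The TL;DR above can be written as a small classifier. Here is a Python sketch, assuming “word” means a whitespace-separated token and that only the opening line of an exactly-three-backtick fence is examined. `classify_opening` is a hypothetical name, and this is the rule proposed in this post, not CommonMark’s.

```python
import re
from typing import Optional, Tuple


def classify_opening(line: str) -> Tuple[str, Optional[str]]:
    """Return (kind, info_string) for a line starting with exactly three backticks."""
    m = re.fullmatch(r"```(?!`)(.*)", line)
    if m is None:
        return ("not a fence", None)
    words = m.group(1).split()  # a leading space is allowed, per the proposal
    if not words:
        return ("code block", None)
    if len(words) == 1:
        return ("code block", words[0])
    return ("code span", None)  # two or more words: inline code


for sample in ["```", "```ruby", "``` ruby", "```multiple words here"]:
    print(repr(sample), classify_opening(sample))
```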


#26

The info string isn’t currently limited to one word. This was the subject of some debate when we first started talking about this, so let me recapitulate it.

Pandoc’s fenced code blocks have always allowed specification of quite a bit of structured information:

``` {.class .class #id key=value key="value"}
code here
```

This is especially useful when the code blocks are postprocessed (e.g. by a pandoc “filter”). You might, for example, have a filter that takes specially marked code blocks and converts them to charts. And in that case you might want to have attributes like width, height, background-color…

Even for source code, you might want to specify whether to number lines, what highlighting style to use, and so on.

So limiting the info to a single word would really be too limiting.

Originally I proposed something like the pandoc format, but others didn’t want to get behind that, so we came up with a compromise that the spec wouldn’t specify the format of the info string or what was to be done with it. Perhaps this should be revisited.
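To show why a one-word limit would pinch, here is a minimal Python sketch that pulls classes, an identifier, and key/value pairs out of a pandoc-style attribute block like the one above. It is a simplification written for this thread, not pandoc’s actual parser: `parse_attributes` is a hypothetical name, quoting is delegated to `shlex`, and unrecognized tokens are silently ignored.

```python
import shlex
from typing import Optional


def parse_attributes(info: str) -> Optional[dict]:
    """Parse '{.class #id key=value key="value"}' into its parts (simplified)."""
    info = info.strip()
    if not (info.startswith("{") and info.endswith("}")):
        return None  # not an attribute block; treat as a plain info string
    attrs = {"id": None, "classes": [], "pairs": {}}
    # shlex handles key="quoted value" and does not treat '#' as a comment here.
    for token in shlex.split(info[1:-1]):
        if token.startswith("."):
            attrs["classes"].append(token[1:])
        elif token.startswith("#"):
            attrs["id"] = token[1:]
        elif "=" in token:
            key, _, value = token.partition("=")
            attrs["pairs"][key] = value
    return attrs


print(parse_attributes('{.chart #sales width=300 background-color="light gray"}'))
```

A chart-producing filter of the kind described above could then read `width` and `background-color` from `pairs`. The attribute names here are only illustrative.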


#27

Revisiting this sentence. I agree “no way to write this” is a showstopper. So perhaps allow people to also use tildes in this particular scenario? In other words support both tilde and backtick code fences.

Does that solve it to your satisfaction?


#28

@codinghorror - You can already have tilde code fences for code blocks. This doesn’t help with the problem, which is how to express a certain kind of code span (not a code block). And even if we allowed tildes to be used for code spans (which would conflict with common extensions that use them for strikeout), we’d still have the problem.


#29

I see, I think I misunderstood the example:

``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

~~~ hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
~~~

#30

Forbid the space before the info string if it contains more words/spaces:

```hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```.hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```␣hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because it only contains a single word after the backticks!
```

```␣hello␣
this IS inline code with one backtick ` and two backticks ` (?)
```

```␣hello␣world
this IS inline code with one backtick ` and two backticks `
```

```hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```.hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```␣.hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no word but punctuation after the space!
```
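The cases listed in this post can be condensed into one classification function. Here is a Python sketch under several assumptions: `␣` stands for a space, “punctuation” means any non-alphanumeric first character, and several spaces after the backticks count as one. `classify_per_post` is a hypothetical name, and the trailing-space case follows the post’s tentative “(?)”.

```python
import re


def classify_per_post(line: str) -> str:
    """Classify an opening line under the rules proposed in this post (not CommonMark)."""
    m = re.fullmatch(r"```(?!`)(.*)", line)
    if m is None:
        return "not a fence"
    rest = m.group(1)
    if not rest.startswith(" "):
        return "code block"  # nothing, or an info string glued to the backticks
    info = rest.lstrip(" ")
    if info and not info[0].isalnum():
        return "code block"  # punctuation right after the space, e.g. ".hello world"
    if " " in info:
        return "code span"   # several words, or one word followed by a space
    return "code block"      # a single word (or nothing) after the space


cases = {
    "```hello": "code block",
    "```.hello": "code block",
    "``` hello": "code block",
    "``` hello ": "code span",
    "``` hello world": "code span",
    "```hello world": "code block",
    "```.hello world": "code block",
    "``` .hello world": "code block",
}
for line, expected in cases.items():
    assert classify_per_post(line) == expected, line
```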