A problem with backtick code fences

release-1.0

#15

+++ cben [Feb 20 15 09:57 ]:

The CommonMark spec reads as ambiguous to me: it’s worded in a way that
seems to assume a code fence must occur at start of line (perhaps
indented) but doesn’t say it explicitly. And there are no examples with

Where is the ambiguity you see?

“A fenced code block begins with a code fence, indented no
more than three spaces.”

“The closing code fence may be indented up to three spaces,
and may be followed only by spaces, which are ignored.”

That seems pretty explicit to me.
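For illustration, the two quoted sentences translate fairly directly into a line-level check. This is a minimal Python sketch, not code from any CommonMark implementation. The function name `is_closing_fence` is hypothetical. The two extra requirements it enforces, that the closing fence uses the same character as the opener and is at least as long, come from the CommonMark spec rather than from the quoted text. It ignores tabs and container blocks.

```python
import re

# Up to three spaces of indentation, a run of backticks or tildes,
# then nothing but spaces until the end of the line.
_CLOSING = re.compile(r"^ {0,3}(`+|~+) *$")


def is_closing_fence(line: str, fence_char: str, fence_len: int) -> bool:
    """True if `line` closes a fence opened with `fence_len` copies of `fence_char`."""
    m = _CLOSING.match(line)
    if m is None:
        return False
    run = m.group(1)
    # Same character as the opening fence, and at least as long.
    return run[0] == fence_char and len(run) >= fence_len
```

For example, a line of four backticks closes a three-backtick fence, while a line with four leading spaces, or with text after the backticks, does not.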


#16

It’s a confusing example, but it does accord with the spec, so it’s not a bug. First, remember that as the spec states explicitly, markers of block-structure take precedence over markers of inline structure. The line

```. The preceding code is inline with this text.

meets the spec’s criterion for opening a fenced code block. (The first line does not, because info strings are not allowed to contain backticks.) Because this line opens a code block, it isn’t parsed as part of the preceding paragraph, and so its triple backticks can’t close the triple backticks in the preceding paragraph, just as they wouldn’t in this case:

``` some `code`

```

(Recall that fenced code blocks can interrupt paragraphs and don’t need a preceding blank line. Recall also that a fenced code block in CommonMark does not need a closing fence; if no closing fence is encountered, it extends to the end of the document.)
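The opening-fence criterion this explanation relies on can be sketched as one regular expression plus the rule that a backtick fence’s info string may not contain backticks. This is a simplified, hypothetical helper (`parse_opening_fence`), not code from any real parser. It ignores tabs, container blocks such as lists and block quotes, and the paragraph context in which a line appears.

```python
import re

# Up to three spaces of indentation, then at least three backticks or
# at least three tildes, then the (possibly empty) info string.
_OPENING = re.compile(r"^ {0,3}(`{3,}|~{3,})(.*)$")


def parse_opening_fence(line: str):
    """Return (fence, info_string) if `line` opens a fenced code block, else None."""
    m = _OPENING.match(line)
    if m is None:
        return None
    fence, info = m.group(1), m.group(2)
    # A backtick fence's info string may not contain backticks, so a line
    # such as "``` some `code`" stays an ordinary paragraph line.
    if fence[0] == "`" and "`" in info:
        return None
    return fence, info.strip()
```

With this check, the line beginning with three backticks followed by ". The preceding code is inline with this text." opens a fence, while the line "``` some `code`" does not, which matches the reading given above.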


#17

Sorry, by “bug” I meant “loose handwaving to the effect it should change, perhaps in the spec”.
Thanks for your clear explanation; I don’t see any simple way to change this case.
I should actually read the spec start-to-finish before further comments…


#18

I would personally vote for #1, as backtick code fences bring an annoying issue for me and my fellow French Canadians:

The ` character is a composing character for building è on the French Canadian keyboard layout in OS X.

This means I always need to type space to break out of the composing state.


#19

+++ pothibo [Feb 25 15 16:07 ]:


I would personally vote for #1, as backtick code fences bring an
annoying issue for me and my fellow French Canadians:

The ` character is a composing character for building è on the French
Canadian keyboard layout in OS X.

This means I always need to type space to break out of the composing
state.

The tilde style fences are already supported. So you can already use them instead of backtick fences, if that’s easier for you. Option #1 is not to support tilde fences (we have them already), but to make them the only option for fenced code.


#20

I know that tildes are already available, but by default, most sites show the backtick as the way to write a code block. At first I had to dig to figure out that I could use tildes instead for the same effect.

I was just saying that backticks are somewhat broken with a French Canadian layout, as the backtick key starts a composing sequence.


#21

If you look at the GitHub code fences, they specify language without a space:

```ruby
require 'redcarpet'
markdown = Redcarpet.new("Hello World!")
puts markdown.to_html
```

I think a multi-word restriction here, or even a space restriction, is probably fine.

Option #1, removing backtick code fences, is just not an option.


#22

I agree that requiring the info string to follow immediately, without intervening space, is probably the best solution: if a code span is indeed intended, one can always insert a SPACE after the backtick string, and this SPACE will then be trimmed off the code span’s content anyway according to the current syntax rules.

And even if the backtick string of such a code span happens to end up at the beginning of a line, say through re-flowing lines, any text formatter would preserve or restore the following SPACE, so even in this case all is well.

And an opening code fence without an info string can’t occur “by accident” in a paragraph either, I’d say.

So apart from resolving an ambiguity, this rule would also make code spans “robust” in the face of re-flowing the lines of a paragraph, as far as I can tell. Which is nice!
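A minimal sketch of the proposed rule, assuming it applies only to backtick fences and keeps the current ban on backticks in the info string. The helper `opens_fence_under_proposal` is hypothetical, and the rule it encodes is this post’s proposal, not current CommonMark behavior.

```python
import re

# Up to three spaces of indentation, three or more backticks,
# then whatever follows on the line.
_BACKTICK_RUN = re.compile(r"^ {0,3}(`{3,})(.*)$")


def opens_fence_under_proposal(line: str) -> bool:
    """Hypothetical rule: an info string must directly follow the backticks."""
    m = _BACKTICK_RUN.match(line)
    if m is None:
        return False
    rest = m.group(2)
    if "`" in rest:
        return False  # backticks in the info string are already disallowed
    if rest.strip() == "":
        return True   # bare fence: no info string at all
    # Info string glued to the backticks: a fence. A space right after the
    # backticks: under the proposal, an ordinary line (e.g. part of a code span).
    return not rest[0].isspace()
```

The design trade-off is visible in the last line: a single SPACE after the backticks is enough to turn a would-be fence back into paragraph text.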


#23

I’ve always used a space after the backticks, and every Markdown converter I’ve ever interacted with has accepted it. So, if we didn’t allow a space before the info string, we’d risk breaking a lot of existing content. That’s a big strike against your proposal. And I don’t see how it really addresses the underlying problem. After all, a space isn’t required in inline code, so the ambiguity still needs to be addressed.


#24

I’ve always used a space after the backticks, and every Markdown converter I’ve ever interacted with has accepted it.

I have to admit that this went through my mind, but I conveniently kept silent about the issue, which is nonetheless there; you’re right.

And I don’t see how it really addresses the underlying problem.

I may be wrong, but assume the info string (if any) in an opening code fence were required to follow immediately after the backtick string. Now consider a case where you write a code span, but it happens to look like a code fence, say:

Here is a *code span*, which starts with a *backtick string* and crosses lines:
```sample 1 2 3 4 test
5 2 7 8
```
and here we're back in the paragraph.

Because here “sample” is meant to be part of the code span, and is not meant to be an info string (by assumption!), simply inserting a SPACE would disambiguate (what a word!) the situation:

Here is a *code span*, which starts with a *backtick string* and crosses lines:
```␣sample 1 2 3 4 test
5 2 7 8
```
and here we're back in the paragraph.

Because of the SPACE (and the assumed syntax rule change), it can’t mean a code block any more, and because of the “trim one SPACE”-rule for code spans, the “meaning” or content of the—now unambiguous—code span hasn’t changed.

Does that make sense now?


#25

If there are multiple words after the opening code fence – with or without a space – then I would argue the author probably intended those words to be the first line of code, and not an info string at all.

```multiple words here
more content here
```

```␣multiple words here
more content here
```

A few options in that case:

  1. Be flexible and treat those multiple words as the first line of a code block.
  2. Be rigid and break the code block, turning the entire first line into literal text with three visible backticks. (GitHub does this.)
  3. Be selective and use only the first word on that line as the info string, discarding the rest of the line.
  4. Be graceful and preserve all the text between the two code fences, but have it degrade to inline code.

I would choose option 4, because of the principle of least surprise. (“Why did the entire first line of my code block disappear?” vs. “Looks like my code is displaying inline. I’d better double-check my formatting.”)

It also solves the problem that began this thread:

…then you can put that code between triple-backticks, with the first 2+ words of code on the same line as the opening triple backticks.


TL;DR If the opening three-backtick code fence is followed by

  • Zero words, it starts a code block.
  • One word, it starts a code block and includes a one-word info string.
  • Two or more words, it starts an inline code span.

Then, since the info string is limited to one word, I think there should be no problem allowing a space before it.
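The TL;DR rule can be sketched as a small classifier. This illustrates the proposal only: the name `classify_by_word_count` is hypothetical, and it assumes that “words” are simply whitespace-separated runs of characters, so a space before the single word is allowed.

```python
def classify_by_word_count(line: str):
    """Return ("block", info) or ("inline", None) for a line starting with ```."""
    if not line.startswith("```"):
        raise ValueError("expected a line starting with three backticks")
    words = line.lstrip("`").split()
    if len(words) == 0:
        return ("block", "")        # zero words: a plain code block
    if len(words) == 1:
        return ("block", words[0])  # one word: a code block with a one-word info string
    return ("inline", None)         # two or more words: an inline code span
```

Note that under this rule any info string with internal spaces, such as a pandoc-style attribute list, would be read as the start of an inline code span.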


#26

The info string isn’t currently limited to one word. This was the subject of some debate when we first started talking about this, so let me recapitulate it.

Pandoc’s fenced code blocks have always allowed specification of quite a bit of structured information:

``` {.class .class #id key=value key="value"}
code here
```

This is especially useful when the code blocks are postprocessed (e.g. by a pandoc “filter”). You might, for example, have a filter that takes specially marked code blocks and converts them to charts. And in that case you might want to have attributes like width, height, background-color…

Even for source code, you might want to specify whether to number lines, what highlighting style to use, and so on.

So limiting the info to a single word would really be too limiting.

Originally I proposed something like the pandoc format, but others didn’t want to get behind that, so we came up with a compromise that the spec wouldn’t specify the format of the info string or what was to be done with it. Perhaps this should be revisited.


#27

Revisiting this sentence. I agree “no way to write this” is a showstopper. So perhaps allow people to also use tildes in this particular scenario? In other words support both tilde and backtick code fences.

Does that solve it to your satisfaction?


#28

@codinghorror - You can already have tilde code fences for code blocks. This doesn’t help with the problem, which is how to express a certain kind of code span (not a code block). And even if we allowed tildes to be used for code spans (which would conflict with common extensions that use them for strikeout), we’d still have the problem.


#29

I see, I think I misunderstood the example:

``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

~~~ hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
~~~

#30

Forbid a space before the info string if the info string contains more than one word (i.e. contains spaces):

```hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```.hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```␣hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because it only contains a single word after the backticks!
```

```␣hello␣
this IS inline code with one backtick ` and two backticks ` (?)
```

```␣hello␣world
this IS inline code with one backtick ` and two backticks `
```

```hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```.hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

```␣.hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no word but punctuation after the space!
```
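The examples above can be condensed into a classifier, sketched here under a few assumptions: `␣` stands for an ordinary space, a “word” is anything starting with a letter or digit, and the trailing-space case marked “(?)” is treated as inline code, as that example suggests. The function name `classify_after_backticks` is hypothetical.

```python
def classify_after_backticks(line: str) -> str:
    """Return "block" or "inline" for a line that starts with ```."""
    if not line.startswith("```"):
        raise ValueError("expected a line starting with three backticks")
    rest = line.lstrip("`")
    if rest == "" or rest[0] != " ":
        return "block"   # bare fence, or info string right after the backticks
    after = rest[1:]
    if after and not after[0].isalnum():
        return "block"   # punctuation (e.g. ".hello") right after the space
    if " " in after:
        return "inline"  # two or more words, or a trailing space (the "(?)" case)
    return "block"       # a single word after the space
```

Only the line opening the construct is inspected here; the content lines and the closing backticks play no part in the decision.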

#31

Maybe we’re overthinking this. Simpler rules are better, right?

How about:

  1. Triple backticks or triple tildes: Code fence.

  2. Opening fence begins a line, closing fence begins a new line: Code block.

  3. Opening fence begins a line, but closing fence does not begin a new line: Inline code.

  4. Text on the same line as the opening fence of a code block: Info string.

Any info string is thus allowed, even multiple words, even with a preceding space.

As for the issue that began this thread…

``` To include ` and `` backticks in inline code,
the closing fence should not be at the start of a new line,
but rather after code, like this. ``` And here's some non-code inline text.

This starts a paragraph with inline code, including single and double backticks.

It works in the current CommonMark, Markdown.pl, and most other flavors. (Babelmark 2 test, Babelmark 3 test.)

And, as pointed out by cben and Ajedi32, it follows the convention of *other* types of **delimiters** being written _inline_.


#32

@jkdev - your proposal does nothing to remove the ambiguities.

1 ``` code
2 foo
3 bar ```
4 ```

Is this a code block that ends on line 4? Or a code span that ends on line 3? You could resolve this in favor of the latter by saying that we close as soon as we can, but this breaks backwards compatibility for code blocks containing strings of backticks, and creates difficulties expressing code blocks containing backticks.
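To make the two readings concrete, the sketch below scans the four-line example both ways. The first scan looks for a closing fence at the start of a line (the current block rule). The second closes at the first later run of exactly three backticks anywhere (a “close as soon as we can” rule). Both helper names are hypothetical, and the code-span scan is simplified: it ignores blank lines, escapes, and other block structure.

```python
import re

# The example above, without its line numbers.
EXAMPLE = [
    "``` code",
    "foo",
    "bar ```",
    "```",
]

# Closing fence: up to three spaces, three or more backticks, then only spaces.
_CLOSING_FENCE = re.compile(r"^ {0,3}`{3,} *$")
# A backtick string of exactly three backticks, anywhere in the line.
_THREE_BACKTICKS = re.compile(r"(?<!`)```(?!`)")


def close_as_code_block(lines):
    """1-based line number where a fenced code block opened on line 1 ends."""
    for number, line in enumerate(lines[1:], start=2):
        if _CLOSING_FENCE.match(line):
            return number
    return None  # no closing fence: the block runs to the end of the document


def close_as_soon_as_possible(lines):
    """1-based line number where a code span opened by ``` on line 1 ends."""
    for number, line in enumerate(lines[1:], start=2):
        if _THREE_BACKTICKS.search(line):
            return number
    return None


print(close_as_code_block(EXAMPLE))        # 4
print(close_as_soon_as_possible(EXAMPLE))  # 3
```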

Not being able to identify a code block by the first line would also break a lot of very nice properties of our present parsers, which identify block structure first, inline structure later.

It may be that this issue is enough of a corner case that we shouldn’t obsess about it. The only real “blind spot” is inline code that contains strings of two backticks and occurs at the beginning of a paragraph (anywhere else, you can reorganize the paragraph so the code doesn’t start at the beginning of a line).

I suppose another solution would be to allow only one-word info strings with backtick code blocks, while allowing free-form info strings with tilde code blocks. I hesitate to do that, though, as it complicates the mental model. (Why can I do this with tildes but not backticks?)


#33

While it wouldn’t resolve the original problem,
I’m wondering whether changing the code block rules to close on triple (or however many) backticks anywhere in the line is feasible. In your example, it would be a code block that ends at line 3.

Motivation for closing early:

  1. Current behavior is not essential: I never realized that I can safely include ``` in code blocks as long as they don’t start the line. But I can always use more backticks (````) to start/end the code block, which is a simple rule covering all cases.

  2. “Compatibility” with original markdown: backtick-fenced code block syntax degrades gracefully to inline code in tools that don’t understand fenced blocks [^1]. That’s a good property and IMO should be maximized. However, tools that think ``` starts inline code will stop on the first ``` anywhere.

Babelmark confirms that about half the implementations support fenced blocks and stop only on final start-of-line backticks, while the other half don’t understand fenced blocks and stop the inline code on line 3.

  • marked (0.2.6) is the only one that supports fenced blocks AND stops early on line 3. It only does that for ``` at the end of a line; if text follows, it doesn’t close the block there.

  • AFAICT no tool follows @jkdev’s proposal of switching from block to inline when closing fence is mid-line.
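The “use more backticks” rule from point 1 can be automated when generating Markdown. A sketch with a hypothetical helper `fence_code`: it picks a fence one backtick longer than the longest backtick run anywhere in the code. That is stricter than necessary, since under the current spec only a run at the start of a line (after at most three spaces) can close a fence, but it is simple and safe.

```python
import re


def fence_code(code: str, info: str = "") -> str:
    """Wrap `code` in a backtick fence that no line inside `code` can close."""
    if "`" in info:
        raise ValueError("a backtick fence's info string cannot contain backticks")
    # Longest run of backticks anywhere in the code (0 if there are none).
    longest = max((len(run) for run in re.findall(r"`+", code)), default=0)
    # At least three backticks, and strictly longer than any run inside.
    fence = "`" * max(3, longest + 1)
    body = code if code.endswith("\n") else code + "\n"
    return f"{fence}{info}\n{body}{fence}\n"
```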

[^1] I lied: it only degrades gracefully without empty lines — empty lines abort inline code but not fenced blocks.
That’s why deciding block structure first is important. Consider this paradox:

```
Is this inline code or code block?

Closing fence is not at start of line: ``` And here's some non-code inline text.

If the block/inline decision depends on lookahead to where the closing backticks are, it can’t be either: inline code shouldn’t cross the empty line, and a block shouldn’t have mid-line termination. In other words, you don’t know how far to look for the termination until you’ve found it…

Interoperability is especially critical for agreeing where code starts and ends

Code is fundamental like escaping — it suppresses markdown-significant constructs, so if you don’t agree about whether it’s code, you get cascading confusion…
Fenced blocks make it worse — disagreeing about just one top-level fence can catastrophically flip the meanings of everything till the end of the document!

  • That’s why any limits on info strings worry me. The simplest rule “3+ backticks/tildes followed by anything [without backticks] starts a block” is probably our best chance for agreement. (Ignoring, or treating as code, info strings you don’t understand is fine, as long as you still consider it a code block.)

  • If we agree it’s code but don’t agree block vs inline, that’s still rather good!
    IIUC the only exceptions are (1) empty lines (2) mid-line closing backticks.
    Fenced blocks without empty lines are probably a non-starter, but perhaps (2) could be harmonized?
    As noted above, all but one of the implementations with fenced blocks disregard mid-line backticks, so there is no single best-compatibility answer here.

    • The “parse block structure before inline” principle is, I think, new to CommonMark? It’s more or less implicit in the language, but I think many existing parsers have a more ad-hoc structure. Ad-hoc parsing favors a single, simple rule for code termination, whether it’s block or inline. [I’m thinking here not only of full parsers but also of approximations like editor syntax highlighting…]

#34

Revisiting this thread, I see only two possible solutions.

One is @Crissov’s, which is to require the info string to start immediately after the backticks (with no intervening space) if it contains more than one word. (If it is just one word, then a space is okay; we want this for backwards compatibility since many implementations allow a space.) That is:

```␣hello␣world
this IS inline code with one backtick ` and two backticks `
```

```hello␣world
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block, because there is no space after the backticks!
```

The second is to constrain the info string; instead of allowing it to be anything, we could limit to, say, a bracketed list of key/value pairs:

Example:

``` haskell {class="numberLines" id="mycodesample" startline="15"}
let x = x + 1 in x
```

One option would be to allow any pandoc-style attributes, e.g. {#id .class1 .class2 key="value" booleankey}.
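Here is one way such a constrained info string could be parsed. It assumes the shape “optional single word, then an optional `{...}` block” with pandoc-style tokens inside. This is a hypothetical sketch (`parse_info_string` is not pandoc’s parser and omits its escaping rules). The useful property is that a line like `multiple words here` no longer qualifies as an info string, so it could be left to the inline parser.

```python
import shlex


def parse_info_string(info: str):
    """Return (language, attributes), or None if `info` does not fit the
    assumed shape: an optional single word, then an optional {...} block."""
    info = info.strip()
    lang, attr_text = info, ""
    if "{" in info:
        if not info.endswith("}"):
            return None
        brace = info.index("{")
        lang, attr_text = info[:brace].strip(), info[brace + 1:-1]
    if len(lang.split()) > 1:
        return None  # several bare words: not an info string under this rule
    try:
        tokens = shlex.split(attr_text)  # handles "quoted values"
    except ValueError:
        return None  # e.g. an unterminated quote
    attrs = {"id": None, "classes": [], "pairs": {}}
    for token in tokens:
        if token.startswith("#"):
            attrs["id"] = token[1:]
        elif token.startswith("."):
            attrs["classes"].append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            attrs["pairs"][key] = value
        else:
            attrs["pairs"][token] = True  # bare key such as booleankey
    return (lang or None), attrs
```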

I think I prefer the option of giving some structure to the info string to the option of forbidding the space when there’s more than a single word in the info string, since the latter makes the presence of a single space have a big effect (and only in some cases), which might be surprising.

But nobody on this thread has actually commented on the idea of giving more structure to the info string.