What changed in CommonMark?

codinghorror · September 1, 2014, 7:14am

In general, as little as possible!

The goal of CommonMark is not to redefine what Markdown is, or change the syntax, but to make it parseable and predictable.

We did try to address some minor persistent Markdown complaints we’ve seen across the millions of users in GitHub, Stack Exchange, Reddit, Discourse, pandoc, and babelmark. John MacFarlane documents them here:

In a few cases, I have departed slightly from the canonical syntax description, in ways that I think further the goals of markdown as stated in that description.

There are only a few places where this spec says things that contradict the canonical syntax description:

It allows all punctuation symbols to be backslash-escaped, not just the symbols with special meanings in markdown. I found that it was just too hard to remember which symbols could be escaped.
It introduces an alternative syntax for hard line breaks, a backslash at the end of the line, supplementing the two-spaces-at-the-end-of-line rule. This is motivated by persistent complaints about the “invisible” nature of the two-space rule.
Link syntax has been made a bit more predictable (in a backwards-compatible way). For example, Markdown.pl allows single quotes around a title in inline links, but not in reference links. This kind of difference is really hard for users to remember, so the spec allows single quotes in both contexts.
The rule for HTML blocks differs, though in most real cases it shouldn’t make a difference. (See here for details.) The spec’s proposal makes it easy to include markdown inside HTML block-level tags, if you want to, but also allows you to exclude this. It also makes parsing much easier, avoiding expensive backtracking.

It does not collapse adjacent bird-track blocks into a single blockquote:

  > this is two
  
  > blockquotes
  
  > this is a single
  >
  > blockquote with two paragraphs

Rules for content in lists differ in a few respects, though (as with HTML blocks), most lists in existing documents should render as intended. There is some discussion of the choice points and differences here. I think that the spec’s proposal does better than any existing implementation in rendering lists the way a human writer or reader would intuitively understand them. (I could give numerous examples of perfectly natural looking lists that nearly every existing implementation flubs up.)
The spec stipulates that two blank lines break out of all list contexts. This is an attempt to deal with issues that often come up when someone wants to have two adjacent lists, or a list followed by an indented code block.
Changing bullet characters, or changing from bullets to numbers or vice versa, starts a new list. I think that is almost always going to be the writer’s intent.
The number that begins an ordered list item may be followed by either . or ). Changing the delimiter style starts a new list.
The start number of an ordered list is significant.
Fenced code blocks are supported, delimited by either backticks (```) or tildes (~~~).
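
The three list rules above (a changed bullet character starts a new list, a changed `.` / `)` delimiter starts a new list, and the start number is kept) can be illustrated with a small Python sketch. This is not the spec's algorithm: `split_lists` is a hypothetical helper that only looks at top-level item lines and ignores indentation, nesting, blank lines, and continuation paragraphs.

```python
import re

# A list marker at the start of a line: a bullet (-, +, *) or up to nine
# digits followed by "." or ")". Simplified: no indentation or nesting.
MARKER = re.compile(r"^(?:([-+*])|(\d{1,9})([.)]))\s+")


def split_lists(lines):
    """Group consecutive item lines into lists.

    Returns (kind, start, items) tuples. `kind` is the bullet character
    or the ordered-list delimiter; `start` is the first item's number
    (None for bullet lists).
    """
    lists = []
    current_kind = None
    for line in lines:
        m = MARKER.match(line)
        if not m:
            # Simplification for this sketch: any non-item line ends the list.
            current_kind = None
            continue
        bullet, number, delim = m.groups()
        kind = bullet if bullet else delim
        text = line[m.end():]
        if kind != current_kind:
            # Changed bullet or delimiter: start a new list, keep its start number.
            start = int(number) if number else None
            lists.append((kind, start, [text]))
            current_kind = kind
        else:
            lists[-1][2].append(text)
    return lists


example = ["- a", "- b", "+ c", "2. d", "3. e", "4) f"]
for kind, start, items in split_lists(example):
    print(kind, start, items)
```

Under these assumptions the six lines split into four lists: two bullet lists (`-` and `+`), an ordered list starting at 2, and a second ordered list starting at 4 because the delimiter changed from `.` to `)`.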

stuartpb · September 4, 2014, 8:51am

I disagree with most of these changes. I feel the goal of Standard Markdown should be to specify the smallest agreed-upon subset of Markdown features coming from the original implementation, and that means not introducing features or changing semantics that better serve your own personal tastes.

Markdown almost always does need a number of extensions to better serve a particular scenario’s needs or tastes, and it is here that a standard base would be good: providing a clear basis in which to clearly and reproducibly specify where these extensions are made.

It’s fine to let different sites and implementations diverge from the standard, and not just because they already do - a site that needs tables, where the community is mostly coming from MediaWiki, will have a different call for table features and syntax than a site that, say, wants an easy way to format tables that are coming from a CSV (such as a statistics forum). Different places have different legacies, and different scopes of interest.

A Standard Markdown should define the precedence and syntax rules for the subset of features that all Markdown implementations are expected to provide, that are as close to universally accepted as possible. If an implementation of a core feature defined by Standard Markdown is divergent in a Markdown renderer, it should never elicit a response beyond “oh, yeah, I’ll fix that bug”. If “Standard Markdown” dictates lots of unique and unprecedented quirks, it’s going to yield just another response of “oh yeah, that’s that site where they have all those kooky opinions, whatever,” and nothing gets fixed.

With a simple Standard Markdown base, changes to describe an implementation’s specific flavorings should be as stringently documentable as possible, with a series of “patches” that rarely, if ever, contradict something in Standard Markdown. Markdown implementations are then free to target one of these “flavors” as their own personal “standard” (or provide features of a flavor as options at render time, such as how https://github.com/chjj/marked handles auto-links and other GitHub-Flavored-Markdown features).

stuartpb · September 4, 2014, 9:00am

To clarify, at a standards level, I disagree with pretty much each change listed, except for these:

I agree with this foremost because it was mentioned as a potential future change in the original Markdown post. It really doesn’t make sense to have a list that starts with “2.” start at “1.” when rendered, and I would consider any implementation that doesn’t set the start attribute of the <ol> to be bugged. Again, though- I mostly only approve because it’s part of the original described feature set that all Markdown implementations derive from.

I’m okay with this, too, as the omission of single-quotes from reference links feels more like an oversight in the original specification than a proper behavior.

I’m okay with this (or at least some of it) for the reasons specified in the document: it’s the way most implementations (including some versions of the “original reference” implementation) work, for sensible performance reasons. The newline-reintroduces-markdown-parsing rule, while an elegant solution to a pressing issue, strikes me as going a little farther than just standardization: is any major implementation currently doing this?

Again, I’m okay with this, so long as it’s standardizing the behavior of existing implementations, or cases that widely differ across implementations. If “nearly every existing implementation flubs up” a list in the same way, it’s not the standard’s place to declare that they’re wrong.

At a flavoring level, I like almost all of these (although I disagree with making every character backslash-escaped, as it makes talking about $VARIABLES_IN_BASH_HEREDOCS, as I had to do just a few hours ago, really frustrating.) I vehemently disagree that they should be considered “standard” features.

batjko · September 4, 2014, 9:13am

@stuartpb Apart from these rules looking very reasonable, in and of themselves (I am not very versed in the many different variations currently out there), are you saying you’d prefer a formal extension system, as opposed to a centralized standard canon for all rules?

I think that might run the risk of invalidating the effort put into this standard. Isn’t that exactly what they’re trying to avoid?

stuartpb · September 4, 2014, 9:21am

Not as I see it. I see the advantage to this standard, like the original HTML5 standard, being a thorough, canonical, sensible description of proper Markdown behavior, that defines how implementations should handle edge cases in their standard supported subset of features.

Right now, lines like these:

<a `href=""`>
`<a href="">`
*a[b*](url)

might be rendered three different ways in three different Markdown implementations. The useful place of a standard would be to make a call, definitively, what rules should be followed to give one correct interpretation, so that any other interpretation is a bug that should be fixed to be in-line with the standard, and have compatibility with other Markdown implementations.

stuartpb · September 4, 2014, 9:26am

Semi-formal, yes (I’ll make a post about this in a bit).

Basically, I think implementations should write their own standards, written in the form of amendments to the core standard, that document their divergences with as much rigor and clarity as the main standard. This way, other implementations can copy the divergences of others (chjj/marked having an option to support all the features of GitHub Flavored Markdown, as an example), and even file bugs against the original implementation, pointing out where it diverges from the personal standard it describes.

batjko · September 4, 2014, 9:35am

I think that would certainly work for the short-term transition period.

If this were maintained in the long run (and let’s face it, nobody can keep anyone from writing their own deviating extensions if someone feels the need for one), then it would be preferable to have the main standard, the core Markdown so to speak, already cover as many use cases as possible, in order to minimize any such need for extensions and additions, and to at least discourage actual breaks of the core standard.

dpawson · September 4, 2014, 2:57pm

Line breaks.

Example 427
foo
baz

Which is exactly the opposite of ‘continuation lines’ found in many markup systems? -1

jgm · September 4, 2014, 5:05pm

+++ stuartpb [Sep 04 14 09:10 ]:

although I disagree with making every character backslash-escaped, as it makes talking about $VARIABLES_IN_BASH_HEREDOCS, as I had to do just a few hours ago, really frustrating.) I vehemently disagree that they should be considered “standard” features.

You should use code backticks. (After all, bash also has special
characters that DO have special meanings in Markdown, so if you don’t
use code backticks to quote your bash examples, it’s going to be very
fragile.)
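
For reference, the escaping rule being debated here (any ASCII punctuation character may be backslash-escaped) can be sketched in a few lines of Python. This is an illustrative sketch only, not the reference implementation: `unescape` is a hypothetical helper, and it ignores context such as code spans and code blocks, where backslash escapes do not apply.

```python
import re
import string

# string.punctuation holds the 32 ASCII punctuation characters.
# A backslash followed by any of them yields the literal character;
# a backslash before anything else is left alone.
ESCAPE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def unescape(text):
    """Resolve backslash escapes in plain inline text (context-free sketch)."""
    return ESCAPE.sub(r"\1", text)


print(unescape(r"\$HOME and \*not emphasis\*"))
print(unescape(r"C:\temp"))  # backslash before a letter is kept
```

Inside code backticks, as suggested above, no unescaping would happen at all, which is why quoting shell snippets as code is the more robust option.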

ariabuckles · September 4, 2014, 7:59pm

First off, thank you to everyone involved for working on this. It looks like an incredible amount of work and is certainly a very valuable thing to be working on. I’m glad we have such knowledgeable people leading this effort.

I agree with this though. With all the changes here, Standard Markdown is more of a dialect than a standard. If I want to know “what core subset of syntax should I absolutely support to call a formatting language markdown-compatible”, Standard Markdown doesn’t answer that well (if the goal is that everyone should adopt everything in standard markdown–there are a huge number of small changes that make that seem unlikely).

For a language that says “one of our major goals is to make Markdown easier to parse, and to eliminate the many old inconsistencies and ambiguities that made writing a Markdown parser so difficult”, and as someone currently writing a Gruber-Markdown compatible parser, Standard Markdown looks very hard to parse (and given my goals writing my own parser, implementing all or even most of Standard Markdown is not reasonable, but I wish it was small enough that it was).

If the goal of standard markdown is to have a dialect that is wonderful to write in that handles edge cases well, I think the implementation succeeds. It looks wonderful to author content in! It handles edge cases much better than the parser I was previously using or the parser I am currently writing. But if the goal is to standardize markdown as a language that is easy to write parsers for, I think the spec has diverged significantly from this goal.

kagan · September 4, 2014, 10:39pm

I have seen many concerns regarding backwards compatibility. Some people strongly oppose any change due to that. They are afraid of breaking things. This is of course understandable.

However, it is obvious to me that “improvement” requires change by definition. No change means no improvement. Smaller changes can be achieved over time without upsetting people too much. Let me list a few examples:

Internet Explorer (IE) used to have a market share of 90% some years ago. It seemed Microsoft had won the “browser wars”. Today we have a completely different reality. There is a new HTML Standard. Most browsers stuck close to the standard and today enjoy greater popularity. In the long run, the better standard has made it to the top.
These days it is normal to buy a radio, which receives only FM. In the old days we had also AM radio. The AM still has the advantage, that it has a much longer range. But the higher sound quality of FM pushed the good old AM into a corner.
Apple comes up with breaking changes much more often than its competitors. Some people including myself get annoyed by it. However, the success of this strategy is easily visible and measurable.

Having said that, I would like to congratulate this initiative. There will always be different opinions about the details. But, it is obvious to me that this effort is not only useful, but essential.

The changes listed above are IMHO really minor. We have to have the courage to introduce small changes to make markdown more intuitive and unambiguous.

With all respect and appreciation to the work of the creator of markdown, he is only one person. The really great things such as XML and HTML5 are achieved by a large community. The authors of standard markdown have opened up their work to public review and that is certainly the right thing to do to achieve broad acceptance.

Thanks.

DocSalvager · September 4, 2014, 11:58pm

spitballing …

Why just one version?

I agree with @stuartpb that there should be a “standard markdown” to “specify the smallest agreed-upon subset of Markdown features.” Every language needs a baseline core of functionality.

That core should also include a good mechanism for extensibility.

The Standard Markdown spec provides at least one method for doing this in the form of allowing embedded HTML. It’s not pretty and it’s not truly “extending” the language (adding syntactic functionality) but it does allow for tables and other things not yet part of the core language. It’s like how C supports embedded assembly language.

I think we also need a cleaner mechanism than that as well.

What about designing with the expectation that websites will be running multiple versions of markdown interpreters simultaneously?

There would be a common core and then dialects with additional functionalities. I think some sites are doing something like this already.

The effort here could be to provide ways to record user/developer preferences through automatic and manual voting mechanisms. Essentially… extend the StackExchange and Discourse voting paradigms to be part of the collaborative software development lifecycle process itself.

stuartpb · September 6, 2014, 12:13am

That’s what I’m aiming to do with http://flavoredmarkdown.com (minus the “recording user/developer preferences through automatic and manual voting mechanisms” - the popularity of implementations in the real world is a better gauge).

codinghorror · January 6, 2015, 10:06pm

I moved 5 posts to a new topic: Get rid of the backslash to indicate explicit linebreak?

pmorch · June 24, 2015, 11:49am

That link (with 139 clicks) is a 404… I’m assuming it died during the project rename. Is there a new link?

jgm · June 24, 2015, 11:27pm

See here.

xenoterracide · November 19, 2020, 5:03pm

Why am I resurrecting this? to be honest I feel like a new topic wouldn’t reach the right audience, and doesn’t really address what this says in the most searchable place.

I’m going to be honest, I feel like CommonMark is failing me simply because it doesn’t do more to extend, or fix original Markdown. I in general don’t want to change the original markdown (other than making html optional for security reasons). I believe that CommonMark should be working to standardize common extensions, such as syntax highlighting in fenced code, and tables are the 2 biggest I’ve seen.

My reason for this is that I have to learn these extensions, or miss out on them in every single implementation of Markdown. There also appears to be no standard syntax for enabling proprietary, non standard extensions, so if they do become standard in the future, backwards compatible parsing may not be possible. Honestly probably the easiest way to do this would be some sort of inline marker so the parser knows to change its mode. Maybe this:

^^^github:syntax-highlighting
```java
class Foo {}
```
^^^

as annoying as it would be to add that to your spec, it would mean backwards compatibility as the spec moved forward. You could also parse it to remove it later if it was easy. Obviously this in and of itself is a backwards incompatible change, although you could describe not supporting this makes the whole document a proprietary implementation, and would be the same as defining around the whole document. Alternatively you could define the opposite and ask people to migrate their databases. Similar in a way to how <html is just html5.

^^^github
^^^
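
The idea that a parser could "remove it later" can be sketched as a pre-pass over the text. Everything here is hypothetical: the `^^^name` / `^^^` marker syntax is only the proposal from this post, not part of CommonMark, and `strip_extension_markers` is an illustrative helper that does not try to skip marker-like lines inside fenced code.

```python
import re

# Hypothetical marker syntax from the proposal above:
#   "^^^vendor" or "^^^vendor:feature" opens a region, a bare "^^^" closes it.
OPEN = re.compile(r"^\^\^\^([\w:-]+)\s*$")
CLOSE = re.compile(r"^\^\^\^\s*$")


def strip_extension_markers(text):
    """Return (plain_text, extensions).

    plain_text is the input without marker lines; extensions lists the
    declared extension names in the order they appear.
    """
    kept, extensions = [], []
    for line in text.splitlines():
        m = OPEN.match(line)
        if m:
            extensions.append(m.group(1))
        elif CLOSE.match(line):
            continue  # drop the closing marker
        else:
            kept.append(line)
    return "\n".join(kept), extensions


fence = "`" * 3  # built at runtime to avoid a literal fence inside this example
doc = "\n".join([
    "^^^github:syntax-highlighting",
    fence + "java",
    "class Foo {}",
    fence,
    "^^^",
])
plain, exts = strip_extension_markers(doc)
print(exts)
print(plain)
```

A parser that does not know the named extension could run this pass and fall back to plain fenced code, while one that does could switch modes for the enclosed region.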

heck it even appears this quote I’m about to respond to is a failure of CommonMark, needing to switch to BBCode to actually quote someone with a link to them and their name in it, and CommonMark describes no such method for including BBCode, imho making it a proprietary extension as I would expect a valid document to render this as just text.

HTML5, this is the 5th version of HTML, meaning stuff was added to it over time.

also sadly I’ve found several partial markdown implementations, this means that implementing CommonMark is not a priority. Even if you did implement CommonMark I’m not certain that implementations would include the common extensions, so you might end up using a non compliant library, or your own incomplete implementation that does. There’s also no encouragement to use CommonMark. None of the implementations I’ve looked at reference CommonMark in any way or point to its documentation, so a lot of developers are probably not even aware this spec exists.

The Quick Reference is incomplete, I just spent 30 minutes figuring out that I needed to use ~~~ to escape markdown itself in a code block. It’s not mentioned in the reference.

end rant.

codinghorror · November 20, 2020, 12:14am

It takes the time it takes. If you’re not willing to wait another decade, you may be disappointed.

Plenty of tangible progress being made; look around you!

That being said I completely agree that the next (and thus first) extension we need to standardize is tables.

(The code language highlighting is pretty much a de-facto standard as I see it, since GitHub has such a profound influence and that’s how they do it. Note that in Discourse we let the plugin infer the code type automatically, so you don’t usually need to specify the language anyhow, that’s a bit of a local optimization.)