Issues we MUST resolve before 1.0 release [6 remaining]

jgm · July 26, 2015, 12:26am

Here are some issues in the spec that must be resolved before a 1.0 release.

To keep us focused, the optional “should resolve” issues are in this topic and successfully resolved issues are in this topic.

To keep things organized, please comment in the linked discussions or issues, not here. If there’s no linked discussion, start a new one on this forum. I will edit this as things are resolved or new issues come up.

Code blocks and spans

Backtick fences and inline code collision

There’s an ambiguity between backtick fenced blocks and backtick inline code that needs to be resolved explicitly.

Currently this doesn’t get parsed as a code span:

``` zounds ##
`##`` kebobble `` ```

Discussion here and an issue here and related issue here.
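
One candidate way to settle the collision, sketched in Python purely as an assumption (nothing in this topic has settled on a rule yet): treat a line as a fence opener only when the text after the opening run of backticks contains no further backtick. The names `FENCE_OPEN` and `opens_backtick_fence` are hypothetical and not taken from any implementation.

```python
import re

# Hypothetical rule: up to 3 spaces of indentation, a run of 3+ backticks,
# then an info string that must not contain any backtick.
FENCE_OPEN = re.compile(r"^ {0,3}(`{3,})([^`]*)$")


def opens_backtick_fence(line: str) -> bool:
    """Return True if the line would open a backtick fence under this rule."""
    return FENCE_OPEN.match(line.rstrip("\n")) is not None


# A plain fence opener with an info string is accepted.
print(opens_backtick_fence("``` hello"))
# A line with later backticks is rejected and left to the inline parser.
print(opens_backtick_fence("``` zounds ## `##`` kebobble `` ```"))
```

Under this rule an author can still write inline code that starts a line with three backticks, at the cost of forbidding backticks in info strings.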

Links

Links within links

Currently we disallow them and favor the innermost link.

Should we allow links within links?

Discussion in this issue.

Quotes in titles

Michel Fortin notes, regarding nested quotes in titles:

There sure is room for more consistency with various quote styles and disallowing nonsensical combinations of " and ). But take note:

stmd is the only implementation not supporting unescaped quotes. http://johnmacfarlane.net/babelmark2/?normalize=1&text=Foo+[bar](%2Furl%2F+"Title+with+"quotes"+inside").

neither Markdown.pl, PHP Markdown, nor many other parsers let you escape a double quote (or a single quote), so the obvious solution is unfortunately non-portable and you’ll have to recommend using &quot;. http://johnmacfarlane.net/babelmark2/?normalize=1&text=\“quotes\”

I replied:

It seems to me that the backslash-escapes should work in these contexts. There’s no clear reason why they should be disallowed; they are clearly useful; and the syntax description never says that they don’t work in these contexts. Allowing them to work removes 50% of the motivation for allowing nested quotes. Allowing you to use other quote types for titles across the board (’ or ()) removes another 25%. Or so I reasoned, anyway. There is a backwards-compatibility concern, which is serious, though I’ll bet the cases affected are very rare.

See Delimiters in link and image title attributes · Issue #308 · commonmark/commonmark-spec · GitHub

Lists

Odd reference link/list case

- [foo]: bar
baz

produces a list with one item, baz. Is this really right? (This seems like an implementation issue rather than a spec problem.)
(This issue still exists as of April 2019; an issue should be created in commonmark/commonmark, as I think this is an implementation bug, and the case should be added as an example to the spec.)

Reconsider allowing lists to break paragraphs

There’s a big discussion here:

The resolution we achieved still feels like a hack, and not really principled. Is there a better way?

Other

Tab issues

Handling of tabs needs to be further specified in the spec. See

The change in tab handling (preserving tabs but trying to treat them as if they were converted to spaces at a tab stop of 4 when processing block structure) has some residual issues, and we need to go through the whole spec again with this in mind. We also need lots more test cases involving tabs.

See this issue for some good cases.

E.g. what does the spec say to do for

1.<TAB><TAB>hi

This is a bit unclear. Presumably we should have a code block, but what should be in it?

We should add test cases to the spec with tabs after list markers, block quote markers, and elsewhere.
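
To make the “as if converted to spaces at a tab stop of 4” idea concrete, here is a minimal Python sketch. It only computes the expanded form of a line; it does not decide what block structure results, which is exactly the open question. The function name `expand_tabs` is made up for illustration.

```python
def expand_tabs(line: str, tab_stop: int = 4) -> str:
    """Replace each tab with spaces up to the next multiple of tab_stop."""
    out = []
    column = 0
    for ch in line:
        if ch == "\t":
            # Advance to the next tab stop (always at least one column).
            width = tab_stop - (column % tab_stop)
            out.append(" " * width)
            column += width
        else:
            out.append(ch)
            column += 1
    return "".join(out)


# "1." ends at column 2, so the first tab is 2 columns wide, the second 4.
print(repr(expand_tabs("1.\t\thi")))
# ">" ends at column 1, so the first tab is 3 columns wide, the second 4.
print(repr(expand_tabs(">\t\tx")))
```

The first tab after a marker is narrower than four columns, which is why it is unclear how many of those columns belong to the marker and how many to the content.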

codinghorror · January 3, 2016, 10:52am

Here’s my vote on the twelve (now eleven!) MUST items, cc: @jgm

Preservation of spaces in backtick code

`are trailing spaces trimmed from here-->   `

We should preserve spaces in code spans, unless it’s super hard. There’s a near 50/50 split in babelmark so either approach is really OK.

Backtick fences and inline code collision

``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

Multi-word restriction of some kind, or leading space restriction (if possible). Removing code fences is absolutely not an option. Do note that in GitHub code fences the official documentation shows no space between the backticks and the language.

Links within links

[outer [inner](http://inner)](http://outer)

Whatever the simplest way is to disallow this, I think we should do it. As already mentioned by @balpha it feels pathological to me, I don’t see a good use for it.

Inconsistent handling of spaces in links

<http://example.com/hey nice link>

There’s a very well understood way to encode spaces in links: our old pal %20. Spaces in links are bad mojo anyway, and we should not be encouraging them. We should not allow spaces in links.
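
A rough Python sketch of what “no spaces in links” could look like for the autolink form above. The pattern is an illustrative assumption, not the spec's grammar, and `parse_autolink` and `encode_spaces` are hypothetical helper names.

```python
import re

# Assumed shape: "<", a scheme, ":", then any characters except
# whitespace, "<" and ">", then ">".
AUTOLINK = re.compile(r"^<([A-Za-z][A-Za-z0-9+.-]{1,31}:[^\s<>]*)>$")


def parse_autolink(text: str):
    """Return the URL inside <...>, or None if it is not an autolink."""
    m = AUTOLINK.match(text)
    return m.group(1) if m else None


def encode_spaces(url: str) -> str:
    """Percent-encode literal spaces so the URL can be used in an autolink."""
    return url.replace(" ", "%20")


raw = "http://example.com/hey nice link"
print(parse_autolink("<" + raw + ">"))                 # rejected: contains spaces
print(parse_autolink("<" + encode_spaces(raw) + ">"))  # accepted once encoded
```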

Quotes in titles

Foo [bar](/url/ "Title with "quotes" inside")

Replacing the " with &quot; seems crazily HTML-specific, which is not the long-term goal of CommonMark, so I support your solution of allowing the escape \" to work instead.
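
For illustration, a small Python sketch of a double-quoted title where \" is honored as an escape. This is an assumption about how such a rule might be written, not code from any implementation; `TITLE`, `ESCAPE` and `parse_title` are hypothetical names.

```python
import re
import string

# A double-quoted title: any run of non-quote, non-backslash characters
# or backslash-escaped characters, closed by the first unescaped quote.
TITLE = re.compile(r'"((?:[^"\\]|\\.)*)"')

# Assumption: backslash escapes apply to ASCII punctuation only.
ESCAPE = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def parse_title(text: str):
    """Return the unescaped title at the start of text, or None."""
    m = TITLE.match(text)
    if m is None:
        return None
    return ESCAPE.sub(r"\1", m.group(1))


# Escaped inner quotes survive as literal quotes in the title.
print(parse_title(r'"Title with \"quotes\" inside"'))
# Unescaped inner quotes end the title at the first inner quote.
print(parse_title('"Title with "quotes" inside"'))
```

With this rule the unescaped form is cut short at the first inner quote, so the rest of the line would no longer parse as a link title.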

URL normalization

Already decided, so great!

Should space be required after # in ATX headers?

#Heading1

I am 100% certain the answer is “yes” here. Way too much damage to average user input if we allow this unnecessary flexibility, largely due to the rise of the #hashtag in popular media. Tighten it down, we have ample proof there’s a problem, and the tension from the fix is minor.
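
A minimal Python sketch of the stricter rule, assuming an ATX heading needs 1 to 6 # characters followed by a space, a tab, or the end of the line. Closing # sequences and other details are ignored, and `parse_atx_heading` is a hypothetical name.

```python
import re

# Up to 3 spaces of indentation, 1-6 "#", then either end of line
# or at least one space/tab before the heading text.
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*))?$")


def parse_atx_heading(line: str):
    """Return (level, text) for an ATX heading, or None otherwise."""
    m = ATX_HEADING.match(line)
    if m is None:
        return None
    return len(m.group(1)), (m.group(2) or "").strip()


print(parse_atx_heading("# Heading1"))  # a level-1 heading
print(parse_atx_heading("#Heading1"))   # not a heading under this rule
print(parse_atx_heading("#hashtag"))    # hashtags stay ordinary text
```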

Setext header and list precedence issue

1. Juli
------

- Event 1
- Event 2

I think this should be escaped like any other list special case, or the user can use the ## header form.

Allow setext headers to interrupt paragraphs for consistency

Paragraph
Header
=====
Paragraph

Babelmark says this is quite divergent, and I think we should continue to be strict here and NOT allow setext headers to interrupt paragraphs, as that reads quite poorly to me in plaintext – what kind of heading has no whitespace?

I don’t find multiline headers particularly compelling.

Odd reference link/list case

- [foo]: bar
baz

Babelmark says a list with one item is what this should produce. Given that there’s virtually no divergence here, do we care? Are people really running into this? Is it a useful set of input?

Remove two-blanks rule

I’ve come to think that the “two blank lines breaks out of lists” rule is more trouble than it’s worth…

… I think the current spec could be clarified if rule 1 for list items explicitly said “a sequence of lines not containing two consecutive blank lines that are not in a fenced code block.”

I completely trust your instincts on this one. Everyone in the topic says your suggestions are reasonable, so go with whatever you believe is best here.

Handling of tabs needs to be further specified

>[TAB][TAB]x

Seems to me a tab is not quite a space, so the “is this a space after a blockquote” rule doesn’t apply.

I think the babelmark results show a rough consensus that this should be a code block with 3 spaces.

Honestly I think as long as a code block is rendered, which is definitely the consensus here, no one will be too bothered by some extra spaces.

Entity handling

Revise what the spec requires apropos entities:

See Spec Issues: (Character) Entity References - #12 by tin-pot
Entities · Issue #442 · commonmark/commonmark-spec · GitHub

Crissov · January 3, 2016, 12:30pm

Only replying here to the MUST issues marked by @codinghorror as “no topic”.

Quotes in titles

Foo [bar](/url/ "Title with "quotes" inside")
Foo [bar](/url/ "Title with \"quotes\" inside")

The latter should absolutely work, although it currently doesn’t in many implementations.

Since most implementations (incl. Markdown.pl) seem to handle the former as a naive author would expect (i.e. automatically escape the inner quotes), it would be fine if the spec could define that behavior, too.

Allow setext headers to interrupt paragraphs for consistency

Paragraph
Heading
----
Paragraph

We should not be looking at top-level = setext headings, but second-level - ones, because they’re ambiguous with “thematic breaks”. There are four general ways to parse this:

Paragraph, heading, paragraph
Heading (with 2 lines), paragraph
Paragraph (with 2 lines), separator, paragraph
Paragraph (with 4 lines)

The second interpretation is rare (only Parsedown in Babelmark). It assumes lazy wrapping of headings. That can be expensive, because every soft-wrapped paragraph could turn into a heading this way and the parser wouldn’t know until it had consumed the last line with only equal signs or dashes in it. All other lazy wrapping is decided by the marker of the first line, so it’s reasonable to avoid this behavior (like the CM spec does), although some authors may expect otherwise.

The third option, which the reference implementation currently adopts (as do few others), makes no sense in my opinion. Setext headings should always take precedence over mere horizontal lines! Since they’re restricted to a single line of text for reasons explained above, that suggests option 1.

The last option has some merit over the first, because – like the third – it allows consistent treatment if put inside a list, for example, where headings would not be expected to occur at all.

* Paragraph
  Heading
  ----
  Paragraph

To fulfill both requirements, i.e. heading wins over separator and no different treatment inside a list item, there’s actually no choice but to treat it as a single paragraph (like only Pandoc, Kramdown and Minima do), unless we want (setext) headings inside lists (which Cebe, Maruku and Discount support). Most implementations actually support ATX headers and unambiguous setext headings inside lists, though. Since the reference implementation is among them, option 2 makes more sense again.

Odd reference link/list case

- [foo]: bar
baz

It seems to be a single-item tight list with content “baz”.

jgm · January 5, 2016, 10:21pm

With the algorithm currently used by cmark and commonmark.js, there is no efficiency cost to allowing multiline setext headers. We store successive lines in paragraphs, and when we hit one that looks like a setext header line, we just convert the paragraph into a setext header. No backtracking is needed. Indeed, allowing multiline setext headers would eliminate the need for a check we currently do (to see if only one line of text has been parsed).
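
The approach can be sketched in a few lines of Python. This is a simplified illustration of the idea only, not the cmark or commonmark.js code: it handles just paragraphs, blank lines, and setext underlines, and all names are hypothetical.

```python
import re

# A setext underline: up to 3 spaces, then only "=" or only "-".
SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def parse_blocks(lines):
    """Accumulate paragraph lines; convert to a heading on an underline."""
    blocks = []
    para = []  # lines of the paragraph currently being collected

    for line in lines:
        m = SETEXT_UNDERLINE.match(line)
        if m and para:
            # No backtracking: the open paragraph simply becomes a heading,
            # however many lines it has collected so far.
            level = 1 if m.group(1)[0] == "=" else 2
            blocks.append((f"h{level}", " ".join(para)))
            para = []
        elif not line.strip():
            # A blank line closes the open paragraph, if any.
            if para:
                blocks.append(("paragraph", " ".join(para)))
                para = []
        else:
            # Simplification: thematic breaks, lists, etc. are not handled,
            # so every other line is treated as paragraph text.
            para.append(line.strip())

    if para:
        blocks.append(("paragraph", " ".join(para)))
    return blocks


print(parse_blocks(["Foo", "bar", "---", "baz"]))
```

Restricting headings to a single line would mean adding a `len(para) == 1` test to the first branch, which is the extra check mentioned above.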

I’m leaning towards thinking multiline headers are the best interpretation here. People who hard-wrap their text to a fixed column width may occasionally need these (although stylistically it’s generally best to avoid overly long headers). Currently there’s no way to do multiline headers in CommonMark (or Markdown generally), so the change would increase expressive power.

jgm · January 8, 2016, 6:43am

I’ve gone with allowing multiline setext headings. Spec changes have been pushed, please have a look at

github.com/commonmark/commonmark-spec

Modified setext heading spec to allow multiline headings.

committed 06:29AM - 08 Jan 16 UTC

jgm

+105 -24

Text like

Foo
bar
---
baz

is now interpreted as heading + paragraph, rather than paragraph + thematic break + paragraph. Existing implementations diverge quite a bit on this case, with several interpretations:

1. paragraph, heading, paragraph
2. paragraph, break, paragraph
3. paragraph containing literal `---`
4. heading, paragraph

Interpretation 4 seems most natural, and it opens up an expressive possibility otherwise closed off -- multiline headings. Authors who want interpretation 2 can use a form that can't be interpreted as a setext heading line, e.g.

Foo
bar
* * *
baz

or insert blank space around the thematic break. Authors who want interpretation 3 can use backslash escapes. Authors who want interpretation 1 can put a blank line after the first paragraph.

and
https://github.com/jgm/CommonMark/commit/1357f2859ecb128636ea7a764b70407dca4e4015

The code changes in cmark and commonmark.js amounted to deleting one line, and had no effect on benchmarks.

xoofx · March 26, 2016, 12:45am

I’m wondering how can we help to go to 1.0 release? What would help you @jgm concretely?

jgm · April 25, 2016, 5:08pm

I have added some items to the list, I’m afraid.

Crissov · December 18, 2017, 12:32am

What and where is the current list?

Regarding the odd list case with the link reference definition, the source is probably at Example 0.28:

Link reference definitions can occur inside block containers, like lists and block quotations.

Example 175 shows how a paragraph can start directly after a line containing a reference link definition.

A sub-question is whether this should be an empty list item:

- 
foo

There is no example or prose explaining exactly that.

I have found myself wishing such embedded link reference definitions would work but would still be shown in the output, especially in lists where they otherwise always lead to empty list items because nothing can precede or follow them therein.

- [foo]: bar
baz
.
<ul>
<li><a href="bar">[foo]: bar</a></li>
</ul>
<p>baz</p>

vas · February 6, 2018, 6:18pm

It’s been 2½ years since this list was started, at least 3½ years since this forum was opened, over 5 years since Jeff’s The Future of Markdown.

It’s now been a year since GitHub adopted CommonMark.

Releasing 1.0 will further the goals and adoption of CommonMark. It will also allow the development of CommonMark v1.1 or v2 to move forward, which will further the goals and adoption even more.

So @jgm @codinghorror, what needs to happen to resolve this list? What can the community do?

Perhaps add a V1 milestone to the issues database, add the issues corresponding to the above to the milestone, and start taking pull requests? BTW, maybe announce that this forum is for design/new feature discussion, but bugs in the spec should get moved into GitHub, with a message on such threads here with a link to the GitHub issue. It will make this all easier to manage.

chrisalley · February 7, 2018, 6:31am

I think part of the reason for the delay in releasing a “1.0” version of the core spec is that it would be very hard to patch without breaking backward compatibility, and a 1.0 version isn’t strictly needed for the spec to be useful in production. I don’t disagree with your points though @vas.

That said, I see no reason why work couldn’t commence on extension specs before the core spec reaches 1.0 (by other members of the community if @jgm is busy). Formalising some of the extensions from GitHub Flavored Markdown - tables, task list items, strikethrough, autolinks, and disallowed raw HTML - here on commonmark.org would be good place to start. Alternatively, if CommonMark extensions as part of the CommonMark project are no longer on the cards, that’s something the community would benefit from being made aware of so that extension specs could be developed independently.

vas · February 7, 2018, 4:54pm

If what you say is true, it has been true for 2½ years, and implies such wide adoption that breaking backward compatibility keeps the spec stuck this way for 2½ years, which in turn means we have a de facto 1.0 release. If so, we should make it official and move on to v1.1 where we’d have the freedom to fix things with minor backward incompatibility.

If what you say is not true, we can address this list for v1.0 and do it soon.

A standard that can’t take a stand is just a recommendation.

chrisalley · February 8, 2018, 8:43am

Some of the issues listed here aren’t so minor. For the issues with linked forum topics, these are known issues that are still under discussion; the implementation hasn’t yet been decided. If you want to see a 1.0 release sooner, contributing to those individual discussions would help move the spec towards 1.0 as they have been explicitly stated as 1.0 blockers (independent of time).

o314 · March 7, 2018, 5:22am

Seniority certainly has to be taken into account at some point.
But let’s see. GitHub does not use it any more; it uses CommonMark.

How do we count now? Are there 13,537,268 - 1 users of kramdown remaining? Overstated?
Maybe we should also specify that GitHub uses a GitHub-flavored CommonMark spec rather than GitHub Flavored Markdown (GFM).
Is that an unconditional endorsement for spreading CommonMark around a well-approved status quo?

Like others, I appreciate and am thankful for the efforts of the community around this project.
But IMHO, it seems there is some bikeshedding around important issues like:

css class
metadata
extensibility
ast

Issues way more important than handling countless variants of line breaks.

Like any project, an open source project should know how to stay on track and on time…

vas · March 7, 2018, 9:49am

I agree, and that’s essentially what I’ve been saying in a different, but less presumptive, way.

But saying things like “It seems that commonmark is slowly dying” is neither accurate nor helpful. It’s no more accurate than saying “It seems that Gruber’s Markdown is slowly dying.” We all know the opposite is true. The problem isn’t death/lack of adoption, it’s fragmentation.

Neither are your statistics helpful, because you know that saying about lies and statistics. Such comments aren’t going to spur things to move. It’s not how leaders talk, nor is it how you get leaders to listen.

How about you start a new forum topic making your above bikeshedding case, with a solid line of reasoning, sans hyperbole?

geraldb · March 9, 2018, 2:17pm

GitHub Pages used to use kramdown, but now it follows CommonMark.

FYI: GitHub Pages still uses kramdown. GitHub uses CommonMark (with GitHub Flavored Markdown extensions) for READMEs and markdown rendered on GitHub itself (but not for GitHub Pages). Sorry if this sounds confusing.

notriddle · March 9, 2018, 2:42pm

You’re right. I was misremembering the time they switched to kramdown.

vas · March 9, 2018, 6:53pm

From @notriddle’s link:

GitHub-flavored Markdown is supported by kramdown by default, so you can use Markdown with GitHub Pages the same way you use Markdown on GitHub.

In other words you can use CommonMark with GitHub Pages the same way you use it on GitHub.

And yes, even kramdown supports CommonMark. Does that mean we get to add kramdown’s numbers to CommonMark’s?

That’s a rhetorical question, so please don’t answer. This debate is starting to get silly, and sounding like national politics. Let’s get back to progress with CommonMark, whether that’s nailing 1.0 first or declaring it done, we need to move on to 1.1 asap. We need to move to head off even more fragmentation.

Crissov · March 9, 2018, 7:25pm

JFTR, Kramdown’s GFM mode for Jekyll (which is what GitHub Pages is built upon) does not conform to CommonMark and probably never will. If I’m not confusing things again, GitHub uses commonmarker for .md file previews but cmark-gfm for READMEs.