Issues we MUST resolve before 1.0 release [8 remaining]

release-1.0

#1

Here are some issues in the spec that must be resolved before a 1.0 release.

To keep us focused, the optional issues we SHOULD resolve are in the topic “Issues we SHOULD resolve before 1.0 release”, and successfully resolved issues are in the topic “Issues RESOLVED for 1.0 release”.

To keep things organized, please comment in the linked discussions or issues, not here. If there’s no linked discussion, start a new one on this forum. I will edit this as things are resolved or new issues come up.

Code blocks and spans

Preservation of spaces in backtick code

Currently, interior spaces and newlines are collapsed into single spaces, just as a browser would collapse them. But there are surely reasons why someone might want multiple spaces in a code span, so they should not be collapsed. The treatment of newlines is a bit more complex: maybe they should be turned into spaces? But what if they are already adjacent to spaces?

See Matthijs_Kooijm’s comment.

Leading and trailing spaces are also stripped. We certainly need to strip one leading and one trailing space, but we could limit stripping to just one on each side, to allow inline code with leading or trailing spaces.
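To make the proposal concrete, here is a minimal Python sketch of one possible normalization of code span content. It assumes (hypothetically) that each line ending becomes a single space, that other interior spaces are kept as typed, and that at most one leading and one trailing space is stripped. The name normalize_code_span is made up for illustration, and the sketch deliberately does not merge a newline-turned-space with an adjacent space, which is one of the open questions above.

```python
def normalize_code_span(content: str) -> str:
    """Hypothetical normalization of the raw text between code span delimiters."""
    # Each line ending becomes one space; existing runs of spaces are kept.
    text = content.replace("\r\n", "\n").replace("\n", " ")
    # Strip at most one space from each end, so any extra spaces survive.
    if text.startswith(" "):
        text = text[1:]
    if text.endswith(" "):
        text = text[:-1]
    return text


print(repr(normalize_code_span("  a   b  ")))  # ' a   b '
print(repr(normalize_code_span("foo\nbar")))   # 'foo bar'
```

Merging a converted newline with a neighbouring space would be a one-line change, which is part of why the choice here is about desired output rather than implementation cost.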

Discussion here.

Backtick fences and inline code collision

There’s an ambiguity between backtick fenced blocks and backtick inline code that needs to be resolved explicitly.

Currently this doesn’t get parsed as a code span:

``` zounds ##
`##`` kebobble `` ```

Discussion here, an issue here, and a related issue here.

Links

Links within links

Currently we disallow them and favor the innermost link.

Should we allow links within links?

Discussion in this issue.

Quotes in titles

Michel Fortin notes, regarding nested quotes in titles:

There sure is room for more consistency with various quote styles and disallowing nonsensical combinations of " and ). But take note:

  1. stmd is the only implementation not supporting unescaped quotes. http://johnmacfarlane.net/babelmark2/?normalize=1&text=Foo+[bar](%2Furl%2F+"Title+with+"quotes"+inside").

  2. neither Markdown.pl, PHP Markdown, nor many other parsers let you escape a double quote (or a single quote), so the obvious solution is unfortunately non-portable and you’ll have to recommend using &quot;. http://johnmacfarlane.net/babelmark2/?normalize=1&text=\"quotes\"

I replied:

It seems to me that the backslash-escapes should work in these contexts. There’s no clear reason why they should be disallowed; they are clearly useful; and the syntax description never says that they don’t work in these contexts. Allowing them to work removes 50% of the motivation for allowing nested quotes. Allowing you to use other quote types for titles across the board (' or ()) removes another 25%. Or so I reasoned, anyway. There is a backwards-compatibility concern, which is serious, though I’ll bet the cases affected are very rare.

See https://github.com/jgm/CommonMark/issues/308
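A minimal Python sketch of the position in the reply, assuming a title may be delimited by "...", '...' or (...), and that a backslash before any ASCII punctuation character escapes it, as backslash escapes do elsewhere in CommonMark. scan_title is a hypothetical helper, not the cmark or commonmark.js API.

```python
import string

# Closing delimiter for each title opener: "..." / '...' / (...).
CLOSERS = {'"': '"', "'": "'", "(": ")"}


def scan_title(src: str, start: int):
    """Scan a link title beginning at src[start].

    Returns (title, index just past the closing delimiter), or None if
    src[start] is not a title opener or the title is unterminated.
    A backslash before ASCII punctuation escapes it, so \\" is a literal quote.
    """
    opener = src[start] if start < len(src) else ""
    if opener not in CLOSERS:
        return None
    closer = CLOSERS[opener]
    out = []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src) and src[i + 1] in string.punctuation:
            out.append(src[i + 1])  # escaped punctuation is taken literally
            i += 2
        elif ch == closer:
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    return None


print(scan_title('"Title with \\"quotes\\" inside")', 0))
print(scan_title('"Title with "quotes" inside")', 0))
```

With escapes the whole title is recovered; without them the scanner stops at the first inner quote, which is exactly the case the nested-quote proposals try to rescue.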

Headers

Link reference definition followed by setext with blank line

See https://github.com/jgm/CommonMark/issues/395
Note: This seems to be an implementation issue, not a spec issue; see the linked issue.

Lists

Odd reference link/list case

- [foo]: bar
baz

produces a list with one item, baz. Is this really right? (This seems like an implementation issue rather than a spec problem.)
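One way to see why this happens: “baz” is a lazy continuation line, so the list item holds a single paragraph whose raw text is “[foo]: bar” followed by “baz”. When that paragraph is closed, link reference definitions at its start are peeled off, leaving only “baz”. The Python sketch below models just that last step. extract_reference_definitions is a hypothetical helper; it only recognizes single-line definitions without titles, and it lowercases labels as a simplification of real label normalization.

```python
import re

# Simplified single-line definition: [label]: destination (no title).
REF_DEF = re.compile(r"^ {0,3}\[([^\]]+)\]:[ \t]*(\S+)[ \t]*$")


def extract_reference_definitions(paragraph_lines):
    """Peel link reference definitions off the start of a paragraph.

    Returns (definitions, remaining_lines). Labels are lowercased here as a
    stand-in for full label normalization; the first definition of a label wins.
    """
    refs = {}
    i = 0
    while i < len(paragraph_lines):
        m = REF_DEF.match(paragraph_lines[i])
        if m is None:
            break
        refs.setdefault(m.group(1).lower(), m.group(2))
        i += 1
    return refs, paragraph_lines[i:]


# Content of the list item "- [foo]: bar" plus the lazy line "baz".
print(extract_reference_definitions(["[foo]: bar", "baz"]))
# ({'foo': 'bar'}, ['baz'])
```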

Reconsider allowing lists to break paragraphs

There’s a big discussion here:

The resolution we achieved still feels like a hack, and not really principled. Is there a better way?

Other

Tab issues

Handling of tabs needs to be further specified in the spec. See
http://talk.commonmark.org/t/tab-related-issues/1831

The change in tab handling (preserving tabs but trying to treat them as if they were converted to spaces at a tab stop of 4 when processing block structure) has some residual issues, and we need to go through the whole spec again with this in mind. We also need lots more test cases involving tabs.

See this issue for some good cases.

E.g. what does the spec say to do for

1.<TAB><TAB>hi

This is a bit unclear. Presumably we should have a code block, but what should be in it?

We should add test cases to the spec with tabs after list markers, block quote markers, and elsewhere.
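As a starting point for such test cases, here is a small Python sketch of the column arithmetic. It assumes tab stops every 4 columns, and assumes that when five or more columns of whitespace follow a list or block quote marker, one column counts as part of the marker and four more count as indented-code indentation. Both helpers (visual_column, code_block_leading_spaces) are hypothetical. Under these assumptions, 1.<TAB><TAB>hi yields a code block containing “hi” preceded by one space.

```python
def visual_column(line: str, index: int, tab_stop: int = 4) -> int:
    """Visual column of line[index], with tabs advancing to the next tab stop."""
    col = 0
    for ch in line[:index]:
        if ch == "\t":
            col += tab_stop - col % tab_stop
        else:
            col += 1
    return col


def code_block_leading_spaces(line: str, marker_len: int, tab_stop: int = 4) -> int:
    """Spaces left at the start of an indented code block that follows a
    container marker occupying line[:marker_len].

    Returns -1 when the whitespace after the marker is too narrow for an
    indented code block under the assumed rule, or the rest of the line is blank.
    """
    i = marker_len
    while i < len(line) and line[i] in " \t":
        i += 1
    if i == len(line):
        return -1  # nothing but whitespace after the marker
    text_col = visual_column(line, i, tab_stop)
    # One column of whitespace is treated as part of the marker.
    content_col = visual_column(line, marker_len, tab_stop) + 1
    extra = text_col - content_col - 4  # 4 columns of code indentation
    return extra if extra >= 0 else -1


print(code_block_leading_spaces("1.\t\thi", 2))  # 1
print(code_block_leading_spaces("-\t\tfoo", 1))  # 2
```

The same arithmetic applies to a > marker, since it also consumes one optional column of whitespace; that gives 2 leading spaces for >[TAB][TAB]x under these assumptions.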


#2

#13

Here’s my vote on the twelve (now eleven!) MUST items, cc: @jgm

Preservation of spaces in backtick code

`are trailing spaces trimmed from here-->   `

We should preserve spaces in code spans, unless it’s super hard. There’s a near 50/50 split in Babelmark, so either approach is really OK.

Backtick fences and inline code collision

``` hello
this is NOT inline code with one backtick ` and two backticks ``;
it is a code block!
```

Multi-word restriction of some kind, or leading space restriction (if possible). Removing code fences is absolutely not an option. Do note that in GitHub code fences the official documentation shows no space between the backticks and the language.
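A minimal Python sketch of one possible multi-word restriction, assuming (hypothetically) that a line of three or more backticks opens a fence only when the rest of the line is empty or a single word. opens_backtick_fence is an illustrative name. Under this rule the “hello” example above stays a code block, while a line like ``` zounds ## would no longer open a fence.

```python
import re

# Hypothetical rule: up to 3 spaces of indentation, 3+ backticks, then
# optionally one word (no interior spaces), then only trailing whitespace.
FENCE_OPENER = re.compile(r"^ {0,3}`{3,}[ \t]*\S*[ \t]*$")


def opens_backtick_fence(line: str) -> bool:
    return FENCE_OPENER.match(line) is not None


for line in ["```", "``` hello", "```python", "``` zounds ##"]:
    print(repr(line), opens_backtick_fence(line))
```

A leading-space restriction would be a different one-line pattern; either way the trade-off is which existing documents change meaning.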

Links within links

[outer [inner](http://inner)](http://outer)

Whatever the simplest way is to disallow this, I think we should do it. As @balpha already mentioned, it feels pathological to me; I don’t see a good use for it.

Inconsistent handling of spaces in links

<http://example.com/hey nice link>

There’s a very well-understood way to encode spaces in links, our old pal %20, and spaces in links are bad mojo anyway that we should not be encouraging. We should not allow spaces in links.
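For authors or tools that run into this, a minimal Python sketch of encoding the spaces as %20 instead, using the standard library’s urllib.parse.quote. The set of characters passed as safe is an assumption, chosen to leave the URL’s delimiters and any existing %-escapes untouched; encode_spaces_in_url is a hypothetical name.

```python
from urllib.parse import quote

# Characters left as-is: URL delimiters plus "%" so existing escapes survive.
SAFE = ":/?#[]@!$&'()*+,;=%~"


def encode_spaces_in_url(url: str) -> str:
    # quote() percent-encodes every other character that is not a letter,
    # digit or one of "_.-~", so " " becomes "%20".
    return quote(url, safe=SAFE)


print(encode_spaces_in_url("http://example.com/hey nice link"))
# http://example.com/hey%20nice%20link
```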

Quotes in titles

Foo [bar](/url/ "Title with "quotes" inside")

Replacing the " with &quot; seems crazily HTML-specific, which long term is not the goal of CommonMark, so I support your solution of allowing the escape \" to work, instead.

URL normalization

Already decided, so great!

Should space be required after # in ATX headers?

#Heading1

I am 100% certain the answer is “yes” here. Allowing this unnecessary flexibility does way too much damage to average user input, largely due to the rise of the #hashtag in popular media. Tighten it down: we have ample proof there’s a problem, and the tension from the fix is minor.
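A minimal Python sketch of the stricter rule, assuming a heading opener is up to three spaces of indentation, one to six # characters, and then a space, a tab, or the end of the line. parse_atx_heading is a hypothetical helper, and it does not strip an optional closing sequence of # characters. With it, #Heading1 and #hashtag lines stay ordinary text.

```python
import re

# An ATX opener: up to 3 spaces, 1-6 "#", then a space, a tab, or end of line.
ATX_HEADING = re.compile(r"^ {0,3}(#{1,6})(?:[ \t]+(.*))?$")


def parse_atx_heading(line: str):
    """Return (level, text) for an ATX heading, or None for ordinary text."""
    m = ATX_HEADING.match(line)
    if m is None:
        return None
    return len(m.group(1)), (m.group(2) or "").strip()


print(parse_atx_heading("#Heading1"))   # None
print(parse_atx_heading("# Heading1"))  # (1, 'Heading1')
```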

Setext header and list precedence issue

1. Juli
------

- Event 1
- Event 2

I think this should be escaped like any other list special case, or the user can use the ## header form.

Allow setext headers to interrupt paragraphs for consistency

Paragraph
Header
=====
Paragraph

Babelmark says this is quite divergent, and I think we should continue to be strict here and NOT allow setext headers to interrupt paragraphs, as that reads quite poorly to me in plaintext. What kind of heading has no whitespace?

I don’t find multiline headers particularly compelling.

Odd reference link/list case

- [foo]: bar
baz

Babelmark says a list with one item is what this should produce. Given that there’s virtually no divergence here, do we care? Are people really running into this? Is it a useful set of input?

Remove two-blanks rule

I’ve come to think that the “two blank lines breaks out of lists” rule is more trouble than it’s worth…

… I think the current spec could be clarified if rule 1 for list items explicitly said “a sequence of lines not containing two consecutive blank lines that are not in a fenced code block.”

I completely trust your instincts on this one. Everyone in the topic says your suggestions are reasonable, so go with whatever you believe is best here.

Handling of tabs needs to be further specified

>[TAB][TAB]x

Seems to me a tab is not quite a space, so the “is this a space after a blockquote” rule doesn’t apply.

I think the babelmark results show a rough consensus that this should be a code block with 3 spaces.

Honestly I think as long as a code block is rendered, which is definitely the consensus here, no one will be too bothered by some extra spaces.

Entity handling

Revise what the spec requires regarding entities:

See Spec Issues: (Character) Entity References
https://github.com/jgm/CommonMark/issues/442


#14

Only replying here to the MUST issues marked by @codinghorror as “no topic”.

Quotes in titles

Foo [bar](/url/ "Title with "quotes" inside")
Foo [bar](/url/ "Title with \"quotes\" inside")

The latter should absolutely work, although it currently doesn’t in many implementations.

Since most implementations (incl. Markdown.pl) seem to handle the former as a naive author would expect (i.e. automatically escape the inner quotes), it would be fine if the spec could define that behavior, too.
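One hypothetical way a parser could produce that naive-author reading is to let an inline link’s double-quoted title run to the last " before the closing ). The Python regex below is a minimal sketch under that assumption only: it handles just [text](url "title"), and it is not a description of how Markdown.pl or any other implementation actually works. find_inline_link is an illustrative name.

```python
import re

# Greedy (.*) makes the title end at the LAST quote before ")".
INLINE_LINK = re.compile(r'\[([^\]]*)\]\(\s*(\S+)\s+"(.*)"\s*\)')


def find_inline_link(text: str):
    """Return (link text, destination, title) for the first match, or None."""
    m = INLINE_LINK.search(text)
    return m.groups() if m else None


print(find_inline_link('Foo [bar](/url/ "Title with "quotes" inside")'))
# ('bar', '/url/', 'Title with "quotes" inside')
```

The same greediness is the cost of this approach: a second titled link later on the same line can be swallowed into the first title, which is one reason a spec might prefer requiring \" instead.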

Allow setext headers to interrupt paragraphs for consistency

Paragraph
Heading
----
Paragraph

We should not be looking at top-level = setext headings, but second-level - ones, because they’re ambiguous with “thematic breaks”. There are four general ways to parse this:

  1. Paragraph, heading, paragraph
  2. Heading (with 2 lines), paragraph
  3. Paragraph (with 2 lines), separator, paragraph
  4. Paragraph (with 4 lines)

The second interpretation is rare (only Parsedown in Babelmark). It assumes lazy wrapping of headings. That can be expensive, because every soft-wrapped paragraph could turn into a heading this way, and the parser wouldn’t know until it had consumed the last line with only equal signs or dashes in it. All other lazy wrapping is decided by the marker of the first line, so it’s reasonable to avoid this behavior (like the CM spec does), although some authors may expect otherwise.

The third option, which the reference implementation currently adopts (as do few others), makes no sense in my opinion. Setext headings should always take precedence over mere horizontal lines! Since they’re restricted to a single line of text for reasons explained above, that suggests option 1.

The last option has some merit over the first, because – like the third – it allows consistent treatment if put inside a list, for example, where headings would not be expected to occur at all.

* Paragraph
  Heading
  ----
  Paragraph

To fulfill both requirements, i.e. heading wins over separator and no different treatment inside a list item, there’s actually no choice but to treat it as a single paragraph (like only Pandoc, Kramdown and Minima do), unless we want (setext) headings inside lists (which Cebe, Maruku and Discount support). Most implementations actually support ATX headers and unambiguous setext headings inside lists, though. Since the reference implementation is among them, option 2 makes more sense again.

Odd reference link/list case

- [foo]: bar
baz

It seems to be a single-item tight list with content “baz”.


#21

With the algorithm currently used by cmark and commonmark.js, there is no efficiency cost to allowing multiline setext headers. We store successive lines in paragraphs, and when we hit one that looks like a setext header line, we just convert the paragraph into a setext header. No backtracking is needed. Indeed, allowing multiline setext headers would eliminate the need for a check we currently do (to see if only one line of text has been parsed).
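A minimal Python sketch of the approach described, under simplifying assumptions: it recognizes only blank lines, paragraph lines and setext underlines (no thematic breaks, lists or other blocks), and parse_blocks is a hypothetical name rather than the cmark or commonmark.js API. Paragraph lines are accumulated, and an underline simply relabels everything collected so far as a heading, so no backtracking is needed.

```python
import re

SETEXT_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)[ \t]*$")


def parse_blocks(lines):
    """Split lines into ("paragraph" | "h1" | "h2", [text lines]) blocks."""
    blocks = []
    para = []  # lines of the paragraph currently being collected
    for line in lines:
        if not line.strip():
            if para:
                blocks.append(("paragraph", para))
                para = []
            continue
        m = SETEXT_UNDERLINE.match(line)
        if m and para:
            # Relabel the whole open paragraph as a heading; this allows
            # multiline setext headings with no lookahead or backtracking.
            level = "h1" if m.group(1).startswith("=") else "h2"
            blocks.append((level, para))
            para = []
            continue
        para.append(line.strip())
    if para:
        blocks.append(("paragraph", para))
    return blocks


print(parse_blocks(["Paragraph", "Heading", "----", "Paragraph"]))
# [('h2', ['Paragraph', 'Heading']), ('paragraph', ['Paragraph'])]
```

Restricting headings to a single line would mean adding a len(para) == 1 condition before relabeling, which corresponds to the extra check mentioned above.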

I’m leaning towards thinking multiline headers are the best interpretation here. People who hard-wrap their text to a fixed column width may occasionally need these (although stylistically it’s generally best to avoid overly long headers). Currently there’s no way to do multiline headers in CommonMark (or Markdown generally), so the change would increase expressive power.


#25

I’ve gone with allowing multiline setext headings. Spec changes have been pushed, please have a look at


and

The code changes in cmark and commonmark.js amounted to deleting one line, and had no effect on benchmarks.


#30

I’m wondering how we can help get to a 1.0 release. What, concretely, would help you, @jgm?


#34

I have added some items to the list, I’m afraid.


#35

#36

What and where is the current list?

Regarding the odd list case with the link reference definition, the source is probably this passage in spec 0.28:

Link reference definitions can occur inside block containers, like lists and block quotations.

Example 175 shows how a paragraph can start directly after a line containing a reference link definition.

A sub-question is whether this should be an empty list item:

- 
foo

There is no example or prose explaining exactly that.

I have found myself wishing such embedded link reference definitions would work but would still be shown in the output, especially in lists where they otherwise always lead to empty list items because nothing can precede or follow them therein.

- [foo]: bar
baz
.
<ul>
<li><a href="bar">[foo]: bar</a></li>
</ul>
<p>baz</p>

#37

It’s been 2½ years since this list was started, at least 3½ years since this forum was opened, and over 5 years since Jeff’s “The Future of Markdown”.

It’s now been a year since GitHub adopted CommonMark.

Releasing 1.0 will further the goals and adoption of CommonMark. It will also allow the development of CommonMark v1.1 or v2 to move forward, which will further the goals and adoption even more.

So @jgm @codinghorror, what needs to happen to resolve this list? What can the community do?

Perhaps add a V1 milestone to the issues database, add the issues corresponding to the above to the milestone, and start taking pull requests? BTW, maybe announce that this forum is for design/new feature discussion, but bugs in the spec should get moved into GitHub, with a message on such threads here with a link to the GitHub issue. It will make this all easier to manage.


#38

I think part of the reason for the delay in releasing a “1.0” version of the core spec is that it would be very hard to patch without breaking backward compatibility, and a 1.0 version isn’t strictly needed for the spec to be useful in production. I don’t disagree with your points though @vas.

That said, I see no reason why work couldn’t commence on extension specs before the core spec reaches 1.0 (by other members of the community if @jgm is busy). Formalising some of the extensions from GitHub Flavored Markdown (tables, task list items, strikethrough, autolinks, and disallowed raw HTML) here on commonmark.org would be a good place to start. Alternatively, if CommonMark extensions as part of the CommonMark project are no longer on the cards, that’s something the community would benefit from being made aware of, so that extension specs could be developed independently.


#39

If what you say is true, it has been true for 2½ years, and it implies adoption so wide that the fear of breaking backward compatibility has kept the spec stuck this way for 2½ years, which in turn means we have a de facto 1.0 release. If so, we should make it official and move on to v1.1, where we’d have the freedom to fix things with minor backward incompatibility.

If what you say is not true, we can address this list for v1.0 and do it soon.

A standard that can’t take a stand is just a recommendation.


#40

Some of the issues listed here aren’t so minor. For the issues with linked forum topics, these are known issues that are still under discussion; the implementation hasn’t yet been decided. If you want to see a 1.0 release sooner, contributing to those individual discussions would help move the spec towards 1.0 as they have been explicitly stated as 1.0 blockers (independent of time).


#45

Seniority certainly has to be taken into account at some point.
But let’s see: GitHub doesn’t use it any more; it uses CommonMark.

How should we count now? Does that leave 13,537,268 - 1 users of kramdown? Overstated?
Maybe we should also specify that GitHub uses a GitHub-flavored CommonMark spec rather than GitHub Flavored Markdown (GFM).
Is that an unconditional endorsement of CommonMark’s spread around a well-approved status quo?

Like others, I appreciate and thank the community for its efforts around this project.
But IMHO, there seems to be some bikeshedding around important issues like:

  • css class
  • metadata
  • extensibility
  • ast

These issues are way more important than handling countless variants of line breaks.

Like any project, an open source project should know how to stay on track and on time…


#46

I agree, and that’s essentially what I’ve been saying in a different, but less presumptive, way.

But saying things like “It seems that commonmark is slowly dying” is neither accurate nor helpful. It’s no more accurate than saying “It seems that Gruber’s Markdown is slowly dying.” We all know the opposite is true. The problem isn’t death/lack of adoption, it’s fragmentation.

Neither are your statistics helpful, because you know that saying about lies and statistics. Such comments aren’t going to spur things to move. It’s not how leaders talk, nor is it how you get leaders to listen.

How about you start a new forum topic making your above bikeshedding case, with a solid line of reasoning, sans hyperbole?


#47

GitHub Pages used to use kramdown, but now it follows CommonMark.

FYI: GitHub Pages still uses kramdown. GitHub uses CommonMark (with GitHub Flavored Markdown extensions) for READMEs and markdown rendered on GitHub itself (but not for GitHub Pages). Sorry if this sounds confusing.


#48

You’re right. I was misremembering the time they switched to kramdown.


#49

From @notriddle’s link:

GitHub-flavored Markdown is supported by kramdown by default, so you can use Markdown with GitHub Pages the same way you use Markdown on GitHub.

In other words you can use CommonMark with GitHub Pages the same way you use it on GitHub.

And yes, even kramdown supports CommonMark. Does that mean we get to add kramdown’s numbers to CommonMark’s?

That’s a rhetorical question, so please don’t answer. This debate is starting to get silly, and sounding like national politics. Let’s get back to progress with CommonMark, whether that’s nailing 1.0 first or declaring it done, we need to move on to 1.1 asap. We need to move to head off even more fragmentation.


#50

JFTR, Kramdown’s GFM mode for Jekyll (which is what GitHub Pages is built upon) does not conform to CommonMark and probably never will. If I’m not confusing things again, GitHub uses commonmarker for .md file previews but cmark-gfm for READMEs.