Issues we SHOULD resolve before 1.0 release

This is a companion topic for the CommonMark 1.0 release.

These issues SHOULD be resolved before the 1.0 release if possible, but they do not block the release of CommonMark 1.0.

Turn empty link definitions into anchors? (SHOULD)

Example:

[link text]:

The link text cannot contain links, though it may
contain images.  Etc.

... later ...

See the discussion of [link text], above.

Issue and discussion.

Email addresses regex (SHOULD)

Our current parsers fail some email address tests.

For email addresses we used the “non-normative regex” from the HTML5 spec, which seemed a non-arbitrary and practical choice:

http://jgm.github.io/stmd/spec.html#email-autolink

This regex does not seem to accept the international example or the more unusual ones (with strange symbols and quotes). This should probably be fixed in our spec.
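To illustrate the kind of failure described above, here is a minimal sketch. The pattern is my transcription of the HTML5 non-normative regex and should be checked against the spec before relying on it. The name `is_html5_email` and the sample addresses are hypothetical stand-ins, not the actual tests referred to above.

```python
import re

# Transcription of the HTML5 "non-normative" email pattern (an assumption:
# verify against the spec text). Only ASCII letters, digits and a fixed set
# of symbols are allowed, so quoted local parts and non-ASCII fail.
HTML5_EMAIL = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)

def is_html5_email(address: str) -> bool:
    """Return True if the whole string matches the pattern above."""
    return HTML5_EMAIL.fullmatch(address) is not None

# Hypothetical sample inputs, chosen only to show the three categories.
samples = [
    "foo@bar.example.com",            # plain ASCII address
    "用户@例子.广告",                   # internationalized address
    '"quoted string"@example.com',    # quoted local part
]
for s in samples:
    print(s, is_html5_email(s))
```

Under this transcription only the first sample matches; the international and quoted ones are rejected because their characters fall outside the allowed ASCII classes.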

See what PHP Markdown Extra uses.

Embedded audio and video (SHOULD)

Should we change the spec so that instead of “images” it talks of media more generally, and allow the ![image](url) syntax to be used for audio and video? (Renderers could render appropriately for the media.)
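As a rough sketch of what "render appropriately for the media" could mean, a renderer might choose the HTML element from the URL's file extension. Everything here is an assumption for illustration: the function name `render_media`, the extension lists, and the choice of dispatching on extension at all. The spec says nothing of the kind today.

```python
import posixpath
from html import escape
from urllib.parse import urlparse

# Hypothetical extension tables; a real renderer might use MIME types instead.
AUDIO_EXTS = {".mp3", ".ogg", ".wav"}
VIDEO_EXTS = {".mp4", ".webm", ".ogv"}

def render_media(alt: str, url: str) -> str:
    """Render ![alt](url) as <audio>, <video> or <img>, based on extension."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    src = escape(url, quote=True)
    text = escape(alt, quote=True)
    if ext in AUDIO_EXTS:
        return f'<audio src="{src}" controls>{text}</audio>'
    if ext in VIDEO_EXTS:
        return f'<video src="{src}" controls>{text}</video>'
    # Fall back to the existing image behaviour.
    return f'<img src="{src}" alt="{text}" />'

print(render_media("song", "a/b.mp3"))
print(render_media("image", "url"))
```

The fallback keeps existing documents rendering exactly as before, which is one reason the question can be treated as a renderer concern rather than a syntax change.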

Discussion here.

Unicode bullets for bullet lists (SHOULD)

See the pull request that adds Unicode bullet characters alongside +, -, and * as markers for unordered lists.

Discussion here.

List item spec (SHOULD)

 > Blockquote
> continued here.


1.  > Blockquote
   > continued here.

is parsed in a surprising way (violating the Principle of Uniformity). Discussion here.

As noted, this is a special case of a more general issue (arising for blocks that can start after optional indentation):

  oo
--

-   oo
  --

  # hi

ok

-   # hi

  ok

The current list item spec is designed to allow line-by-line parsing (i.e. we don’t need to parse a block, strip prefixes, and then parse the result again). I think line-by-line parsing would still be possible, albeit more complex, if we changed the spec to allow cases like these.
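To make the current rule concrete, here is a simplified sketch of the line-by-line check: the content offset of a list item is the marker width plus the spaces that follow it, and a later line belongs to the item only if it is indented at least that far. This is not the reference implementation. The function names are hypothetical, and the sketch ignores lazy continuation, blank lines, and markers followed by five or more spaces.

```python
import re

# Up to 3 spaces of indent, a bullet or ordered marker, then 1-4 spaces
# before the first non-space character of the content.
_MARKER = re.compile(r"^( {0,3})([-+*]|\d{1,9}[.)])( {1,4})(?=\S)")

def content_offset(first_line: str):
    """Column where the list item's content starts, or None if no marker."""
    m = _MARKER.match(first_line)
    if m is None:
        return None
    return len(m.group(1)) + len(m.group(2)) + len(m.group(3))

def indent_of(line: str) -> int:
    """Number of leading spaces on the line."""
    return len(line) - len(line.lstrip(" "))

def continues_item(first_line: str, next_line: str) -> bool:
    """True if next_line is indented far enough to stay inside the item."""
    offset = content_offset(first_line)
    return offset is not None and indent_of(next_line) >= offset

print(continues_item("1.  > Blockquote", "   > continued here."))
print(continues_item("-   oo", "  --"))
print(continues_item("-   # hi", "    ok"))
```

In the first two cases the second line is indented less than the content offset of 4, so under this simplified rule it falls outside the list item even though the same relative indentation would be accepted at the top level.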

Spec examples in XML format? (SHOULD)

Should the spec be changed to use custom XML instead of HTML for the examples? This would avoid some issues about HTML normalization, and make it clearer that the spec is about parsing, not HTML rendering (the details of which can be implementation-specific). It would require that the test runner be able to parse HTML into our custom XML, so comparisons can be made – we don’t want to require implementers to produce an XML renderer.

Issue here

Discussion here

Should there be additional information in the AST? (SHOULD)

Currently we lose information about:

  • whether a link was an autolink
  • whether it was a reference link, what kind, and what label
  • what the bullet character was for a list item
  • what character was used for emphasis/strong emphasis
  • whether the 1. or 1) style of ordered list was used
  • whether a code block was backtick or indented

Should some or all of these things be part of the AST, or is this too “concrete”? (This would mainly affect conversion back to CommonMark, though other renderers could decide to be sensitive to these things.)
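One way to picture the trade-off is a sketch in which the concrete details are optional fields on the nodes, so a renderer that does not care can ignore them. All class and field names below are hypothetical and do not reflect the AST of any existing implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Link:
    destination: str
    is_autolink: bool = False
    reference_kind: Optional[str] = None   # "full", "collapsed" or "shortcut"
    reference_label: Optional[str] = None

@dataclass
class ListItem:
    bullet_char: Optional[str] = None        # "-", "+" or "*"
    ordered_delimiter: Optional[str] = None  # "." or ")"

@dataclass
class CodeBlock:
    literal: str
    fenced: Optional[bool] = None            # None means "not recorded"

def link_to_commonmark(text: str, link: Link) -> str:
    """Write a link back out, reusing the original form when it was recorded."""
    if link.is_autolink:
        return f"<{link.destination}>"
    if link.reference_kind == "full":
        return f"[{text}][{link.reference_label}]"
    if link.reference_kind == "collapsed":
        return f"[{text}][]"
    if link.reference_kind == "shortcut":
        return f"[{text}]"
    return f"[{text}]({link.destination})"

print(link_to_commonmark("foo", Link("/url", reference_kind="full", reference_label="bar")))
print(link_to_commonmark("foo", Link("/url")))
```

Without the extra fields, every link would fall through to the inline form on the last line, which is the information loss described above.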

Lexical ambiguity with processing instructions (SHOULD)

Discussion here