Info strings for suffixed headings

Crissov · August 26, 2018, 11:53am

https://github.com/commonmark/CommonMark/issues/500

Status quo (0.28)

An ATX heading consists of a string of characters, parsed as inline content, between an opening sequence of 1–6 unescaped # characters and an optional closing sequence of any number of unescaped # characters.

[Emphasis mine.]

Valid heading suffixes from spec examples

## foo ##
  ###   bar    ###

# foo ##################################
##### foo ##

### foo ###

Invalid heading suffixes

### foo ### b

# foo#

### foo \###
## foo #\##
# foo \#

Proposal

I wish we could somewhat relax the rule for the closing sequence of # characters. I want to enable something like an info string for future extensions: If the number of unescaped # characters in the opening sequence is matched exactly by the number of unescaped # characters in a possible closing sequence with a space on both sides, the characters after this closing sequence form an info string and are not used verbatim for output.

Only example 44 would be affected by this change:

### foo ### b
.
<h3>foo</h3>

instead of

### foo ### b
.
<h3>foo ### b</h3>

Also:

### foo ### ###
### foo ### ##
### foo ## ###
### foo ## ### ##
.
<h3>foo</h3>
<h3>foo</h3>
<h3>foo ##</h3>
<h3>foo ##</h3>

And:

# foo `#`
# foo ` # `
.
<h1>foo <code>#</code></h1>
<h1>foo `</h1>

… where I’m not completely sure about the last one.
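
The rule proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference implementation: the names `Heading` and `parse_atx_heading` are made up for this example, the first run of matching length with a space on both sides is taken as the closing sequence, and inline content is left unparsed, so a # inside a code span is treated like any other (which is what yields the uncertain last result above).

```python
import re
from typing import NamedTuple, Optional


class Heading(NamedTuple):
    level: int
    content: str  # raw inline content, not yet parsed as inlines
    info: str     # proposed info string, "" if there is none


# Opening sequence: up to 3 spaces of indentation, 1-6 '#', then either
# end of line or whitespace followed by the rest of the line.
_OPENING = re.compile(r"^ {0,3}(#{1,6})((?:[ \t].*)?)$")
# A run of '#' with whitespace on both sides. A run preceded by a space
# cannot be backslash-escaped, so no separate escape check is needed.
_SPACED_RUN = re.compile(r"(?<=[ \t])#+(?=[ \t])")
# Existing rule: optional closing sequence of any length at end of line.
_CLOSING = re.compile(r"[ \t]+#+[ \t]*$")


def parse_atx_heading(line: str) -> Optional[Heading]:
    m = _OPENING.match(line.rstrip("\n"))
    if m is None:
        return None
    level = len(m.group(1))
    rest = m.group(2)  # starts with whitespace, or is empty

    # Assumption: the first run whose length equals the opening level wins.
    for run in _SPACED_RUN.finditer(rest):
        if len(run.group()) == level:
            info = rest[run.end():].strip()
            if info:
                return Heading(level, rest[:run.start()].strip(), info)
            break  # nothing after the run: fall back to the existing rule

    # No info string: strip an ordinary closing sequence, as in 0.28.
    return Heading(level, _CLOSING.sub("", rest).strip(), "")
```

With this sketch, `parse_atx_heading("### foo ## ### ##")` gives level 3, content `foo ##` and info string `##`, while `parse_atx_heading("### foo ## ###")` has no info string and keeps the content `foo ##`.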

Common use cases

# foo # #bar .baz "quuz"
# foo # {id=bar class=baz title=quuz}
.
<h1 id="bar" class="baz" title="quuz">foo</h1>
<h1 id="bar" class="baz" title="quuz">foo</h1>

The string inside either quotation marks "" or parentheses () would be used in the HTML title attribute, but also in a table of contents. For example, the LaTeX (book class) equivalent would look like this:

# foo # "bar"
# foo # (bar)
.
\chapter[bar]{foo}
\chapter[bar]{foo}

There could also be an extension that keeps a heading from getting an auto-generated number, from being included in the ToC, or both, e.g.:

# foo # -
.
\chapter*{foo}

Iʼm not proposing the #id .class "title" @attribute key=value syntax (nor any other) for the info string, just that there be an optional info string in ATX headings.
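
Purely to illustrate what an extension might do with such an info string (the syntax itself is explicitly not part of the proposal), here is a minimal Python sketch that turns the example notations above into an attribute mapping. The function name `parse_info_string` and the token grammar are assumptions made for this example only.

```python
import re
from typing import Dict, List

# Hypothetical token grammar covering only the notations used in the
# examples: #id, .class, "title", (title) and key=value.
# Each token must start at the beginning or after whitespace.
_TOKEN = re.compile(
    r'''(?<!\S)(?:
        \#(?P<id>[^\s{}]+)
      | \.(?P<cls>[^\s{}]+)
      | "(?P<title>[^"]*)"
      | \((?P<ptitle>[^)]*)\)
      | (?P<key>[^\s={}]+)=(?P<value>"[^"]*"|[^\s{}]+)
    )''',
    re.VERBOSE,
)


def parse_info_string(info: str) -> Dict[str, str]:
    text = info.strip()
    # Curly braces are treated as optional wrappers around the string.
    if text.startswith("{") and text.endswith("}"):
        text = text[1:-1]

    attrs: Dict[str, str] = {}
    classes: List[str] = []
    for m in _TOKEN.finditer(text):
        if m.group("id") is not None:
            attrs["id"] = m.group("id")
        elif m.group("cls") is not None:
            classes.append(m.group("cls"))
        elif m.group("title") is not None:
            attrs["title"] = m.group("title")
        elif m.group("ptitle") is not None:
            attrs["title"] = m.group("ptitle")
        else:
            key = m.group("key")
            value = m.group("value").strip('"')
            if key == "class":
                classes.extend(value.split())
            else:
                attrs[key] = value
    if classes:
        attrs["class"] = " ".join(classes)
    return attrs
```

Both `#bar .baz "quuz"` and `{id=bar class=baz title=quuz}` produce the same mapping here, which a renderer could then emit as HTML attributes or map to a LaTeX optional argument.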

Impact

I do not have a corpus available to test this proposal with, but I expect there to be very little existing content that would be affected by the change, because the hash or number sign # is very unlikely to occur within a heading (or prose in general) with spaces on both sides of it. A string of # characters is even less likely. Authors who are talking of the character itself will very likely mark it up with directly adjacent backticks or quotation marks.
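
For anyone who does have a corpus at hand, the following Python sketch shows one way the impact could be estimated. It lists ATX heading lines whose output would change under the proposal, i.e. lines containing a run of # of the opening length with a space on both sides and some text after it. The names `affected_headings` and `scan_corpus` are hypothetical, only `*.md` files are scanned, and the check is a simplification: it skips fenced code blocks but ignores container blocks such as lists and block quotes.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

_OPENING = re.compile(r"^ {0,3}(#{1,6})([ \t].*)$")
# A run of '#' preceded by whitespace and followed by whitespace plus
# at least one more non-space character (the would-be info string).
_SPACED_RUN = re.compile(r"(?<=[ \t])#+(?=[ \t]+\S)")
_FENCE = re.compile(r"^ {0,3}(`{3,}|~{3,})")


def affected_headings(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    fence = None  # opening fence of the code block we are inside, if any
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        f = _FENCE.match(line)
        if fence is not None:
            # A closing fence uses the same character, is at least as
            # long as the opening fence, and carries no info string.
            if (f and f.group(1)[0] == fence[0]
                    and len(f.group(1)) >= len(fence)
                    and not line[f.end():].strip()):
                fence = None
            continue
        if f:
            fence = f.group(1)
            continue
        m = _OPENING.match(line)
        if m and any(len(run.group()) == len(m.group(1))
                     for run in _SPACED_RUN.finditer(m.group(2))):
            yield number, line


def scan_corpus(root: str) -> List[Tuple[Path, int, str]]:
    hits = []
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in affected_headings(text.splitlines()):
            hits.append((path, number, line))
    return hits
```

Calling `scan_corpus` on a directory of Markdown files returns one (path, line number, line) entry per affected heading, so the length of the result relative to the total number of headings gives a rough measure of breakage.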

Consequences

People would probably also like to have info strings for setext headings if they were available for ATX headings. Their = and - underlines do not allow any contents in the same line either. A possible, but hackish, solution would be to only consider trailing characters for an info string if the underline length matches the characters in the (last line of the) heading.

I am not proposing this behavior right now.

foo
=== #bar .baz "quuz"
foo
--- #bar .baz "quuz"
.
<h1 id="bar" class="baz" title="quuz">foo</h1>
<h2 id="bar" class="baz" title="quuz">foo</h2>

foo
==  ==  #bar .baz "quuz"
foo
- #bar .baz "quuz"
.
<p>foo ==  ==  #bar .baz "quuz" foo</p>
<ul><li>#bar .baz "quuz"</li></ul>
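
The length-matching heuristic for setext underlines could look roughly like the following Python sketch. It is an illustration only: `setext_with_info` is a hypothetical name, "length" is counted in code points of the stripped last heading line, and a plain underline without trailing text keeps its current meaning at any length.

```python
import re
from typing import Optional, Tuple

# An underline of '=' or '-' characters, optionally followed by
# whitespace and trailing text (the candidate info string).
_UNDERLINE = re.compile(r"^ {0,3}(=+|-+)(?:[ \t]+(.*?))?[ \t]*$")


def setext_with_info(last_text_line: str,
                     underline: str) -> Optional[Tuple[int, str]]:
    """Return (level, info) if `underline` closes a setext heading."""
    m = _UNDERLINE.match(underline)
    if m is None:
        return None
    run, info = m.group(1), m.group(2) or ""
    level = 1 if run[0] == "=" else 2
    if not info:
        # Existing behavior: an underline of any length is accepted.
        return level, ""
    # Hypothetical rule: trailing text only counts as an info string
    # when the underline is exactly as long as the heading text.
    if len(run) != len(last_text_line.strip()):
        return None
    return level, info
```

For the examples above, `foo` followed by `=== #bar .baz "quuz"` is accepted as a level 1 heading with an info string, while `==  ==  #bar .baz "quuz"` and `- #bar .baz "quuz"` are rejected and would be handled by the ordinary paragraph and list rules.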

More ideas like this are discussed in Info strings elsewhere.

Crissov · August 28, 2018, 12:42pm

This is a good proposal, let’s do this.
That’s useful, but let’s solve the issue differently.
This would break too much.

It would be nice to explain dissent in a comment below.

digitalmoksha · September 3, 2018, 1:35am

I guess I’m unclear what benefit this has over a more generic attribute extension that can be used in many places, such as that used by Pandoc or kramdown, like adding {#identifier .class .class key=value key=value}.

If the number of unescaped # characters in the opening sequence is matched exactly by the number of unescaped # characters in a possible closing sequence with a space on both sides, the characters after this closing sequence form an info string and are not used verbatim for output.

I feel like this style is going to get complicated when trying to extend this functionality to other elements.

Intuitively, I feel like having a consistent attribute syntax, as talked about here, would be a better overall solution.

In fact, I wish we would just choose either the Pandoc or kramdown style, tweak it if necessary, and get it done. It’s really something that’s needed…

Crissov · September 3, 2018, 6:08am

One benefit is that it makes existing attribute extensions partially compatible if written in a specific way: consider curly braces optional wrappers around info strings, but required in places where info strings are otherwise impossible or for line breaks inside.

## Heading ## {info string within optional wrappers} 

![text](target "title" {info string within optional wrappers}) or 
![text](target {"title" info string within optional wrappers})

[label] 

  label: <target> {info string within optional wrappers} 

``` {info string within optional wrappers} 
```

digitalmoksha · September 3, 2018, 11:50am

It seems like allowing for optional wrappers adds unnecessary complexity. Since the syntax for the wrappers needs to be defined anyway, why not just always require them?

It feels like we’re trying to implement a solution (with all its inherent edge cases) to solve a problem we should be addressing head-on: a valid attribute syntax, which most people agree is needed. If we could solve that, including its edge cases, with a strict and simple syntax, then we could look at relaxing the need for wrappers in certain cases.

One benefit is that it makes existing attribute extensions partially compatible if written in a specific way

You’re right, but I think that might start leading to more fragmentation. I don’t think we should allow for multiple attribute extension syntaxes - there should be one well defined one, with some well defined attributes (such as width) but with the flexibility for any key/value pairs.

It seems like your info string is basically the same as the attribute string; we’re just wrangling over how to implement it.

Crissov · September 3, 2018, 8:11pm

The point is that there will be no generic curly attribute syntax in vanilla 1.0, because it would be too much of a deviation from Markdown. It may always stay an optional extension. Additional opportunities for the established concept of info strings, however, could be worth the necessary minor syntax tweaks. Also keep in mind that source readability is a major design feature of MD/CM, and restricting the places for info strings can actually guarantee that better than a very generic syntax for auxiliary data.

jgm · September 4, 2018, 4:51pm

I agree that in the long run a consistent attribute syntax is the way to go. The “info string” idea for code blocks was a kind of compromise, which left room for attributes without specifying any particular syntax.