Feature request: automatically generated ids for headers

chrisalley · January 9, 2015, 9:28am

I think this solution is adequate for shorter documents (say, a Wikipedia page) where repeated headers are unlikely. Requiring explicit header IDs would add significant overhead for the writer. For longer documents, that likelihood increases and it becomes more difficult for the writer to keep track of which links point where. It might be wise for the writer to define explicit IDs in the case of longer documents.

So, automatic header IDs, with the option of overriding them with explicit header IDs.

Another issue with automatic IDs is that they may clash with other IDs on the page. Imagine two posts in a forum topic having the same header text. Now, suppose the first post is deleted. The order of the IDs would change and any links to the second post would break. As a solution, the parser could accept an optional namespace parameter that would be added to the start of the ID. The ID of the header “The Philosophy of CommonMark” would become #discourse-topic-115-post-40-the-philosophy-of-commonmark, for example.
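
A minimal sketch of the namespace idea, not taken from any existing parser: the function names `slugify` and `header_id` and the `namespace` parameter are hypothetical, and the slug rules (lowercase ASCII, hyphens between words) are just one plausible choice.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn header text into a lowercase, hyphen-separated slug."""
    # Decompose accented characters and drop anything outside ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def header_id(text: str, namespace: str = "") -> str:
    """Prefix the slug with an optional namespace (hypothetical parameter)."""
    slug = slugify(text)
    return f"{namespace}-{slug}" if namespace else slug


print(header_id("The Philosophy of CommonMark",
                namespace="discourse-topic-115-post-40"))
# prints "discourse-topic-115-post-40-the-philosophy-of-commonmark"
```

Because the namespace identifies the post rather than its position, deleting an earlier post would not change the IDs generated for later ones.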

an3ss · January 9, 2015, 1:47pm

I agree.

I assume you’re talking here about automatic IDs in general, not implicit header references.

Let me insist, just in case: In order to use IHRs, an author does NOT need to know anything about automatic or explicit header IDs. As mentioned before, IHRs are already implemented in Pandoc. They are documented here:
http://johnmacfarlane.net/pandoc/README.html#extension-implicit_header_references

That said, I feel we shouldn’t discuss IHRs under this topic any further. If there is any chance that they end up in the core spec or as an extension, then @jgm or @codinghorror will create a new topic when the time is right.

My vote is to include IHRs in the core spec because they are simple, useful, and can be implemented in a backwards-compatible way.

chrisalley · January 9, 2015, 7:04pm

Yes.

One exception to this is if the header is being linked to from outside of the document, for example if I wish to refer to a section of another article that I wrote.

an3ss · January 10, 2015, 2:51pm

Ok, if the header is linked from outside, then an implicit header reference will not help and you certainly need an ID.

jonathanKingston · February 22, 2015, 8:03pm

@codinghorror and @jgm, any chance of a resolution on this?

As @tabatkins mentioned, automatic link generation is problematic.

As I mentioned, the CommonMark specification already has a syntax using (@id-name) to generate IDs, which could be used as anchors.

So it could be that automatic generation is the default unless an override is given? However, I would prefer the @ syntax to be permitted as well.

zzzzBov · February 24, 2015, 4:31pm

Let’s drop this one

I know this is probably going to be an unpopular opinion, but the more I think about this issue, the more it strikes me as something wholly external to markdown.

Would it be convenient if markdown automatically inserted implicit IDs into headers? Sure, but it runs the risk of conflicting with whatever external environment the generated markup is being added into.

Instead of trying to incorporate this feature into markdown where it might cause more bugs and make authoring more complicated, let’s just drop this feature request.

Of course, I’m not saying we shouldn’t autogenerate [id] attributes. On the contrary, autogenerate IDs as a post-process step.

That way it can be customized to suit the needs of the environment. That way we don’t have to try to add options to all markdown processors that are only meant for ID resolution.

Implementing a post-processor for this is trivially easy. Throw the rendered markup into a DOM parser, grab all the h* elements, and add ID attributes based on whatever ID resolution scheme suits your needs.

# Lorem ipsum
## Dolor sit amet

could produce

<h1 id="lorem-ipsum">Lorem ipsum</h1>
<h2 id="dolor-sit-amet">Dolor sit amet</h2>

if your resolution scheme is content based (which I assume will be the most popular)

or it could produce

<h1 id="1">Lorem ipsum</h1>
<h2 id="1.1">Dolor sit amet</h2>

if your resolution scheme is based on hierarchy.

or it could produce…whatever you want because it’s completely customizable.

In either case, the post-process step can easily be customized and easily debugged because it’s a smaller, separate entity to markdown.
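
The post-processing step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the approach of any particular Markdown processor: it uses Python's standard `xml.etree.ElementTree`, so it only handles rendered markup that is well-formed XML (real-world HTML would need a proper HTML parser), and the names `add_heading_ids`, `content_scheme`, and `hierarchy_scheme` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def content_scheme(text, path):
    """Content-based ID: slug of the heading text."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def hierarchy_scheme(text, path):
    """Hierarchy-based ID: dotted position, e.g. "1.1"."""
    return ".".join(str(n) for n in path)


def add_heading_ids(html, scheme):
    """Add an id to every heading that lacks one, using the given scheme."""
    # Wrap the fragment so several top-level elements parse as one tree.
    root = ET.fromstring(f"<root>{html}</root>")
    counters = [0] * 6  # running count per heading level
    for el in root.iter():
        if el.tag not in HEADING_TAGS:
            continue
        level = int(el.tag[1])
        counters[level - 1] += 1
        for deeper in range(level, 6):
            counters[deeper] = 0  # a new heading resets all deeper levels
        if "id" not in el.attrib:  # never overwrite an explicit ID
            text = "".join(el.itertext())
            el.set("id", scheme(text, counters[:level]))
    # Serialize the children only, dropping the temporary wrapper.
    return (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )


rendered = "<h1>Lorem ipsum</h1>\n<h2>Dolor sit amet</h2>"
print(add_heading_ids(rendered, content_scheme))
print(add_heading_ids(rendered, hierarchy_scheme))
```

Swapping the `scheme` argument is the only change needed to move between the two outputs shown earlier, which is the customizability argument in a nutshell.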

matmuchrapna · June 25, 2015, 11:42am

I think we all need more possibilities to extend CommonMark, so I created a proposal in which I explain why, how, and which outcomes we would have as a community: “Flexibility and modularity for the win!”, and also an extensions’ ecosystem.

Crissov · June 25, 2015, 3:33pm

Instead of using counters

<h3 id="scene-1-1">Scene 1</h3>

a writer could incorporate upper level header content if necessary (ignore escaping for now)

<h3 id="scene-1-@-episode-1">Scene 1</h3>
<h3 id="scene-1-(episode-1)">Scene 1</h3>

However, this should really remain implementation-dependent unless implicit header references (Pandoc-style or what @an3ss proposed) become a standardized extension.
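
One way to read the "if necessary" part, sketched under assumptions: a heading keeps its plain slug unless that slug occurs more than once in the document, in which case the slug of its nearest parent heading is appended. The function names and the plain hyphen separator (instead of the `@` or parentheses shown above) are hypothetical choices.

```python
import re
from collections import Counter


def slugify(text):
    """Lowercase the text and join alphanumeric runs with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def heading_ids(headings):
    """Map (level, text) pairs to IDs, qualifying repeated slugs by parent."""
    slugs = [slugify(text) for _, text in headings]
    counts = Counter(slugs)
    ids = []
    ancestors = []  # stack of (level, slug) for the currently open headings
    for (level, _), slug in zip(headings, slugs):
        # Close every heading at the same or a deeper level.
        while ancestors and ancestors[-1][0] >= level:
            ancestors.pop()
        if counts[slug] > 1 and ancestors:
            # Repeated slug: append the nearest parent's slug.
            ids.append(f"{slug}-{ancestors[-1][1]}")
        else:
            ids.append(slug)
        ancestors.append((level, slug))
    return ids


outline = [(2, "Episode 1"), (3, "Scene 1"), (2, "Episode 2"), (3, "Scene 1")]
print(heading_ids(outline))
# prints ['episode-1', 'scene-1-episode-1', 'episode-2', 'scene-1-episode-2']
```

This sketch does not guarantee uniqueness: two identical headings under identical parents would still collide, so a real implementation would need a further fallback such as a counter.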

zwol · June 25, 2015, 4:00pm

I said this back in December and everyone ignored it:

What’s important to me is moving toward a world where I can have confidence that every header within an arbitrarily chosen HTML document will have a fragment identifier.

Therefore, this is a feature request for mandatory generation of fragment identifiers for all headers that don’t have one explicitly specified.

I reemphasize that I want generation of fragment identifiers to be required by the core specification. Anything less does not move us toward a world where every header in every HTML document has a fragment identifier.

lu_zero · June 26, 2015, 1:16pm

The AST should provide enough information and then the application should be able to override the generation template, IMHO.

chrisalley · June 26, 2015, 2:04pm

Something like this would go a long way in helping to avoid duplicate IDs. The anchors could become quite long though; imagine the length of an H6 anchor.

Crissov · June 27, 2015, 8:23pm

Hopefully, it would only need its preceding h5 to be disambiguated.

chrisalley · June 28, 2015, 12:23am

In practice that seems likely. In theory, h5-h6 pairs could be identical.

This feature should probably be an extension. Sites could then use either explicit IDs that are guaranteed not to change (but add clutter to the document) or implicit IDs knowing that there is a risk that the headings may be reordered, breaking links. The length of the document (and likelihood of duplicate headings) could be a deciding factor.

zwol · June 28, 2015, 12:49am

This feature should probably be an extension

What part of “this feature must be in the core as mandatory to implement, because nothing less than that moves us toward a world where every heading in every HTML document has a fragment identifier” is unclear? Are my posts not actually getting through or something?

chrisalley · June 28, 2015, 12:59am

It’s clear what you’re requesting @zwol . What’s not clear is why every heading needs to have an ID.

zwol · June 28, 2015, 1:19am

What’s not clear is why every heading needs to have an ID

So that it is always possible to link to a specific section of a document. Why else? (Yes, I regularly encounter cases where I can’t write an appropriate hyperlink because the section header I need to point at doesn’t have an ID.)

chrisalley · June 28, 2015, 3:31am

The author may not wish to allow headings to be linked to. For example, the headings may be subject to change in the future (we discussed reordering headings above), so a link to the overall document may be preferred. This is why an extension may be more appropriate (with a predictable method of generating the IDs for the implementations that adopt the extension).

Crissov · June 28, 2015, 1:32pm

For what it’s worth, I agree with @zwol that every heading (including captions and maybe every reference) should automatically become a link target. Since neither XLink nor Selectors can be expected to be used in general, explicit IDs remain the only viable solution. If CommonMark were (to become) a modular specification where everything but the core was optional to implement (i.e. an extension), there should be a module for implicit header references which also includes the requirement for automatically generated IDs and probably a way for authors to set an explicit value.

Such internal links, which include an automatically generated TOC, would be the major use of automatic heading identifiers, I assume. If IDs were generated from textual content (or were arbitrary/random), they would stay in sync automatically when the author rearranges the document structure, except perhaps for headings with canonically equal content. If IDs were generated by hierarchic position, on the other hand, they would be safe against subsequent textual changes. I don’t see how we could get both, but I prefer “speaking” names for all parts of a URL.
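
The trade-off can be made concrete with a small experiment. This is only an illustration: the section keys `a`, `b`, `c`, the sample titles, and the `section-N` ID format are all made up, and the keys exist solely so the script can tell which section is which after an edit.

```python
import re


def content_ids(titles):
    """IDs derived from the heading text."""
    return [re.sub(r"[^a-z0-9]+", "-", t.lower()).strip("-") for t in titles]


def position_ids(titles):
    """IDs derived from the heading's position in the document."""
    return [f"section-{i}" for i, _ in enumerate(titles, start=1)]


def changed(before, after, scheme):
    """Return the keys of sections whose generated ID differs after an edit."""
    def by_key(sections):
        keys = [key for key, _ in sections]
        return dict(zip(keys, scheme([title for _, title in sections])))
    old, new = by_key(before), by_key(after)
    return sorted(key for key in old if old[key] != new.get(key))


original = [("a", "Intro"), ("b", "Usage"), ("c", "License")]
reordered = [("a", "Intro"), ("c", "License"), ("b", "Usage")]
renamed = [("a", "Intro"), ("b", "How to use"), ("c", "License")]

print(changed(original, reordered, content_ids))   # prints []
print(changed(original, reordered, position_ids))  # prints ['b', 'c']
print(changed(original, renamed, content_ids))     # prints ['b']
print(changed(original, renamed, position_ids))    # prints []
```

Each scheme survives exactly the kind of edit that breaks the other, which matches the point that neither gives both properties at once.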

I fail to see @chrisalley’s latest point since (external) links with hash target will also work if that target is not found, i.e. the reader gets directed to the top of the whole document. (There are possible scenarios where the reader would see an unintended section instead.)
I also do not consider it very important for an author or reader to be able to predict the exact ID of a heading (by either position or content) by mentally applying some canonicalization algorithm.

There is one way we could deal with internal links by structure rather than name: symbolic relative links, but I should probably open a separate thread for that.

# Top chapter
## First section
## Previous section
## Current section
Chainable relative links for simple siblings and ancestors:
* [Current][@]
* [Top or Upper][^] – cf. [^footnote]
* [Previous][<]
* [Next][>]
* [First][|<]
* [Last][>|]
* [Document or Top][.] – almost as in a POSIX file system
## Next section
## Last section
Implicit links like [top chapter] work always, 
whereas the following explicit ones only work in certain implementations:
* [hierarchic][#heading1]
* [hierarchic][#chapter1]
* [textual][#top chapter]
* [textual][#top-chapter]
* [textual][#top_chapter]
* [textual][#top%20chapter]
* [textual][#topchapter]
* [textual][#explicit override ID]

  [#Top chapter]: explicit override ID

chrisalley · June 29, 2015, 9:38am

I fail to see @chrisalley’s latest point since (external) links with hash target will also work if that target is not found, i.e. the reader gets directed to the top of the whole document. (There are possible scenarios where the reader would see an unintended section instead.)

As you said, the reader may see an unintended section in some scenarios. That was essentially my point. By making the implicit header IDs opt-in, the developer first has to make a decision as to whether this is acceptable behaviour. If it is not considered acceptable behaviour, the developer can choose the explicit header ID extension instead.

Crissov · June 29, 2015, 12:06pm

I consider these scenarios too unlikely to outweigh the benefits of linkable headings in general. They do make another argument in favor of name-based IDs, though, because these are more likely to stay unique over time than simple hierarchy-based ones.