Feature request: automatically generated ids for headers


I understand. That’s an implementation memo for all. IDs are not generated now, but I think they will be at some point, because it’s a very useful thing.


@an3ss How would your proposal work if the writer changed the order of two headings that had the same text (e.g. both called “The Philosophy of CommonMark”)? This seems to be the biggest issue for automatically generated header IDs.


Is it normal to have headers with exactly the same content?


You might have a document with headings like this:

## Episode 1
### Scene 1
### Scene 2
## Episode 2
### Scene 1
### Scene 2


Of course, the writer has to manually keep links in sync with changes to header titles and to the order of repeated headers.

If you have two headers titled “Scene 1”, as in your example, then the following header references will be made available automatically (the IDs are hypothetical auto-generated IDs):

[Scene 1]: #id-of-scene-1-1
[Scene 1 #1]: #id-of-scene-1-1
[Scene 1 #2]: #id-of-scene-1-2

So, if you use the reference link [Scene 1 #1] (or just [Scene 1]), you will be pointing to the first “Scene 1” header that occurs in the document, and with [Scene 1 #2] you’ll be pointing to the second.

If at some point you decide to exchange the contents of the two “Scene 1” sections, then you have to take care of updating all links to them accordingly (exchange #1 and #2 in their link labels).

Note that the generated id is irrelevant with this mechanism; you are linking to the header with title “Scene 1” in the order that you specify, that’s all. The parser will take care of generating the id and using it in the generated <a> element.
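
To make the label scheme concrete, here is a minimal Python sketch of how a parser might build those implicit references. The function names and the `slug-n` id format are assumptions made for illustration only; as noted above, the actual id format does not matter to the author.

```python
import re
from collections import defaultdict


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def implicit_header_references(headings):
    """Map reference labels to fragment ids, in document order.

    The n-th header titled T gets the label "T #n"; the first one
    is also reachable through the bare label "T".
    The id format (slug plus counter) is hypothetical.
    """
    seen = defaultdict(int)
    refs = {}
    for title in headings:
        seen[title] += 1
        n = seen[title]
        fragment = f"#{slugify(title)}-{n}"
        refs[f"{title} #{n}"] = fragment
        if n == 1:
            refs[title] = fragment
    return refs


refs = implicit_header_references(
    ["Episode 1", "Scene 1", "Scene 2", "Episode 2", "Scene 1", "Scene 2"]
)
```

Because labels are assigned in document order, swapping the two “Scene 1” sections changes which section each label points to, which is why the links have to be updated by hand.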

On the other hand, sometimes it may be better to use explicit ids, declaring the id in the header (with whatever syntax is decided):

### Scene 1 {#they_kiss}

and defining the link reference explicitly:

[They kiss]: #they_kiss

so you can link to the scene with, for example:

in [the scene where they kiss][they kiss].

or simply:

in the scene where [they kiss].

This option is perfectly compatible with the implicit header references that I am proposing.


I think this solution is adequate for shorter documents (say, a Wikipedia page) where repeated headers are unlikely. Requiring explicit header IDs would add significant overhead for the writer. For longer documents, that likelihood increases and it becomes more difficult for the writer to keep track of which links point where. It might be wise for the writer to define explicit IDs in the case of longer documents.

So, automatic header IDs, with the option of overriding them with explicit header IDs.

Another issue with automatic IDs is that they may clash with other IDs on the page. Imagine two posts in a forum topic having the same header text. Now, suppose the first post is deleted. The order of the IDs would change and any links to the second post would break. As a solution, the parser could accept an optional namespace parameter that would be added to the start of the ID. The ID of the header “The Philosophy of CommonMark” would become #discourse-topic-115-post-40-the-philosophy-of-commonmark, for example.
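
A rough Python sketch of the namespace idea follows. The `header_id` function and its `namespace` parameter are hypothetical names, not part of any existing parser.

```python
import re


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def header_id(title, namespace=""):
    """Build a header id, optionally prefixed with a caller-supplied namespace.

    `namespace` is a hypothetical parser option; the embedding site
    (a forum, for instance) would pass something unique per post.
    """
    slug = slugify(title)
    return f"{namespace}-{slug}" if namespace else slug


post_header = header_id(
    "The Philosophy of CommonMark",
    namespace="discourse-topic-115-post-40",
)
plain_header = header_id("The Philosophy of CommonMark")
```

With a per-post namespace, deleting one post no longer shifts the IDs generated for headers in the other posts.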

Turning empty link definitions into anchors

I agree.

I assume you’re talking here about automatic IDs in general, not implicit header references.

Let me insist, just in case: In order to use IHRs, an author does NOT need to know anything about automatic or explicit header IDs. As mentioned before, IHRs are already implemented in Pandoc. They are documented here:

That said, I feel we shouldn’t discuss IHRs under this topic any further. If there is any chance that they end up in the core spec or as an extension, then @jgm or @codinghorror will create a new topic when the time is right :wink:

My vote is to include IHRs in the core spec because they are simple, useful, and can be implemented in a backwards-compatible way.



One exception to this is if the header is being linked to from outside the document, for example if I wish to refer to a section of another article that I wrote.


Ok, if the header is linked from outside, then an implicit header reference will not help and you certainly need an ID.


@codinghorror and @jgm any chance of a resolution on this?

As @tabatkins mentioned, automatic link generation is problematic.

As I mentioned, the CommonMark specification already has a syntax using (@id-name) to generate IDs, which could be used as anchors.

So it could be that automatic generation is the default unless an override is given? However, I would prefer that the @ syntax also be permitted.


Let’s drop this one

I know this is probably going to be an unpopular opinion, but the more I think about this issue, the more it strikes me as something wholly external to markdown.

Would it be convenient if markdown automatically inserted implicit IDs into headers? Sure, but it runs the risk of conflicting with whatever external environment the generated markup is being added to.

Instead of trying to incorporate this feature into markdown where it might cause more bugs and make authoring more complicated, let’s just drop this feature request.

Of course, I’m not saying we shouldn’t autogenerate [id] attributes. On the contrary, autogenerate IDs as a post-process step.

That way it can be customized to suit the needs of the environment. That way we don’t have to try to add options to all markdown processors that are only meant for ID resolution.

Implementing a post-processor for this is trivially easy. Throw the rendered markup into a DOM parser, grab all the h* elements, and add ID attributes based on whatever ID resolution scheme suits your needs.

# Lorem ipsum
## Dolor sit amet

could produce

<h1 id="lorem-ipsum">Lorem ipsum</h1>
<h2 id="dolor-sit-amet">Dolor sit amet</h2>

if your resolution scheme is content based (which I assume will be the most popular)

or it could produce

<h1 id="1">Lorem ipsum</h1>
<h2 id="1.1">Dolor sit amet</h2>

if your resolution scheme is based on hierarchy.

or it could produce…whatever you want because it’s completely customizable.

In either case, the post-process step can easily be customized and easily debugged because it’s a smaller entity, separate from markdown.
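
For what it’s worth, here is a minimal sketch of such a post-processor in Python, using only the standard library. It assumes the rendered markup is a well-formed XML fragment (which typical CommonMark output is, unless the document contains sloppy raw HTML); the function names and both ID schemes are illustrative, not taken from any existing tool.

```python
import re
import xml.etree.ElementTree as ET

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def content_id(level, text):
    """Content-based scheme: slugify the heading text."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def hierarchy_scheme():
    """Hierarchy-based scheme: number headings as 1, 1.1, 1.2, 2, ..."""
    counters = [0] * 6

    def make_id(level, text):
        counters[level - 1] += 1
        # Reset the counters of all deeper levels.
        counters[level:] = [0] * (6 - level)
        return ".".join(str(c) for c in counters[:level])

    return make_id


def add_heading_ids(html, make_id):
    """Parse the markup, give every h1-h6 without an id one from make_id."""
    # Wrap in a dummy root so a fragment with several top-level
    # elements parses as a single tree.
    root = ET.fromstring(f"<root>{html}</root>")
    for el in root.iter():
        if el.tag in HEADING_TAGS and "id" not in el.attrib:
            text = "".join(el.itertext())
            el.set("id", make_id(int(el.tag[1]), text))
    out = ET.tostring(root, encoding="unicode")
    return out[len("<root>"):-len("</root>")]


rendered = "<h1>Lorem ipsum</h1>\n<h2>Dolor sit amet</h2>"
by_content = add_heading_ids(rendered, content_id)
by_hierarchy = add_heading_ids(rendered, hierarchy_scheme())
```

Swapping the resolution scheme is just a matter of passing a different function, and headings that already carry an explicit ID are left alone.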

Flexibility and modularity for the win! and also extensions’ ecosystem

I think we all need more possibilities to extend CommonMark, so I created a proposal where I explain why, how, and which outcomes we will have as a community: Flexibility and modularity for the win! and also extensions’ ecosystem


Instead of using counters

<h3 id="scene-1-1">Scene 1</h3>

a writer could incorporate upper level header content if necessary (ignore escaping for now)

<h3 id="scene-1-@-episode-1">Scene 1</h3>
<h3 id="scene-1-(episode-1)">Scene 1</h3>

However, this should really remain implementation-dependent unless implicit header references (Pandoc-style or what @an3ss proposed) become a standardized extension.
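
As a rough illustration, a generator along these lines could qualify only the repeated headers with their nearest enclosing header. This is a Python sketch under my own assumptions: the `-@-` separator follows the first variant above, and the function names are made up.

```python
import re
from collections import Counter


def slugify(text):
    """Lowercase the text and replace runs of non-alphanumerics with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def qualified_ids(headings):
    """Return one id per (level, title) pair, in document order.

    A header whose slug is unique keeps the plain slug; a repeated
    slug is qualified with the slug of the nearest enclosing header.
    """
    counts = Counter(slugify(title) for _, title in headings)
    ancestors = []  # stack of (level, slug) for the currently open headers
    ids = []
    for level, title in headings:
        slug = slugify(title)
        # Close headers at the same or a deeper level.
        while ancestors and ancestors[-1][0] >= level:
            ancestors.pop()
        if counts[slug] > 1 and ancestors:
            ids.append(f"{slug}-@-{ancestors[-1][1]}")
        else:
            ids.append(slug)
        ancestors.append((level, slug))
    return ids


ids = qualified_ids([
    (2, "Episode 1"), (3, "Scene 1"), (3, "Scene 2"),
    (2, "Episode 2"), (3, "Scene 1"), (3, "Scene 2"),
])
```

This only looks one level up, so two identical parent and child pairs would still produce the same ID.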


I said this back in December and everyone ignored it:

What’s important to me is moving toward a world where I can have confidence that every header within an arbitrarily chosen HTML document will have a fragment identifier.

Therefore, this is a feature request for mandatory generation of fragment identifiers for all headers that don’t have one explicitly specified.

I reemphasize that I want generation of fragment identifiers to be required by the core specification. Anything less does not move us toward a world where every header in every HTML document has a fragment identifier.


The AST should provide enough information, and then the application should be able to override the generation template, IMHO.


Something like this would go a long way in helping to avoid duplicate IDs. The anchors could become quite long though; imagine the length of an H6 anchor.


Hopefully, it would only need its preceding h5 to be disambiguated.


In practice that seems likely. In theory, h5-h6 pairs could be identical.

This feature should probably be an extension. Sites could then use either explicit IDs that are guaranteed not to change (but add clutter to the document) or implicit IDs knowing that there is a risk that the headings may be reordered, breaking links. The length of the document (and likelihood of duplicate headings) could be a deciding factor.


This feature should probably be an extension

What part of “this feature must be in the core as mandatory to implement, because nothing less than that moves us toward a world where every heading in every HTML document has a fragment identifier” is unclear? Are my posts not actually getting through or something?


It’s clear what you’re requesting, @zwol. What’s not clear is why every heading needs to have an ID.