Feature request: automatically generated ids for headers


#81

If the link is written as @an3ss suggests:

## The Philosophy of CommonMark

You need to understand [the philosophy of CommonMark] because blah, blah, blah...

The reference and the anchor are synchronized through the text. If you change the text of the header, you most likely want to change the ID as well, unless you just use UUIDs. I don’t think this is a very compelling argument for requiring an explicit ID, but if it is what it takes to achieve consensus, I can live with that.


#82

The reference and the anchor are synchronized through the text.

Not necessarily; it’s very conceivable you could use text other than the header text to refer to a section. This also doesn’t take into account links from external documents.


#83

I suppose it is nicer for the author not to have to manually write the ID of a heading in order to link to it. I am concerned that adding square bracket heading links to the core spec would break a lot of existing Markdown documents (imagine how many GitHub README.md files would use square brackets for some other purpose and shouldn’t link to a heading), but as an opt-in extension it could be useful. Or some other syntax could be used besides square brackets.

How would that get around the reordering issue without the author defining explicit IDs? Can you provide an example?


#84

To be honest, I’m not sure what I was thinking of, but I think it was something like this: If the generated ID consists of both the position of the header in the outline as well as the header’s name, reordering of headers shouldn’t be a problem. For the following outline:

# Level One
## Level Two
# Level One
## Level Two

The generated IDs of the headers could be something like:

  1. section1-level-one
  2. section1.1-level-two
  3. section2-level-one
  4. section2.1-level-two

If you reorder or rename a header, it will get a new ID. You won’t have backward compatibility with incoming links, but you will avoid conflicts.
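A minimal sketch of how such position-plus-name IDs could be generated, assuming ATX-style headers only. The function names `slugify` and `structural_ids` and the `section` prefix are illustrative choices, not part of any spec or existing implementation:

```python
import re

def slugify(text):
    """Lowercase the text and join its ASCII alphanumeric runs with hyphens."""
    return "-".join(re.findall(r"[a-z0-9]+", text.lower()))

def structural_ids(markdown):
    """Return one ID per ATX header, combining outline position and slug."""
    counters = []  # counters[i] is the current number at header level i + 1
    ids = []
    for line in markdown.splitlines():
        match = re.match(r"(#{1,6})\s+(.*?)\s*#*\s*$", line)
        if not match:
            continue
        level = len(match.group(1))
        # Forget deeper levels, pad skipped levels with 0, then count this header.
        del counters[level:]
        while len(counters) < level:
            counters.append(0)
        counters[level - 1] += 1
        position = ".".join(str(n) for n in counters)
        ids.append(f"section{position}-{slugify(match.group(2))}")
    return ids

outline = "# Level One\n## Level Two\n# Level One\n## Level Two"
print(structural_ids(outline))
```

For the outline above this yields the four IDs listed. Moving or renaming a header changes its ID, which is exactly the trade-off described.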

As for staying backwards compatible and preserving incoming links, I think that’s impossible unless your IDs have absolutely nothing to do with the document structure, i.e. they are semantically meaningless. UUIDs will give you this detachment if you really want it, and those who do should by all means be able to explicitly set the ID of their headers and put a UUID in there. Those of us who like semantically accurate and human-readable IDs can go with a more tightly coupled and brittle autogenerated ID.


#85

I think no one has noticed this issue: it affects users of other languages (English is fine). Most implementations of automatic IDs have the lame effect of dropping accented characters. Leaving the accent out of the ID is actually reasonable, since the Markdown might be incorrect and cause issues if the accent were included (the URL can’t contain accents). The problem is that the character isn’t converted to its non-accented counterpart; it’s just omitted.

For example, for the title:

# “Techné” as the greek word for “Art”

The id would be techn-as-the-greek-word-for-art, instead of the correct way: techne-as-the...
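A small sketch contrasting the two behaviours, assuming a generator that keeps only ASCII letters and digits. Both function names are hypothetical; the second one strips accents via Unicode NFKD decomposition, which is one possible approach and not what any particular implementation is known to do:

```python
import re
import unicodedata

def slug_dropping_non_ascii(title):
    """Discard every non-ASCII character outright (the behaviour complained about)."""
    ascii_only = title.encode("ascii", "ignore").decode("ascii")
    return "-".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))

def slug_with_transliteration(title):
    """Decompose accented letters and drop only the combining marks."""
    decomposed = unicodedata.normalize("NFKD", title)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    ascii_only = stripped.encode("ascii", "ignore").decode("ascii")
    return "-".join(re.findall(r"[a-z0-9]+", ascii_only.lower()))

title = "“Techné” as the greek word for “Art”"
print(slug_dropping_non_ascii(title))    # techn-as-the-greek-word-for-art
print(slug_with_transliteration(title))  # techne-as-the-greek-word-for-art
```

This only helps for scripts whose letters decompose to ASCII. A title written entirely in Japanese, for instance, still produces an empty slug with either function.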

Most automatic ID generators make this mistake, and I don’t think they’re to blame. Language is really complicated, and I have no idea what issues this might cause for other languages such as Japanese or Korean.

Markdown should not behave in an opinionated way (which is inevitable with automatic IDs) unless it clearly provides a way to use your own criteria to generate IDs.

I really like the {#id-goes-here} approach because it’s clearly understandable and for reasons stated above.


#86

We could define some rules to automatically convert the commonly used accented characters. But you might be right about it being difficult to anticipate the correct behaviour for all languages. This is a compelling reason to include an override method as part of the extension.


#87

I’m kinda in favor of the simplest solution, the one that GitHub has adopted, even at the risk of collisions (though GitHub avoids collisions by appending a suffix when a collision is detected).
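A rough sketch of collision avoidance by suffixing, assuming a numeric `-1`, `-2`, ... suffix. The function name `unique_ids` is made up for illustration, and the exact suffix scheme GitHub uses is not specified here:

```python
def unique_ids(slugs):
    """Return the slugs in order, appending -1, -2, ... to any that repeat."""
    seen = set()
    result = []
    for slug in slugs:
        candidate = slug
        n = 0
        # Keep bumping the suffix until the candidate is unused,
        # which also covers a header literally named "intro-1".
        while candidate in seen:
            n += 1
            candidate = f"{slug}-{n}"
        seen.add(candidate)
        result.append(candidate)
    return result

print(unique_ids(["intro", "usage", "intro", "intro"]))
```

Authoring stays simple, but the suffixed IDs depend on header order, so inserting another "intro" earlier in the document shifts the later ones.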

The thing to remember is that links should be easy to author, just as with the rest of markdown/commonmark.

I haven’t thought through the implications enough, hence the “kinda”.


#88

Babelmark shows just how much implementations differ in how they handle spaces and non-ASCII Latin letters.

With an info string, authors could override automatic IDs.


#89

We don’t need namespaces. We already have scopes.

Within the scope of the content I’m authoring, [me too](#me-too) can unambiguously link to # Me Too. I as an author should not have to think about any containing scopes. This is perfectly analogous to block scopes in most programming languages.

It is the responsibility of the embedding context to respect and protect my scope. Whether in its rendering it avoids ID collisions by altering its IDs or mine, or demotes my heading levels to avoid multiple H1s, it is its business, its job to make it work.
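As a sketch of what "protecting my scope" could look like on the publishing side, an embedding tool might prefix every ID and same-document link in an already rendered fragment. The function name `scope_fragment` and the prefix are hypothetical, and the regular expressions assume simple double-quoted attributes; a real tool would use an HTML parser:

```python
import re

def scope_fragment(html, prefix):
    """Prefix every id attribute and every #fragment link in a rendered fragment."""
    # The lookbehind avoids touching attributes such as data-id.
    html = re.sub(r'(?<![\w-])id="([^"]+)"',
                  lambda m: f'id="{prefix}{m.group(1)}"', html)
    html = re.sub(r'(?<![\w-])href="#([^"]+)"',
                  lambda m: f'href="#{prefix}{m.group(1)}"', html)
    return html

fragment = '<h1 id="me-too">Me Too</h1><a href="#me-too">me too</a>'
print(scope_fragment(fragment, "post89-"))
```

The author's source still says `#me-too`; only the embedding context's output changes.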

There should be a clear separation of concerns between authoring content and publishing mechanics. I shouldn’t have to manage the technicalities of an output format while authoring in a format that is supposed to be independent and portable. As the content author, the only thing that matters is that I have a semantically unambiguous and intuitive way to create internal links. I don’t care how they are ultimately rendered.

In the one case where I do care, when I want the world to be able to deep link into my published content, I make sure to choose a publishing tool that produces predictable, “exportable” header IDs and deep links, perhaps one that retains the unaltered CommonMark anchors or perhaps one that doesn’t. Who knows, maybe the content will get published in a relational database, and all anchors get translated into foreign keys. These are publishing concerns, not authoring concerns, and not CommonMark spec concerns.


Anchors in markdown
#90

CommonMark should (with “support” meaning either in the mandatory core or via an optional extension) …

  • require implementations to automatically generate implicit IDs for headings
  • specify how to generate implicit IDs from heading text
    (e.g. “Überschrift 1” ⇒ #uberschrift_1)
  • specify how to generate implicit IDs from document structure
    (e.g. ### Heading ⇒ #section-3.6.1)
  • specify how to generate safe IDs for user-generated content
    (e.g. {#window.evil');\ DROP\ TABLE\ *;--} ⇒ #user-window-evil-drop-table)
  • support manually entered, explicit IDs for headings
    (e.g. ## Heading ## #ID)
  • support manually entered, explicit IDs for any block
    (e.g. ~~~ #ID)
  • support manually entered, explicit IDs for links
    (e.g. [text](target #ID))
  • support manually entered, explicit IDs for any inline markup
    (e.g. *emphasis*{#ID})
  • support overriding implicit IDs with explicit ones
    (e.g. [heading text]: #ID)
  • support relative links to headings/sections
    (e.g. [next section][>] or [this section](#.))
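
To illustrate the "safe IDs for user-generated content" option, here is a minimal sketch that reduces an explicit ID to ASCII letters and digits and puts it under a fixed prefix. The name `safe_user_id` and the `user-` prefix follow the example in the list and are assumptions, not a proposed rule:

```python
import re

def safe_user_id(raw, prefix="user-"):
    """Reduce an untrusted explicit ID to [a-z0-9-] and namespace it with a prefix."""
    slug = "-".join(re.findall(r"[a-z0-9]+", raw.lower()))
    # Return None when nothing usable is left, so the caller can fall back
    # to an automatically generated ID.
    return prefix + slug if slug else None

print(safe_user_id(r"window.evil');\ DROP\ TABLE\ *;--"))  # user-window-evil-drop-table
```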
