Anchors in markdown


#14

I’m very sad to see this thread hasn’t moved in 2 years. I feel this is an absolutely essential core element of markdown that should supersede all other considerations right now. Markdown simply cannot be used in any serious document format without the ability to set and reference anchors for use in tables of contents, chapter indexes, bibliographic and footnote references… you name it. Every single document format ever created, from richtext to word, hlp to chm, from epub to mobi, and 99 other ebook and document formats, has an anchor naming and linking system in place.

Markdown is the absolute singular contender yet to enter the arena.


#15

Anchors for headers have been discussed quite extensively in the Automatically generated IDs for headers topic.


#16

Thanks for the link! Btw, I’m not sure that headers are the only place for anchors.


#17

tldr; There is absolute consensus that manual header id assignment is an important feature for CommonMark and its absence is a weakness of the spec. There is general consensus that the syntax that best fits this use case is a {#anchor-id} following the definition of the header. All the contentious issues that have held it up do not actually apply to manual header id generation. How can this move forward?

Consensus syntax:

# Header example {#Anchor-id}

To create

<h1 id="Anchor-id"> Header example </h1>  
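
For anyone who wants to experiment with this, here is a minimal Python sketch of that mapping. It is not taken from any CommonMark implementation: the regular expression, the set of characters allowed in an id, and the function name `render_heading` are all assumptions made for illustration, and inline markup inside the heading text is ignored.

```python
import html
import re
from typing import Optional

# Hypothetical pattern: an ATX heading whose text may end in a {#id} suffix.
# The id character set here is an assumption, not something the thread settled.
HEADING_RE = re.compile(
    r"^(?P<hashes>#{1,6})[ \t]+(?P<text>.*?)"
    r"(?:[ \t]+\{#(?P<id>[A-Za-z][\w:.-]*)\})?"
    r"[ \t]*$"
)

def render_heading(line: str) -> Optional[str]:
    """Return an <hN> element for an ATX heading line, or None otherwise."""
    m = HEADING_RE.match(line)
    if m is None:
        return None
    level = len(m.group("hashes"))
    text = html.escape(m.group("text").strip())
    anchor = m.group("id")
    # Only emit an id attribute when the author wrote one explicitly.
    id_attr = f' id="{html.escape(anchor, quote=True)}"' if anchor else ""
    return f"<h{level}{id_attr}>{text}</h{level}>"

print(render_heading("# Header example {#Anchor-id}"))
# prints: <h1 id="Anchor-id">Header example</h1>
```

A heading without the suffix renders with no id at all, so this sketch takes no position on automatic id generation.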

Estimates of consensus:

This thread

Of 16 posts on this thread 11 speak positively of this syntax or the importance of a solution. 4 of the remaining do not address the merits of the syntax or the solution. 1 states that the problem is likely to have no perfect solution.

Automatic id generation thread

Of 84 posts on the Automatically generated IDs for headers topic, 15 speak positively of the {# } syntax & of the importance of having a solution.

Three other syntax styles are proposed, with 4, 2, and 1 posts in favour and 2, 1, and 1 proponents respectively. The posts that propose other syntaxes all mention the more general attribute assignment problem. One of the 2 proponents of the 4-post syntax later agreed that the {#anchor} syntax is better, on the grounds that it is already present in pandoc.

The rest of the posts (62) do not explicitly discuss the syntax or the need for manual anchor ids, focusing instead on the main issue of that other thread (how to automatically generate ids).

Conclusion: Consensus has been reached, progress has been arrested by scope creep

Having analyzed this thread as well as the discussion in the Automatically generated IDs for headers topic, it seems that the correct next move is to proceed toward implementing the {# } syntax for manually specifying ids for atx headers.

The discussion began 2 years ago, and it should not need to continue longer before this feature is included in the spec.

There is universal consensus that allowing manual ids on headers is an important feature and an assured improvement to the CommonMark spec.

There is widespread (but not universal) consensus on the use of the {# } syntax for manual header id assignment. Of posts that comment on syntaxes, 79% support this syntax, over 6× the level of support for any other syntax.
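
One way to reproduce those two figures from the tallies quoted earlier in this post is sketched below. It assumes that the 11 positive posts in this thread and the 15 in the other thread are both counted as support for the {# } syntax, and that the 4, 2, and 1 posts for the alternatives make up the rest of the posts that comment on syntax. That grouping is an interpretation of the counts, not something stated explicitly.

```python
# Post counts quoted in the tallies earlier in this post.
support_this_thread = 11
support_other_thread = 15
other_syntax_posts = [4, 2, 1]  # posts in favour of each alternative syntax

support = support_this_thread + support_other_thread  # 26 posts
commenting = support + sum(other_syntax_posts)         # 33 posts
share = support / commenting                           # fraction backing {# }
ratio = support / max(other_syntax_posts)              # versus the runner-up

print(f"{share:.0%} support, {ratio:.1f}x the runner-up")
# prints: 79% support, 6.5x the runner-up
```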

Addressing Dissent: Manual header ids are not objects of dissent

All of the dissent in these threads seems to revolve around whether this should be a more general way of assigning attributes and whether there should be auto-generated header ids.

These concerns don’t need to block progress on manually specified header ids. This approach to including header ids leaves open the possibility of autogenerating them (but says nothing about autogeneration one way or the other). Additionally, it allows for other syntaxes (as well as an expansion of the same syntax) as a means of assigning attributes to headers.

Steps forward?

@jgm and @codinghorror, what are the next steps needed to see progress on this? Happy to put in the effort wherever it is needed.


#18

I agree that this is a good syntax. It’s already widely supported (e.g. in pandoc). I think the main questions are:

  1. Since this is really an extension, should it wait til we’ve got the existing core nailed down, or should we just plow ahead?

  2. Should this be thought of as a special case of a more general attribute specifier? In pandoc you can have {#identifier .class .other-class key="value"}. Of course we could also support the simple identifier form for now and leave the others for later.
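
To make the second question concrete, here is a rough Python sketch of how the general specifier could be read, with the simple identifier form falling out as a special case. It is not pandoc's parser: the token pattern, the function name `parse_attributes`, and the shape of the returned dictionary are assumptions for illustration only.

```python
import re

# Hypothetical tokenizer for the inside of {...}: #id, .class, or key="value".
TOKEN_RE = re.compile(
    r'#(?P<id>[\w-]+)|\.(?P<cls>[\w-]+)|(?P<key>[\w-]+)="(?P<val>[^"]*)"'
)

def parse_attributes(block: str) -> dict:
    """Split an attribute block into an id, a class list, and key/value pairs."""
    inner = block.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("attribute block must be wrapped in curly braces")
    attrs = {"id": None, "classes": [], "pairs": {}}
    for m in TOKEN_RE.finditer(inner[1:-1]):
        if m.group("id"):
            attrs["id"] = m.group("id")          # a later #id overrides an earlier one
        elif m.group("cls"):
            attrs["classes"].append(m.group("cls"))
        else:
            attrs["pairs"][m.group("key")] = m.group("val")
    return attrs

print(parse_attributes('{#identifier .class .other-class key="value"}'))
print(parse_attributes("{#anchor-id}"))
```

Supporting only `{#anchor-id}` now would correspond to accepting the first alternative of the pattern and rejecting the other two, which is why the simple form can ship first without closing off the general one.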


#19

Plow ahead with this being the first extension. It’s been two years and the distraction could prove to be a refreshing break from dealing with edge cases in the core. Plus lots of people have been asking for a way to add anchors, and manual header ID generation is relatively simple (compared to, say, tables). Automatic header ID generation could be considered later.

It would make sense to group these together. #id for IDs and .class for classes is intuitive. key="value" could be key: "value" or key: value and look a bit less “programmerish” so it’s less obvious to me what the best syntax is here.


#20

There are lots of subtle syntax variants that most people would also accept, but which may score better on compatibility. Several of them could be supported, others forbidden. Some alternatives play better with info strings of fenced code blocks, others with current or proposed link syntax.

Meta data inside curly braces

  1. ## Heading {#ID .class}

  2. ## Heading ## {#ID .class}

  3. ## Heading {#ID .class} ##

  4. ## {#ID .class} Heading

  5. {#ID .class} ## Heading

  6. {#ID .class}
    ## Heading

  7. {#ID .class}
    ## Heading ##

  8. ## Heading
    {#ID .class}

  9. ## Heading ##
    {#ID .class}

  10. Heading {#ID .class}
    -------

  11. {#ID .class} Heading
    -------

  12. Heading
    ------- {#ID .class}

  13. Heading
    {#ID .class} -------

Meta data (only) separated by line affix

  1. ## Heading ## #ID .class

  2. #ID .class ## Heading ##

  3. #ID .class ## Heading

  4. #ID ## Heading ## .class

  5. .class ## Heading ## #ID

  6. Heading
    ------- #ID .class

Explicit IDs by reusing link (definition) syntax

  1. [ID]
    ## Heading

  2. [ID]:
    ## Heading

  3. ## Heading [][ID]

  4. ## [][ID] Heading

  5. ## [Heading][ID]

  6. ## Heading
    [][ID]

  7. [][ID]
    ## Heading

  8. ## Heading …
    [Heading]: ID

  9. ## Heading …
    [Heading]: #ID

  10. ## Heading …
    [#Heading]: ID

  11. ## Heading …
    [#Heading]: #ID

  12. ## Heading …
    [Heading]: [ID]

  13. ## [Heading] …
    [Heading]: ID

  14. ## [Heading] …
    [Heading]: #ID

  15. ## [Heading] …
    [#Heading]: ID

  16. ## [Heading] …
    [#Heading]: #ID

  17. ## [Heading] …
    [Heading]: [ID]

I’m probably forgetting some possibilities and proposals.


#21

It would be nice to see progress with autogenerated anchors, but with security considerations in mind.

As I explained earlier, it’s not safe to generate IDs/names without prefixes (when the value can become equal to window.<anything> in the browser). And it would be very inconvenient for developers if this problem were ignored in the spec.
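
A small Python sketch of what prefixed auto-generation could look like follows. Nothing here comes from the spec or from an existing implementation: the `md-` prefix, the slug rules, the numeric de-duplication suffix, and the name `make_heading_id` are all hypothetical choices made for illustration.

```python
import re
import unicodedata

def make_heading_id(text: str, used: set, prefix: str = "md-") -> str:
    """Derive a prefixed, unique id from heading text (illustrative rules only)."""
    # Strip accents so that "Café" slugs to "cafe" rather than being split.
    decomposed = unicodedata.normalize("NFKD", text)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c)).lower()
    # Collapse every run of other characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", plain).strip("-") or "section"
    candidate = prefix + slug
    suffix = 1
    while candidate in used:  # keep ids unique within one document
        suffix += 1
        candidate = f"{prefix}{slug}-{suffix}"
    used.add(candidate)
    return candidate

seen = set()
print(make_heading_id("Location", seen))  # prints: md-location
print(make_heading_id("Location", seen))  # prints: md-location-2
```

With the prefix in place, a heading titled "Location" no longer produces the bare id `location`, which is the kind of collision with names on `window` that this post is worried about.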


#22

Perhaps the spec could include a default prefix, e.g. # My Header {#id-of-header} becomes <h1 id="commonmark-id-of-header">My Header</h1>.


#23

That’s a completely different thing. Manual direct access to id/class/attribute manipulation is almost as unsafe as raw HTML use. And it should be disabled for unsafe input if you don’t wish to use sanitizers.

Here I speak only about autogenerated header ids; this use case is specific.


#24

If my point about scopes is right, then this is the responsibility of the embedding scope to address.


#25

Very similar question: Feature request: automatically generated ids for headers


#26

That’s moving the problem from one place to another (and more difficult) one instead of resolving it.


#27

But as I explained in my above linked comment, it’s moving the problem to the right place. For example, pre-HTML5, there should only be one H1 on a page. But the Markdown spec, which I believe we all agree should be portable and not tightly coupled to HTML, doesn’t and shouldn’t concern itself with possible collisions between a level one Markdown heading and an H1 in the embedding context. It’s the responsibility of the embedding context (e.g. this discourse page) to demote the Markdown headings if it wanted to implement the “only one H1” rule. It’s actually far more complex to try and solve this problem for every possible downstream context, both those that exist and ones that haven’t been invented yet. It’s far more complex to solve it in the wrong place.
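
As a rough illustration of the embedding context doing that demotion on already-rendered output, here is a Python sketch. It assumes the rendered Markdown arrives as an HTML fragment string, and the function name `demote_headings` is made up for the example. A real embedding would more likely use an HTML parser than a regular expression.

```python
import re

# Matches the start of an opening or closing h1..h6 tag, e.g. "<h2" or "</h2".
HEADING_TAG_RE = re.compile(r"<(/?)h([1-6])\b", re.IGNORECASE)

def demote_headings(fragment: str, by: int = 1) -> str:
    """Shift every heading in an HTML fragment down by `by` levels, capped at h6."""
    def shift(m: re.Match) -> str:
        level = min(6, int(m.group(2)) + by)
        return f"<{m.group(1)}h{level}"
    return HEADING_TAG_RE.sub(shift, fragment)

print(demote_headings('<h1 id="a">Title</h1><h2>Sub</h2>'))
# prints: <h2 id="a">Title</h2><h3>Sub</h3>
```

The Markdown source and its renderer stay unaware of the page around them; only the embedding step knows that an H1 already exists.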


#28

That’s a subjective personal opinion. From my point of view, this place is not right :). Because implementation will be much more difficult. At least, from your posts, I don’t see that you are familiar with implementations and know an easy way to sanitize inputs.


#29

Hi,

Is this working?

I can only say that from a user’s point of view this is generating a smorgasbord of dialects when just adding a piece of text like {#get-back-here} would be sufficient.


#30

Hi,
I see a very simple, easy-to-use implementation of anchors in the CommonMark spec:

A [whitespace character](@) is a space …

…A [non-whitespace character](@) is any character that is not a [whitespace character].

Will that not be suitable? It is very simple, clear…
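
A minimal Python sketch of how such a convention could be processed is below. It is only an illustration of the idea, not the tooling used to build the spec: the id rules and the names `term_id` and `link_terms` are assumptions, and a bare `[term]` is linked only when a matching `[term](@)` definition exists in the same text.

```python
import html
import re

DEFINITION_RE = re.compile(r"\[([^\]]+)\]\(@\)")
# A bare [term] that is not followed by "(", "[" or ":" (so not another link form).
REFERENCE_RE = re.compile(r"\[([^\]]+)\](?![\[(:])")

def term_id(term: str) -> str:
    """Turn a term into an id, e.g. 'whitespace character' -> 'whitespace-character'."""
    return re.sub(r"[^a-z0-9]+", "-", term.lower()).strip("-")

def link_terms(text: str) -> str:
    defined = {term_id(t) for t in DEFINITION_RE.findall(text)}

    def define(m: re.Match) -> str:
        return f'<a id="{term_id(m.group(1))}">{html.escape(m.group(1))}</a>'

    def refer(m: re.Match) -> str:
        ident = term_id(m.group(1))
        if ident not in defined:
            return m.group(0)  # leave unknown bracketed text untouched
        return f'<a href="#{ident}">{html.escape(m.group(1))}</a>'

    # Definitions first, so that only bare references still carry brackets.
    return REFERENCE_RE.sub(refer, DEFINITION_RE.sub(define, text))

sample = ("A [whitespace character](@) is a space. A [non-whitespace character](@) "
          "is any character that is not a [whitespace character].")
print(link_terms(sample))
```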


#31

I actually like Pal_Petho’s general idea - though it should match the style of the already agreed-on manual header IDs, and shouldn’t delay implementation of that.


#32

Questionnaire

The topic is complex and there are a lot of options. I have tried to condense the principles behind them into a set of questions. This is not intended as a deciding vote but for finding out the collective opinion.

If existing CommonMark constructs are used to generate target anchors in the output format, these are known as automatic anchors or implicit anchors. A new construct or convention would be needed for manual anchors or explicit anchors.

Automatic anchors

  • All headings should become anchors automatically (using their textual content)
  • Implicit anchors (e.g. ## Heading) should automatically be available as reference link definition labels for overrides (e.g. [Heading]: {#ID})
  • Unused reference link definitions ([label]:) should become anchors automatically


  • All links should become anchors automatically (using their textual content)
  • Specific inline links (e.g. [text](@) or [text]()) should become anchors automatically
  • All reference links ([text][label]) should become anchors automatically (using their label)
  • Specific reference links (e.g. [text][#label] or [][label]) should become anchors automatically
  • Other links should not become anchors automatically


Manual anchor restrictions

Manual anchors …

  • may be restricted to headings
  • may be restricted to defining terms (e.g. <dfn> in HTML output)
  • may be restricted to blocks (i.e. headings, code blocks, quotations, …)
  • may be restricted to headings and defining terms
  • may be restricted to blocks and defining terms
  • should be available in arbitrary locations


Manual anchor positions

Manual anchors …

  • should always come before/above text (e.g. ## {#ID} Heading ##)
  • should always come after/below text (e.g. ## Heading {#ID} ##)
  • may come either before/above or after/below text


Manual anchors in headings

  • should come between text and affix ## or underline === (e.g. ## Heading {#ID} ##)
  • should stay outside text and affix ## or underline === (e.g. ## Heading ## {#ID})


Manual anchors in links

  • should be inside the text part (e.g. [text {#ID}](target))
  • should be inside the target or label part (e.g. [text](target {#ID}))
  • should be outside current text and target or label parts (e.g. [text](target){#ID})


Stylistic preferences

  • Anchor ID should always be inside curly braces {}
  • Anchor ID should always be prefixed by a hash sign #
  • Manual anchors should always be on a separate line



#33

Reminder: Anchors in markdown

I’d be happy to have automated anchors, but it will be a big ass pain if such things appear in the spec without security considerations. Also, manual IDs are not convenient (IMHO) as a primary solution and may lead users to make security mistakes.