Multiple matching link definitions make no sense

In example 108, the foo link is defined twice. If people are going to the trouble of making something that can be declared as semantically verifiable, why allow a confusing construct?

If the intent is to just allow me to do:

[link]: url_a

Go check out [link].

[link]: url_b

go check out [link].

such that the first one is a link to url_a and the second one is a link to url_b, the current specification wouldn’t be sufficient.

For the sake of sanity, it would just make more sense to disallow redeclaring link targets.

+++ eanderson [Sep 04 14 00:35 ]:

In example 108, the foo link is defined twice. If people are going
to the trouble of making something that can be declared as
semantically verifiable, why allow a confusing construct?

If the intent is to just allow me to do:

[link]: url_a

Go check out [link].

[link]: url_b

go check out [link].

such that the first one is a link to url_a and the second one is a link
to url_b, the current specification wouldn’t be sufficient.

For the sake of sanity, it would just make more sense to disallow
redeclaring link targets.

The tradition in markdown parsers is not to raise errors.
No matter how crazy the input, something is done with it.
There is no such thing as an invalid markdown document.

So we need to say what to do when a duplicate link definition
is given.

Since Markdown doesn’t care where a link definition appears (it’s usually placed at the end of the document), and the first matching definition takes precedence, both links would go to url_a.

It would probably be a good idea to highlight this somewhere in the spec to make it clear that all link definitions are collected independently of their location.
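As a rough illustration of "collected independently of their location", here is a minimal Python sketch. It is not how any particular parser is implemented: the regular expression only handles one-line `[label]: destination` definitions (no titles, angle brackets, or indentation), and `collect_definitions` and `normalize` are hypothetical names made up for this example.

```python
import re

# Simplified pattern: "[label]: destination" on a single line.
# Real link reference definitions allow more (titles, <...>, indentation).
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$")


def normalize(label: str) -> str:
    """Fold case and collapse internal whitespace before comparing labels."""
    return " ".join(label.split()).casefold()


def collect_definitions(text: str) -> dict[str, str]:
    """Collect definitions from the whole document; the first one per label wins."""
    definitions: dict[str, str] = {}
    for line in text.splitlines():
        match = DEFINITION.match(line)
        if match:
            # setdefault keeps an existing entry, so later duplicates are ignored.
            definitions.setdefault(normalize(match.group(1)), match.group(2))
    return definitions


doc = """[link]: url_a

Go check out [link].

[link]: url_b

go check out [link].
"""

print(collect_definitions(doc))  # {'link': 'url_a'}
```

Because the lookup table is built from the whole document before any link is resolved, both uses of [link] end up pointing at url_a.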

If this behavior is intended, adding this example as a test case would at least help ensure people don’t get creative with their parsers.

It may be a big change, but could I argue in favor of supporting duplicate link labels? That is,

Go check out [link].

[link]: url_a

go check out [link].

[link]: url_b

would produce two unique links.

The rationale is that Markdown documents should be composable. That is, if I concatenate two Markdown documents and compile them, the output should be roughly the same as concatenating the outputs of compiling each one separately. (Mathematically, markdown(a) + "\n\n" + markdown(b) ~= markdown(a + "\n\n" + b).) This matters because copying and pasting documents together should not require global changes to the result. Under the current spec, links in the second document break whenever it reuses a label that the first document already defines.

An additional rationale is that allowing duplicate labels lets users have short, “anonymous” labels for links, such as “1”, “2”, etc. This saves users from having to either (1) always use inline links or (2) always invent globally unique labels for their links.

The [foo][1] is rather [bar][2].

[1]: http://example.com/1
[2]: http://example.com/2

...

The [baz][1] is still [zapped][2].

[1]: http://example.com/3
[2]: http://example.com/4

This proposal will introduce some challenges, such as “in the face of duplicates, which definition should be attached to which link?” I don’t think there is a fully composable solution to this. I humbly suggest the following algorithm, however:

  1. A link uses the first matching label defined after the point where the link appears.
  2. If no matching label is defined after a link, the link uses the closest matching label defined above it.

For example, the following markdown:

[a]

[a]: http://www.x.com/

[a]

[a]: http://www.y.com/

[a]

using this spec should compile to:

<p><a href="http://www.x.com/">a</a></p>
<p><a href="http://www.y.com/">a</a></p>
<p><a href="http://www.y.com/">a</a></p>

The rationale for this algorithm is that in my survey of Markdown documents, people tend to define labels after the corresponding link. Rule (2) provides backwards compatibility for documents that define labels before the links that use them and don’t currently use duplicates.
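To make the two rules concrete, here is a minimal Python sketch of the lookup. It is only an illustration under simplifying assumptions: it handles one-line definitions and shortcut references such as [a], and `resolve` is a hypothetical name, not part of any existing parser. Each definition is stored together with its line number so the lookup can prefer the next one below the link.

```python
import re

# Simplified patterns: one-line definitions and shortcut references only.
DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$")
REFERENCE = re.compile(r"\[([^\]]+)\](?!:)")


def resolve(text):
    """Return (line_number, label, url) for each shortcut reference.

    Rule 1: use the first matching definition after the link.
    Rule 2: otherwise use the closest matching definition above it.
    """
    lines = text.splitlines()

    # label -> list of (line_index, url), in document order
    definitions = {}
    for index, line in enumerate(lines):
        match = DEFINITION.match(line)
        if match:
            label = match.group(1).casefold()
            definitions.setdefault(label, []).append((index, match.group(2)))

    results = []
    for index, line in enumerate(lines):
        if DEFINITION.match(line):
            continue  # a definition line is not itself a link
        for match in REFERENCE.finditer(line):
            label = match.group(1)
            candidates = definitions.get(label.casefold(), [])
            after = [url for i, url in candidates if i > index]
            before = [url for i, url in candidates if i < index]
            url = after[0] if after else (before[-1] if before else None)
            results.append((index + 1, label, url))
    return results


doc = """[a]

[a]: http://www.x.com/

[a]

[a]: http://www.y.com/

[a]
"""

for line_number, label, url in resolve(doc):
    print(line_number, label, url)
```

Following the rules literally, the first [a] resolves to x.com, the second [a] resolves to y.com because that definition follows it, and the last [a] falls back to the closest definition above, which is also y.com.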

Thoughts on this proposal would be appreciated.

One problem with this proposal is that the document

Go check out [link].

[link]: url_a

go check out [link].

[link]: url_b

could be composed either of a document consisting of lines 1-4 and a document consisting of lines 4-7, or of a document consisting of lines 1-6 and a document consisting of line 7. Having the first link link to url_a and the second to url_b would preserve the original meanings of the component documents in the first scenario, but not in the second.

Because there’s no unique decomposition of a document into parts, I don’t think composability should be a goal. (And this is not the only thing that poses problems for composability in CommonMark: imagine concatenating a document that is an ordered list with a 4-space indented subparagraph and a document consisting of an indented code block, for example; the code block would become a subparagraph of the list item.)

Still, the algorithm you propose is a potential alternative to the one we use. It would require storing references together with the line where they are defined, and using a slightly more complex lookup rule, but it wouldn’t be that hard to modify existing parsers to use this rule. So it would be interesting to see what people think.

Note that pandoc has a --file-scope option which can be used to operate on a set of input files, interpreting each independently instead of concatenating them. This is one approach one could use if one doesn’t want a global namespace for references. Pandoc also issues warnings for duplicate references, which is a good idea I think.

I’d thought about this, and figured the “document consisting of line 7” case was unlikely (as davidg noted, people rarely put link definitions before a link’s usage). Thinking about it a little more, though, here is a more plausible failure case, involving link reuse, that shows when this can cause a problem:

We'll be having a neighborhood pool party at [Fun Times Water Park][1]
this weekend.

[1]: http://funtime.example

We think everybody will have a good time.

For directions on how to get to the park, see [their website][1].

---

Due to complaints about the noise, the township has been forced to close
the [Nervous Cow Slaughterhaus][1]. Residents will need to go elsewhere
to get their loose sacks of bloody, raw beef.

[1]: http://slaughterhaus.example

That said, I would still argue that this is better than the alternative - the first document still contains one correct link, and the second document isn’t overridden by the first.

Also, in situations where this kind of reused-link-name collision is a possibility, the first document could avoid the issue by explicitly repeating, at the end of the document, the definition of any label it uses more than once, creating a lightweight firewall for its definition scope.
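Here is a small Python sketch of that firewall idea, using shortened versions of the two notices above. It assumes the "next following definition wins, else nearest preceding" rule under discussion and only understands one-line definitions and full references of the form [text][label]; `resolve_full_references` is a hypothetical helper written for this example.

```python
import re

DEFINITION = re.compile(r"^\[([^\]]+)\]:\s*(\S+)\s*$")
FULL_REFERENCE = re.compile(r"\[([^\]]+)\]\[([^\]]+)\]")


def resolve_full_references(text):
    """Map link text -> URL: next following definition, else nearest preceding."""
    lines = text.splitlines()
    definitions = [
        (index, m.group(1), m.group(2))
        for index, line in enumerate(lines)
        if (m := DEFINITION.match(line))
    ]
    resolved = {}
    for index, line in enumerate(lines):
        for m in FULL_REFERENCE.finditer(line):
            link_text, label = m.groups()
            after = [u for i, l, u in definitions if l == label and i > index]
            before = [u for i, l, u in definitions if l == label and i < index]
            resolved[link_text] = after[0] if after else (before[-1] if before else None)
    return resolved


first = """We'll be having a pool party at [Fun Times Water Park][1].

[1]: http://funtime.example

For directions, see [their website][1].
"""

second = """The township has closed the [Nervous Cow Slaughterhaus][1].

[1]: http://slaughterhaus.example
"""

# The firewall: the first document repeats its definition at its own end.
firewall = "\n[1]: http://funtime.example\n"
separator = "\n---\n\n"

without_firewall = resolve_full_references(first + separator + second)
with_firewall = resolve_full_references(first + firewall + separator + second)

print(without_firewall["their website"])  # http://slaughterhaus.example
print(with_firewall["their website"])     # http://funtime.example
```

In both runs the first link in each notice resolves correctly; only the reused [their website][1] link depends on whether the first document restates its definition before the second document begins.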

This is a case where the new parser rule would also match how people actually read: I don’t usually look upward to find a link’s definition (in the example I gave, my eyes scan the second link as pointing to the second definition, too), and explicitly saying “next following definition wins” would help reinforce “always have your definition somewhere after its usage” as a best practice.

It seems to me that this would be avoided so long as the application combining documents inserts a non-indented separating element (like a heading or ---) between each concatenated context.