Why is "link text" not allowed to "contain other links"?

Spec 0.27 quite clearly states:

Links may not contain other links, at any level of nesting. If multiple otherwise valid link definitions appear nested inside each other, the inner-most definition is used.

However, “link text” is apparently allowed to contain “autolinks”, see babelmark2 (Pandoc is one of the very few implementations that don’t allow that).

This seems inconsistent to me, and I think it would help if the reason were mentioned in the spec. Or is it and I just didn’t find it?

If there is no reason, this rule should be removed from the spec.

For the record: Markdown.pl, Pandoc and quite a few others seem to favor the outer link and don’t parse the inner one, while quite a few other implementations parse both; see babelmark2.

This is something that really needs a decision on principle.
See https://github.com/jgm/cmark/issues/193.

Perhaps allowing links inside link text across the board
would be the best solution.

It would be unfortunate if we had to delay 1.0 over trivia like this, where simply coming down on one side or the other is all that’s needed since there is no “correct” answer.

I’m surprised that it even allows autolinks, since nesting <a> elements is against the HTML spec and browsers don’t even render it correctly. So why support it at all?

edit: don’t know… I’m probably being too pedantic. I guess it’s a good UX, since users don’t know or don’t care about HTML spec.

Since there seems to be no justification for the current inconsistent behavior, we might as well change it before 1.0, shall we?

In total, I see 4 options:

  1. Keep it as is. Totally inconsistent, against HTML spec.
  2. Just allow arbitrary inline nodes, including links and whatever else, no exceptions. Very simple rule to remember, simple to implement. Apparently against the HTML spec, but is this really a problem?
  3. Don’t allow “link text” to contain any links at all. If there is an inner link, the outer markup is ignored. Valid HTML.
  4. Don’t allow any links to be inside “link text”. The outer markup gets converted to a link, the inner markup is ignored. Valid HTML.

Is there another option?

For me, number 2 is the clear favorite here. Easiest rule, no exceptions, shorter spec, simpler implementation.
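To make options 2 to 4 concrete, here is a minimal sketch in Python. It is not taken from any real implementation: the `Link` class, the policy names (`nested`, `inner_wins`, `outer_wins`) and the `render` function are all made up for illustration, and the tree is built by hand instead of being parsed.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class Link:
    """Hypothetical AST node: a link whose text may hold strings and links."""
    children: List["Node"]
    url: str


Node = Union[str, Link]


def contains_link(nodes: List[Node]) -> bool:
    # Strings and links are the only node types in this toy model, so a link
    # at any depth implies a Link among the direct children.
    return any(isinstance(n, Link) for n in nodes)


def source(nodes: List[Node]) -> str:
    """Write nodes back out as literal (HTML-escaped) Markdown text."""
    out = []
    for n in nodes:
        if isinstance(n, str):
            out.append(escape(n))
        else:
            out.append("[" + source(n.children) + "](" + escape(n.url) + ")")
    return "".join(out)


def render(nodes: List[Node], policy: str, inside_link: bool = False) -> str:
    """Render to HTML under one of three assumed policies.

    "nested"     -> option 2: any inline content, including links
    "inner_wins" -> option 3: outer markup stays literal text
    "outer_wins" -> option 4: inner markup stays literal text
    """
    out = []
    for n in nodes:
        if isinstance(n, str):
            out.append(escape(n))
        elif policy == "outer_wins" and inside_link:
            out.append(source([n]))
        elif policy == "inner_wins" and contains_link(n.children):
            inner = render(n.children, policy, inside_link)
            out.append("[" + inner + "](" + escape(n.url) + ")")
        else:
            inner = render(n.children, policy, True)
            out.append('<a href="%s">%s</a>' % (escape(n.url), inner))
    return "".join(out)


# Hand-built tree for: [the [aquila](url) rift](url2)
tree = [Link(["the ", Link(["aquila"], "url"), " rift"], "url2")]

print(render(tree, "nested"))      # <a href="url2">the <a href="url">aquila</a> rift</a>
print(render(tree, "inner_wins"))  # [the <a href="url">aquila</a> rift](url2)
print(render(tree, "outer_wins"))  # <a href="url2">the [aquila](url) rift</a>
```

Only the `nested` policy produces HTML with an anchor inside an anchor; the other two stay valid at the cost of a special case in the rule.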

+++ Matthias Geier [May 06 17 10:28 ]:

Since there seems to be no justification for the current inconsistent
behavior, we might as well change it before 1.0, shall we?

In total, I see 4 options:

  1. Keep it as is. Totally inconsistent, against HTML spec.
  2. Just allow arbitrary inline nodes, including links and whatever
    else, no exceptions. Very simple rule to remember, simple to
    implement. Apparently against the HTML spec, but is this really a
    problem?
  3. Don’t allow “link text” to contain any links at all. If there is an
    inner link, the outer markup is ignored. Valid HTML.
  4. Don’t allow any links to be inside “link text”. The outer markup
    gets converted to a link, the inner markup is ignored. Valid HTML.

Is there another option?

For me, number 2 is the clear favorite here. Easiest rule, no
exceptions, shorter spec, simpler implementation.

I think I also favor #2, though I need to think through the
changes that would be needed in the current parsing strategy
in the reference implementations.

Note that we could always parse the interior links as links
in the AST, but then render them as regular text in HTML.
(Of course, that would make the examples in the spec a bit
confusing.)
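A rough sketch of that idea in Python, assuming a made-up two-node AST (`Text` and `Link` are illustrative names, not cmark's actual node types): the tree keeps the interior link, and only the HTML renderer drops its anchor.

```python
from dataclasses import dataclass
from html import escape
from typing import List, Union


@dataclass
class Text:
    literal: str


@dataclass
class Link:
    url: str
    children: List[Union["Text", "Link"]]


def render_html(node, in_link: bool = False) -> str:
    """Render a node; a link found inside another link becomes plain text."""
    if isinstance(node, Text):
        return escape(node.literal)
    inner = "".join(render_html(child, in_link=True) for child in node.children)
    if in_link:
        # Interior link: still a Link in the AST, but no <a> is emitted.
        return inner
    return '<a href="%s">%s</a>' % (escape(node.url), inner)


# Hand-built AST for: [the [aquila](url) rift](url2)
ast = Link("url2", [Text("the "), Link("url", [Text("aquila")]), Text(" rift")])

print(render_html(ast))  # <a href="url2">the aquila rift</a>
```

A renderer for another output format could walk the same tree and keep the inner link, which is the point of deferring the decision to the renderer.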

I don’t think “screw validity” is a good answer, here.

When you write nested HTML links, the outer link gets implicitly closed:

<a href="https://github.com/commonmark/">CommonMark-flavored <a href="https://markdown.org">markdown</a> parser</a>.

I wouldn’t call that good behavior, at all.

Link text inside link text should be allowed. It isn’t valid HTML, though. CommonMark shouldn’t be influenced by HTML standards. If the intended target is HTML, then the person writing the markup will have to keep that in mind.

Hello everyone. I got kinda interested in this project.

Not only are nested anchors not allowed by the HTML spec, browsers also won’t render them correctly. (They push the inner link out of the outer link.)

But HTML is made to work even when it’s badly formed.

Thinking about it some more, I’d not go against the HTML spec. I personally have never needed to use nested links in a title, so I can’t say what a common use case would be, but I’d say the implicit closing of anchors is far more UX-unfriendly than disallowing nested links to begin with.

E.g., writing

[the [aquila](url) rift](url2)

will IMO leave the user more confused by the output, due to the prematurely closed link, than if we just disallowed the nested link altogether, which would make it obvious at first sight what’s happening.

It’ll also teach users early on how to properly write links.

This argument obviously rests on the fact that markdown is mainly used for HTML output, where nested links don’t make sense. Is there even an output format where it makes sense to have nested links?

If this was a new language, option (4) would make a lot of sense. The inner link could be rendered as it is written, with literal brackets being included in the output and made safe by the parser.

Most existing parsers behave closer to (2) though. CommonMark’s goal is to be highly compatible, rather than to support the best possible syntax. Other changes have been made where the chances of breaking existing documents have been low, however.

normal link text (special link text)continuing on with the normal link text.

<a href="#normal">normal link text (</a><a href="#special">special link text</a><a href="#normal">)continuing on with the normal link text.</a>

[normal link text(](#normal)[special link text](#special)[)continuing on with the normal link text.](#normal)

I’ve seen it used on a few sites, but just breaking up the link should be easy enough. (And it’s not ambiguous, it follows the HTML spec, and it’s easier to implement.)
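Breaking up the link could also be done mechanically. Here is a small Python sketch, under the assumption that a link is given as a `(url, children)` tuple whose children are strings or further tuples; this representation and the names `flatten` and `render_split` are invented for the example.

```python
from html import escape
from itertools import groupby


def flatten(children, url):
    """Yield (url, text) runs; text inside an inner link gets the inner URL."""
    for child in children:
        if isinstance(child, str):
            yield url, child
        else:
            inner_url, inner_children = child
            yield from flatten(inner_children, inner_url)


def render_split(url, children):
    """Emit sibling anchors instead of nested ones."""
    parts = []
    # Merge adjacent runs that point at the same URL into one anchor.
    for run_url, run in groupby(flatten(children, url), key=lambda r: r[0]):
        text = "".join(t for _, t in run)
        parts.append('<a href="%s">%s</a>' % (escape(run_url), escape(text)))
    return "".join(parts)


html = render_split(
    "#normal",
    [
        "normal link text (",
        ("#special", ["special link text"]),
        ")continuing on with the normal link text.",
    ],
)
print(html)
```

For this input the output is the same three-anchor HTML as in the hand-written example above.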

I think a link should start with [ and end with ], pretty simple. All other methods would be ambiguous (which is probably why HTML disallowed it).

Inline links are also not as easy for a human to read.

I didn’t bring this up because I want to use links inside of links. I don’t!

I think this is a strange thing to do and I don’t expect anyone to actually do this.

My point is that the current rules are inconsistent and we should make them consistent. Once they are consistent, it would also be nice if they were really simple. And just allowing arbitrary inline nodes in link descriptions is the simplest I can think of.

Now the question is what should happen if people against all odds actually use links inside of links?

Should a CommonMark parser also be an HTML validator?

Even if the spec disallowed links inside of links, a user could still write plain old HTML that’s not conformant.

I see no point in enforcing conformant HTML, but if it is desired, it should be done in the HTML renderer.

We wouldn’t really actively “support” it, it would just happen to be possible.

If the reason for disallowing it is the HTML spec, then we would, for consistency, also have to check all HTML blocks and HTML tags for HTML conformance.

And I doubt very much that this is the goal of CommonMark.
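If someone did want such a check, it could live entirely outside the parser. A minimal sketch using Python's standard `html.parser` module follows; the class and function names are made up, and it only tracks open `<a>` tags, it does not reproduce the recovery behavior of browsers.

```python
from html.parser import HTMLParser


class NestedAnchorFinder(HTMLParser):
    """Flags an <a> start tag seen while another <a> is still open."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.nested = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            if self.depth > 0:
                self.nested = True
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.depth > 0:
            self.depth -= 1


def has_nested_anchor(html: str) -> bool:
    finder = NestedAnchorFinder()
    finder.feed(html)
    finder.close()
    return finder.nested


nested = ('<a href="https://github.com/commonmark/">CommonMark-flavored '
          '<a href="https://markdown.org">markdown</a> parser</a>')
flat = '<a href="#normal">one</a><a href="#special">two</a>'

print(has_nested_anchor(nested))  # True
print(has_nested_anchor(flat))    # False
```

The same check would apply to raw HTML a user typed directly, which is why it fits a renderer or post-processing step better than the Markdown grammar.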

Yes, the renderer seems to be the right place to handle this.
Since AFAIU the spec doesn’t specify the renderer, we should keep such examples out of the automated tests.

Well, that’s HTML’s behavior.

I wouldn’t call that good Markdown input and blame the one who wrote it.

I think in this situation there is no “good” behavior. And in absence of “good” behavior we should strive for “consistent” and “easily understandable” behavior.

I think the CommonMark spec shouldn’t be viewed as a teaching tool.

A given implementation may want to go that way, but complicating the spec because of that is IMHO not a good idea.

No idea, but let’s just not forbid it anyway.

We are not actually adding any complexity to the spec with that, we’re removing complexity!

I think rules should be simple, but for the end user. And I think the rule should create the least amount of surprise from all the alternatives.

Besides, consistency goes out of the window anyway, because we already disallow links (or any inline rules) in image titles (alt text), for example. Images are also a good example of tight coupling to HTML (since they support alt and title, both of which are HTML-specific things).

As noted above, I think markdown is tightly coupled to HTML to begin with, and by trying to reverse that we’d end up with something that might look good on paper, but any CommonMark implementation would be pretty much useless in a web app without extensive work on top of the spec.

I mean, if we try to decouple CommonMark from HTML, every implementation would have to come up with its own rules to enforce valid HTML output from the source markup. That would pretty much defeat the purpose of CommonMark, which is to be a standard for markdown→output.

I don’t think that’s relevant, because we’re discussing markdown spec. HTML, even if allowed to be nested within markdown documents, is a separate issue.

It would be easier to implement if nested links were allowed than if they were explicitly disallowed.

Nested links are allowed in embedded image syntax for a reason. Though I’m not actually sure what this reason is, I assume it is figures, i.e. when the link text is not (primarily) used for the alt accessibility attribute, but for a visible caption or legend. MediaWiki, which Wikipedia is running on, also allows links within image references for this reason: [[Image:file_name.png|thumb|Caption with link to [[Article]].]]

Markdown parsers should turn an image reference into a figure in (HTML) output if it is the only content of a paragraph, but this is discussed elsewhere, as are other media resources like audio and video.

You might think that normal, non-embedded, non-image links are not used as (floating) figures. That is wrong. Many blogging, CMS, forum and other social media software (including Discourse) turns URLs that stand alone in a paragraph or are found at the end of a message into a “card” that usually contains title, image and content previews that are automatically fetched. In a Markdown-based environment, these cards may also use the link text provided by the author, which is not available for plain URLs, as a caption which could absolutely contain markup including links.

This alone may be reason enough to allow nested links, although they should be flattened in many common scenarios.

Babelmark 3

[text](target), [text][label], [text][], [text]
What kind of hyperlink markup should be possible inside Link Text?

  • Only Auto Links <URL>
  • Any kind of link
  • No links, inner-most link wins
  • No links, outer-most link wins

Collapsed and Shorthand Links cannot support normal links in Link Text currently, because Link Label (which is the same then) cannot contain verbatim square brackets.

Yet again, I would like to come down on the side of whatever is easiest here, up to and including “do nothing”, in the interests of getting us to 1.0.

I believe that reusing reference link definitions in shorthand notation, especially for links to full-size versions of embedded images like [![text=label]], is a good argument in favor of allowing square brackets, and thus nested links, within link texts (and thereby link labels):