Spec/Algorithm Error in Links within Links within Images?

jackdw · August 1, 2020, 5:40pm

I am close to having a parser that is ~99% CommonMark and GFM compliant, but I am hitting one case that is giving me trouble.

Example 516:

![[[foo](uri1)](uri2)](uri3)

Based on the algorithm at the bottom of the specification:

If we have a link (and not an image), we also set all `[` delimiters before the opening delimiter to *inactive*. (This will prevent us from getting links within links.)

Based on my implementation of that algorithm, [foo](uri1) is parsed normally. As part of that parsing, following the guidance above, the second ‘[’ in the example is marked as inactive. Therefore, when the next ‘]’ is hit, my parser skips over the just-marked ‘[’ and pairs the ‘]’ with the ‘![’ instead. After that processing, the HTML produced is:

<p><img src="uri2" alt="foo" />](uri3)</p>

Is there something wrong with the algorithm?

Note: it looks like this was already reported in “The behaviour of alt text in image links is confusing”, but there are no comments on it.

jgm · August 1, 2020, 9:39pm

I see your confusion, and it probably means that the quick write-up of the algorithm isn’t complete. But, looking at the code of my parsers (e.g. handle_close_bracket in cmark’s src/inlines.c), here’s what is intended to happen:

Once [foo](uri1) is recognized as a link, we mark the [ at position 3 as “inactive.”

The next close bracket we come to is at position 15. When we encounter this, we look for a preceding open bracket in the stack and find the one at 3. We then test to see if this is active. Since it is not, we pop it off the stack and parse it as a textual [. But then, we do not keep scanning for earlier open brackets (as your algorithm does). Instead, we move on to the next close bracket at position 22.

The key point here is that marking an opening [ as inactive is not the same as removing it from the stack entirely. An inactive [ can still match a ], and only once it matches is it removed from the stack.

[EDIT:] Here’s some debugging output from cmark that illustrates this:

Found closer at 8
Found matching opener at 4
Deactivating opener at 3
Found closer at 15
Popping inactive opener at 3
Found closer at 22
Found matching opener at 2
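
To make the difference concrete, here is a minimal Python sketch of the bracket-stack behaviour described above. It is not cmark's code. The function name `trace_brackets` and the `keep_scanning` flag are hypothetical, positions are 1-based to match the trace, and the sketch assumes only inline `[text](dest)` links and `![alt](dest)` images whose destination ends at the first `)`. It ignores escapes, code spans, reference links and everything else a real parser handles.

```python
def trace_brackets(text, keep_scanning=False):
    """Walk `text` and log how each ']' is resolved against the bracket stack.

    keep_scanning=False: an inactive opener is popped and the search stops
                         (the intended behaviour described in this thread).
    keep_scanning=True:  an inactive opener is skipped and earlier openers
                         are tried (the misreading from the original post).
    """
    stack = []   # openers: {"pos": 1-based index of '[', "image", "active"}
    events = []
    i = 0
    while i < len(text):
        if text.startswith("![", i):
            stack.append({"pos": i + 2, "image": True, "active": True})
            i += 2
            continue
        if text[i] == "[":
            stack.append({"pos": i + 1, "image": False, "active": True})
        elif text[i] == "]":
            events.append(f"Found closer at {i + 1}")
            while stack:
                opener = stack.pop()
                if not opener["active"]:
                    events.append(f"Popping inactive opener at {opener['pos']}")
                    if keep_scanning:
                        continue  # misreading: try the next earlier opener
                    break         # intended: stop, this ']' is plain text
                # Assumption: a link/image needs "(...)" right after the ']'.
                end = text.find(")", i + 2) if text.startswith("(", i + 1) else -1
                if end != -1:
                    events.append(f"Found matching opener at {opener['pos']}")
                    if not opener["image"]:
                        # A link was formed: deactivate earlier '[' openers,
                        # but leave '![' image openers active.
                        for earlier in stack:
                            if earlier["active"] and not earlier["image"]:
                                earlier["active"] = False
                                events.append(
                                    f"Deactivating opener at {earlier['pos']}")
                    i = end  # skip past the destination
                break
        i += 1
    return events


example = "![[[foo](uri1)](uri2)](uri3)"
print("\n".join(trace_brackets(example)))
```

With the default `keep_scanning=False`, the sketch prints the same seven lines as the trace above. With `keep_scanning=True`, the closer at 15 goes on to match the image opener at 2, which corresponds to the `<img src="uri2" ...>` result from the original post.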

Hope that clears things up. If you want to suggest a rewording of the algorithm description, I’d definitely be open to a PR!