+++ Vitaly Puzrin [Jan 10 15 04:44 ]:
The “right” URL value depends not on the standard, but on how it was received and what the user wishes. We can’t guess with 100% certainty. This is like answering the question “should a line break become a hard break or not”: it depends on context.
Well, I think the probability is near 0 that if they include a valid entity, they didn’t mean it as an entity. Is there even a possible interpretation of
/foo&ouml;bar
as a query string where & serves to join two separate query parameter assignments?
It seems pretty clear that if someone writes
[foo](/url/foo&ouml;bar)
they mean to write an o-umlaut, and if they write
[foo](/url/foo=5&ouml=bar)
where & can’t be interpreted as part of an entity, they mean to write a query with two parts. And cmark gets this exactly right, rendering the first as
<p><a href="/url/foo%C3%B6bar">foo</a></p>
and the second as
<p><a href="/url/foo=5&amp;ouml=bar">foo</a></p>
It seems a safe principle that if you have a valid entity, the & in it is not the & that connects separate query parameter assignments.
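To make the principle concrete, here is a rough Python sketch. It is not cmark's actual code, and `render_href` and its helper are names made up for illustration. It assumes that only named entities with a terminating semicolon count as valid (numeric references are left out for brevity), decodes those, then percent-encodes non-ASCII characters and HTML-escapes what remains:

```python
import re
from html import escape
from html.entities import html5
from urllib.parse import quote

# Hypothetical helper names; only named references ending in ";" are matched.
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def _decode(match):
    # html5 maps names such as "ouml;" to their characters.
    # Unknown names are left untouched, so their & stays a literal &.
    return html5.get(match.group(1) + ";", match.group(0))


def render_href(destination):
    """Return the href attribute value for a link destination."""
    decoded = _ENTITY.sub(_decode, destination)
    # Percent-encode non-ASCII characters, leave URL punctuation alone.
    encoded = quote(decoded, safe="/:?#[]@!$&'()*+,;=%~-._")
    # Escape for HTML output; a literal & becomes &amp;.
    return escape(encoded, quote=True)


print(render_href("/url/foo&ouml;bar"))    # /url/foo%C3%B6bar
print(render_href("/url/foo=5&ouml=bar"))  # /url/foo=5&amp;ouml=bar
```

In this sketch a bare & that joins query parameters and an &amp; pasted from HTML source both end up as &amp; in the output.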
IMHO, a user can put a URL into a text editor in several ways:
1. type it manually: they type what they see, without any encoding or replacements.
2. copy-paste from the browser address bar.
3. copy via the context menu (“copy this link”).
4. select page content and copy-paste it as part of the text.
5. copy-paste from the page's HTML source.
(1-4) should not process entities; (5) should.
All of these should work well with the current cmark. Suppose the browser bar says:
http://example.com/foo=bar&baz=3
The user can copy this into a CommonMark link, and it will be rendered in HTML as
http://example.com/foo=bar&amp;baz=3
just as needed. On the other hand, if they copy from HTML source, they’ll copy
http://example.com/foo=bar&amp;baz=3
and CommonMark will render this just as it is,
http://example.com/foo=bar&amp;baz=3
Again, that’s what is wanted.
The simplest solution is to remove the relevant tests. Or, as I suggested somewhere earlier, make such tests non-mandatory (status = recommendation instead of requirement).
If we want an unambiguous spec, we still need to specify how things are parsed. (We can leave some latitude in rendering.) That means saying something to determine when a & in the destination of a CommonMark link should be interpreted as part of an entity and when it should be interpreted as a literal &. We could leave out tests that reveal these decisions, but that would just mean we didn’t have good test coverage.