Allowing whitespaces in link URLs


#1

I imagine this has been discussed a bunch before, but my searching wasn’t able to find relevant threads.

I’d like to discuss whether whitespace (literally just the space character on standard keyboards) can be allowed in link URLs.

My use case is the Markdown in the README of my GitHub repo, which links to content in directories whose names contain spaces. GitHub had always allowed this, but when they switched to cmark, my links broke. That is, they no longer parse as links at all and just show up as plain text on the page, which is awful.

My problem with manually encoding spaces as %20 is that on more than a few occasions I’ve fat-fingered it as %29 and didn’t catch the mistake until much later: the links were broken, but not obviously so unless you actually tried to click them. Hitting space is much more natural, which seems to me to fit the general mindset of what Markdown tries to make easy for authors.
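For anyone who wants to avoid typing %20 by hand in the meantime, the encoding can be generated rather than typed. This is only a sketch using Python's standard `urllib.parse`; the helper name `encode_spaces` is made up for illustration, and it assumes only the path part of the URL needs escaping.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_spaces(url: str) -> str:
    """Percent-encode the path of a URL so spaces become %20.

    Hypothetical helper: only the path component is touched, and '%' is
    treated as safe so escapes that are already present are not re-encoded.
    """
    parts = urlsplit(url)
    encoded_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=encoded_path))


print(encode_spaces("http://some.tld/link with spaces/"))
# http://some.tld/link%20with%20spaces/
```

This removes the typo risk but not the readability cost: the Markdown source still contains the escaped form.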

Typing a <a href="http://some.tld/link with spaces/">foo</a> into my markup seems to still work just fine, so it’s really unfortunate that [foo](http://some.tld/link with spaces/) is now broken. :frowning:

Browsers definitely still work with URLs where the space isn’t encoded, and some even still display them without the encoding, like Firefox.

I’m aware it’s a long shot at this point, but I’m curious whether it’s possible to reconsider and relax this restriction. If not, I’d appreciate more info on why not. In particular, I’d like to understand how GitHub was able to allow this for many years without it apparently breaking anything.


#2

Just chiming in with the wider GitHub perspective here — this is one of the single biggest discrepancies our users have reported, above and beyond subtle formatting changes in general, and as in @getify’s case, breaking links causes significant pain for our users (and the perusers of their content).

From our point of view, allowing spaces in regular []() links would be a good thing.

This clearly complicates the spec inasmuch as it creates ambiguity with link titles, but I can count on one hand the number of times I’ve ever seen a user attempt to use a link title in CommonMark, whereas regular URLs with spaces are frequent. Further, existing link title parsing could be retained while still allowing spaces (i.e. [a](/a b c) has a link destination only, while [a](/a b c (d)) and [a](/a b c "d") have a link destination and a link title).
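The rule suggested above (treat a trailing quoted or parenthesized chunk as the title, and everything before it as the destination) could be sketched roughly as follows. This is not how cmark works; the function name `split_inline_link` and the regular expression are assumptions made for illustration, and escapes and nested parentheses are ignored.

```python
import re
from typing import Optional, Tuple

# A title is a trailing "...", '...' or (...) separated from the
# destination by whitespace. Everything before it is the destination.
_WITH_TITLE = re.compile(
    r"""^(?P<dest>.*?)\s+
        (?:"(?P<dq>[^"]*)"|'(?P<sq>[^']*)'|\((?P<paren>[^()]*)\))$""",
    re.S | re.X,
)


def split_inline_link(inner: str) -> Tuple[str, Optional[str]]:
    """Split the text between ( and ) of an inline link.

    Hypothetical sketch of the proposal: spaces are allowed in the
    destination, and a title is recognized only at the very end.
    """
    inner = inner.strip()
    m = _WITH_TITLE.match(inner)
    if m is None:
        return inner, None
    title = next(g for g in m.group("dq", "sq", "paren") if g is not None)
    return m.group("dest"), title


print(split_inline_link("/a b c"))      # ('/a b c', None)
print(split_inline_link("/a b c (d)"))  # ('/a b c', 'd')
print(split_inline_link('/a b c "d"'))  # ('/a b c', 'd')
```

The cost of this approach is visible in the sketch: a destination that itself ends in a parenthesized segment after a space would be misread as having a title.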


#3

The first space somewhere after ( should definitely terminate the URL, because (proposed) extensions go beyond link titles, especially with images, e.g. ![alt](src width).

However, URLs within angle brackets could be exempt from this, so that people who want literal space characters in URLs could write [text](<href with spaces>).
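A rough sketch of that rule, for comparison: a bare destination ends at the first whitespace, while a destination in angle brackets may contain spaces. The function name `read_destination` is hypothetical, and the sketch ignores backslash escapes and line breaks, which a real parser would have to handle.

```python
from typing import Optional, Tuple


def read_destination(inner: str) -> Optional[Tuple[str, str]]:
    """Return (destination, remainder) for the text between ( and ).

    Hypothetical sketch: <...> destinations may contain spaces; bare
    destinations stop at the first whitespace. The remainder is whatever
    follows (a title, or extension data such as an image width).
    """
    inner = inner.strip()
    if inner.startswith("<"):
        end = inner.find(">")
        if end == -1:
            return None  # unterminated angle bracket: not a valid destination
        return inner[1:end], inner[end + 1:].strip()
    parts = inner.split(None, 1)  # split on the first run of whitespace
    dest = parts[0] if parts else ""
    rest = parts[1] if len(parts) > 1 else ""
    return dest, rest


print(read_destination("<href with spaces>"))  # ('href with spaces', '')
print(read_destination("src width"))           # ('src', 'width')
```

With this rule the remainder stays unambiguous, so titles and extensions keep working without any lookahead.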


#4

Why not just use quotation marks? No one complains that you need to use “quotes” around a link’s href.


#5

Sorry I dropped this thread. I’ve already made the change allowing
whitespace inside <..> links. I’m open to exploring allowing
whitespace in regular links. (And also in link reference definitions,
I assume?) It might be tricky to get it right without introducing
a lot of new complexity.



#6

I believe it is sufficient to require enclosing angle brackets in order to accept literal spaces inside a link destination, and of course to allow that form anywhere a link destination can occur.