+++ Kārlis Gaņģis [Jan 18 15 15:01 ]:
I think that the origin of the current [a-z0-9] rule is that _ is often used in member names from code. Almost always they will be ASCII, not Unicode. But since the rule is there, most people would assume that ā_ā_ā would work the same, so it could perhaps be changed to any non-whitespace…
Yes, the intent was to avoid capturing underscores in code identifiers as emphasis, since that’s the case I was aware of that was prompting people to complain.
I hadn’t considered this URL case.
Given the new rules for emphasis, I wonder if it would make sense to simplify the rules for _ emphasis thus. Instead of:
A _ character can open emphasis iff it is part of a left-flanking delimiter run and is not preceded by an ASCII alphanumeric character.
we could have:
A _ character can open emphasis iff it is part of a left-flanking delimiter run and not part of a right-flanking delimiter run.
and so on.
We’re already doing the checks for left- and right-flankingness, so this would not add any complexity to the code; it would even simplify it.
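To make the comparison concrete, here is a rough Python sketch of the two opening rules side by side. It is not the reference implementation: the function names (flanking, can_open_old, can_open_new) are made up for illustration, and the punctuation test is a simplification (ASCII punctuation plus Unicode "P" categories), so treat it as an approximation of the flanking definitions rather than the spec text.

```python
import string
import unicodedata


def is_whitespace(ch):
    # The start and end of the line (represented here as "") count as whitespace.
    return ch == "" or ch.isspace()


def is_punctuation(ch):
    # Simplified assumption: ASCII punctuation or any Unicode "P*" category.
    return ch != "" and (
        ch in string.punctuation or unicodedata.category(ch).startswith("P")
    )


def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the delimiter run text[start:end]."""
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    left = (not is_whitespace(after)) and (
        not is_punctuation(after) or is_whitespace(before) or is_punctuation(before)
    )
    right = (not is_whitespace(before)) and (
        not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
    )
    return left, right


def can_open_old(text, start, end):
    # Old rule: left-flanking and not preceded by an ASCII alphanumeric.
    left, _ = flanking(text, start, end)
    before = text[start - 1] if start > 0 else ""
    preceded_by_ascii_alnum = before.isascii() and before.isalnum()
    return left and not preceded_by_ascii_alnum


def can_open_new(text, start, end):
    # Proposed rule: left-flanking and not also right-flanking.
    left, right = flanking(text, start, end)
    return left and not right


if __name__ == "__main__":
    for sample, pos in [("foo_bar_", 3), ("ā_ā_ā", 1), ("_foo_", 0)]:
        print(sample, can_open_old(sample, pos, pos + 1), can_open_new(sample, pos, pos + 1))
```

Under these assumptions the two rules agree on foo_bar_ (the intraword _ cannot open) and on _foo_ (the leading _ can open), but they differ on ā_ā_ā: the old rule lets the first _ open emphasis because ā is not ASCII, while the proposed rule treats it like any other intraword underscore.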
[EDIT: Just to clarify, this suggestion would not prevent emphasis in the original URL example. It just seemed independently preferable to singling out ASCII alphanumerics.]
[UPDATE: I have now made this change.]