Hi all – In the spec, why is “Unicode whitespace character” a superset of “Whitespace character”? It’s counterintuitive that a more specific term is a superset of a broader term.
In the spec’s section 2.1, “Whitespace character” is defined as a space, tab, newline, line tab, form feed, or carriage return.
In the same section, “Unicode whitespace character” is defined as “any code point in the Unicode Zs general category, or a tab (U+0009), carriage return (U+000D), newline (U+000A), or form feed (U+000C)”.
Do we need two different categories of whitespace character? Would it be better to just give “Whitespace character” the definition currently used for “Unicode whitespace character”?
Why would “Whitespace character” be a subset of “Unicode whitespace character”?
There’s an error in the definition of Unicode whitespace character: line tab (U+000B) is missing. It’s not in the list and it’s not in the Unicode Zs category. (You might think space is also missing, because it’s not listed, but it’s in the Zs category.)
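
In case it helps anyone check this, here is a rough Python sketch that compares the two definitions as quoted above. The names `WHITESPACE`, `UNICODE_EXTRAS`, and `is_unicode_whitespace` are my own, not from the spec or any implementation, and the result depends on the Unicode data bundled with your Python version (I don't expect these particular code points to differ).

```python
import unicodedata

# "Whitespace character" as listed in section 2.1:
# space, tab, newline, line tab, form feed, carriage return.
WHITESPACE = {"\u0020", "\u0009", "\u000A", "\u000B", "\u000C", "\u000D"}

# Code points named explicitly in the "Unicode whitespace character"
# definition, in addition to the Zs general category.
UNICODE_EXTRAS = {"\u0009", "\u000D", "\u000A", "\u000C"}


def is_unicode_whitespace(ch: str) -> bool:
    """Hypothetical helper: the quoted definition, taken literally."""
    return unicodedata.category(ch) == "Zs" or ch in UNICODE_EXTRAS


# Whitespace characters that the Unicode definition fails to cover.
missing = sorted(ch for ch in WHITESPACE if not is_unicode_whitespace(ch))
print([f"U+{ord(ch):04X}" for ch in missing])  # ['U+000B']
```

So, read literally, “Whitespace character” is not a subset of “Unicode whitespace character”: line tab falls outside, while space is covered through Zs.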