Four clauses quoted from Section 2.1
-
A character is a unicode code point. This spec does not specify an encoding; it thinks of lines as composed of characters rather than bytes. A conforming parser may be limited to a certain encoding.
-
A line is a sequence of zero or more characters followed by a line ending or by the end of file.
-
A line ending is a newline (U+000A), carriage return (U+000D), or carriage return + newline.
-
A line containing no characters, or a line containing only spaces (U+0020) or tabs (U+0009), is called a blank line.
JD: Point 4 contradicts Point 2. If a line must include a line ending by definition, as Point 2 implies, a line cannot contain no characters. It contains at least one character – an end-of-line or end-of-file character.
It’s possible that “character” is being used differently in the opening clause of Point 4, to mean a visible text character, excluding control characters or whitespace. However, this would contradict Point 1, which defines a character as a Unicode code point. A control or whitespace character is a Unicode code point.
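To make the objection concrete, here is a small Python sketch (my own illustration, not anything from the spec) that lists the code point and Unicode general category of each line ending named in Point 3. The `describe` helper and the `LINE_ENDINGS` table are hypothetical names chosen for this example. The point is only that every line ending is made of one or two code points, so under Point 1 each of them counts as a character.

```python
import unicodedata

# The three line endings listed in Point 3, written as Python strings.
# (Illustrative table; the names are assumptions, not spec terminology.)
LINE_ENDINGS = {"LF": "\n", "CR": "\r", "CRLF": "\r\n"}

def describe(text):
    """Return (code point, general category) for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.category(ch)) for ch in text]

for name, ending in LINE_ENDINGS.items():
    # len() counts code points, which is how Point 1 defines "character".
    print(name, len(ending), describe(ending))
```

If the line ending is counted as part of the line, a line consisting of nothing but a newline already holds one code point, which is why "a line containing no characters" reads as contradictory.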
Suggested edit:
Clarify Point 1 by making it explicit that a character can be displayable text, whitespace, or a control character. For example: “A character is a Unicode code point, whether text, whitespace, or control characters – any Unicode code point is a character.”
Change Point 4 to “A line that contains no characters other than a line ending is a blank line. A line that contains no characters other than spaces (U+0020) and/or horizontal tabs (U+0009) before a line ending is also a blank line. Note that a blank line must have a line ending, else it is not a line.”
Notes on the suggested edits:
- Note that I capitalized Unicode, which is the proper form.
- Note that I specified a horizontal tab in Point 4, instead of just a tab. You do have the code point there, so it shouldn’t confuse anyone, but since there are other tabs, like vertical tab (U+000B), I like to spell it out.
- Note that I inserted “and/or” in Point 4 in place of “or”. I assume you mean and/or there – that is, I assume that a line containing both spaces and horizontal tabs qualifies as a blank line.
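
For what it's worth, the suggested wording for Point 4 can be sketched as a short Python check. This is only my reading of the suggested edit, not the spec's own algorithm: the function name `is_blank_line` and the regular expression are hypothetical, and the check deliberately requires a line ending, as the suggested wording does (the original clause also allows end of file).

```python
import re

# Hypothetical helper; the name and pattern are assumptions for illustration.
# Pattern: any run of spaces and/or horizontal tabs (possibly empty),
# followed by exactly one line ending: CRLF, LF, or CR.
_BLANK_LINE = re.compile(r"[ \t]*(?:\r\n|\n|\r)")

def is_blank_line(line: str) -> bool:
    """True if line holds only spaces/horizontal tabs before a line ending."""
    return _BLANK_LINE.fullmatch(line) is not None
```

Under this sketch, `is_blank_line(" \t \n")` is true because spaces and horizontal tabs may be mixed (the “and/or” reading), `is_blank_line("\x0b\n")` is false because a vertical tab (U+000B) is not a horizontal tab, and `is_blank_line("")` is false because there is no line ending.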