Gap in spec: what is "an alphanumeric"?

Throughout section 6.4 on emphasis markup, the specification talks about “an alphanumeric” without ever providing a definition for this term.

Defaulting to the definition given in ISO/IEC 2382-4:1999 “Information Technology - Vocabulary”:

alphanumeric character
A character of an alphanumeric character set.

does not help – an alphanumeric character set is simply a character set that contains both letters and digits and may contain special characters. So that is obviously not what “an alphanumeric” means here.

One can then go on and guess that “an alphanumeric” means: a character that is “alphabetic” or “numeric”.

Now “alphabetic character” has a meaning, and it is synonymous with “letter”:

alphabetic character
A graphic character that, when appearing alone or combined with others, represents one or more concepts of a written language, or one or more sound elements of a spoken language.

NOTE – Diacritical marks used alone and punctuation marks are not considered to be letters.

Similarly, “numeric character” is synonymous with “digit” (but not with “decimal digit”!):

numeric character
A character that represents a natural number.

Examples: One of the characters 0 through 9 in the decimal system; these digits plus the characters A through F used in the hexadecimal system.


NOTES –

  1. The mathematical term “natural number” denotes all non-negative integers.
  2. This is a modified version of the definition in ISO/IEC 2382-01.

One could guess further, and assume that “letter” is equivalent to: character in the General Category L* [Letter, *] of Unicode.

So maybe “digit” is equivalent to: character in the General Category N* [Number, *] of Unicode? (But not to ASCII digit, of course!)
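To make the difference between these readings of “digit” concrete, here is a small Python sketch that looks up the General Category of a few characters with the standard `unicodedata` module. The sample characters are my own choice for illustration, and the result reflects whatever Unicode version the Python build ships with.

```python
import unicodedata

# Sample characters picked for illustration only; none come from the spec.
SAMPLES = {
    "7": "ASCII digit",
    "\u0663": "ARABIC-INDIC DIGIT THREE",
    "\u2167": "ROMAN NUMERAL EIGHT",
    "\u00BC": "VULGAR FRACTION ONE QUARTER",
    "A": "hexadecimal digit in the ISO example",
}


def general_category(ch: str) -> str:
    """Return the two-letter Unicode General Category of one character."""
    return unicodedata.category(ch)


for ch, label in SAMPLES.items():
    print(f"U+{ord(ch):04X}  {general_category(ch)}  {label}")
```

The first two samples are in Nd, the Roman numeral is in Nl, the fraction is in No, and “A” is in Lu. So N* is much wider than the ASCII digits, while the hexadecimal digits A through F from the ISO example fall under L*, not N*.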

On a more practical level, I’d like to ask:

  1. Is U+00C6 Æ “an alphanumeric”? Probably (it has General Category Lu [Letter, Uppercase]).

  2. Is U+00BC ¼ “an alphanumeric”? Probably not (though it has General Category No [Number, Other]).

  3. Is “an alphanumeric” equivalent to “a code point having a General Category property of L* or N*”?

  4. If so, must every conforming implementation therefore know all the code points having a General Category property of L* or N*? (In addition to the existing requirement of knowing all the 700-something code points having General Category P*, for the so-called punctuation character class in the specification …)

  5. If not, what, after all, is “an alphanumeric”?
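As a rough sketch of what the reading in question 3 would mean for an implementation, the following Python snippet tests a character against “General Category L* or N*” and counts how many code points fall in each major class. The function names are hypothetical, the L*/N* rule is only the guess discussed above and not something the spec states, and the counts depend on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata


def is_alphanumeric_guess(ch: str) -> bool:
    """Assumed reading, not from the spec: General Category is L* or N*."""
    return unicodedata.category(ch)[0] in ("L", "N")


def count_by_major_class() -> dict:
    """Count code points per major General Category class (first letter)."""
    counts = {}
    for cp in range(sys.maxunicode + 1):
        major = unicodedata.category(chr(cp))[0]
        counts[major] = counts.get(major, 0) + 1
    return counts


counts = count_by_major_class()
print("Unicode version:", unicodedata.unidata_version)
print("L*:", counts["L"], " N*:", counts["N"], " P*:", counts["P"])
print("U+00C6:", is_alphanumeric_guess("\u00C6"))
print("U+00BC:", is_alphanumeric_guess("\u00BC"))
```

Under this reading both U+00C6 and U+00BC come out as alphanumeric, which contradicts the “probably not” in question 2. The L* count alone is far larger than the P* table, which is what makes question 4 a practical concern.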

It doesn’t really matter, because the few references to alphanumerics in the spec are incidental. They occur only in illustrative examples and they could be replaced by something else, e.g. “non-punctuation, non-space character” or something like that.