Invalid Unicode Code Points

In the text preceding example 312 of the CommonMark spec (example 322 in the GFM spec), there is this line:

Invalid Unicode code points will be replaced by the REPLACEMENT CHARACTER (U+FFFD).

I have spent a couple of days looking for a good definition of what invalid Unicode code points are, with little success. Is there an easy reference that I am missing? Does it depend on something installed on the system itself, or is there a global check?

Please advise.

I would read it as “anything invalid or ill-formed” with respect to the document encoding used or assumed by the implementation.

That would include, for example, any code points larger than U+10FFFF, or any ill-formed multi-byte UTF-8 sequence (when assuming UTF-8).

EDIT: For a more exhaustive description, see e.g. the Unicode Standard, version 12, especially section 3.9 on the Unicode encoding forms.
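As a rough illustration of the “ill-formed UTF-8” case, here is a minimal Python sketch, assuming the input document is UTF-8 bytes. It leans on Python's built-in UTF-8 codec with the standard `errors="replace"` handler, which substitutes U+FFFD for ill-formed sequences; `decode_utf8_replacing` is a hypothetical helper name, not part of any CommonMark implementation. How many U+FFFD characters a single bad sequence turns into is Python's decoder policy; other implementations may choose differently.

```python
def decode_utf8_replacing(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8, replacing ill-formed sequences with U+FFFD."""
    # Python's standard "replace" error handler inserts U+FFFD for each
    # ill-formed subsequence instead of raising UnicodeDecodeError.
    return data.decode("utf-8", errors="replace")


samples = [
    b"caf\xc3\xa9",       # well-formed UTF-8 for "café"
    b"\xc3",              # truncated two-byte sequence
    b"\xed\xa0\x80",      # UTF-8-style encoding of surrogate U+D800 (ill-formed)
    b"\xf4\x90\x80\x80",  # pattern that would mean U+110000, beyond U+10FFFF
]

for s in samples:
    print(repr(decode_utf8_replacing(s)))
```

Nothing here depends on what is installed on the system; the check is defined purely by the encoding rules.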

Code points greater than 0x10FFFF are invalid (per the Unicode Standard).
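For the numeric-character-reference case (e.g. `&#x110000;`), the check can be done on the integer value before converting it. This is a minimal Python sketch, not the reference implementation's code. `code_point_to_char` is a hypothetical name. Treating surrogates (0xD800 to 0xDFFF) as invalid is an assumption of this sketch: they are not Unicode scalar values and cannot appear in well-formed UTF-8, but the spec line quoted above does not enumerate them.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def code_point_to_char(cp: int) -> str:
    """Hypothetical helper: map a numeric code point to a character.

    Values outside 0..0x10FFFF are invalid and become U+FFFD.
    Surrogates are also mapped to U+FFFD (an assumption of this sketch).
    """
    if cp < 0 or cp > 0x10FFFF:
        # Beyond the Unicode codespace; chr() would raise ValueError here.
        return REPLACEMENT
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points: not valid scalar values on their own.
        return REPLACEMENT
    return chr(cp)


print(code_point_to_char(0x41))      # 'A'
print(code_point_to_char(0x110000))  # replacement character
```

The explicit range check matters in Python because `chr(0x110000)` raises `ValueError` rather than returning U+FFFD.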

Jack via CommonMark Discussion <noreply@talk.commonmark.org> writes: