UTF-8 Encoding recommendation

While I understand that the specification does not explicitly cover encoding, it would be nice if it at least made a strong recommendation for a specific encoding. Let’s say UTF-8. This would bring the issue to developers’ attention and encourage them to produce something usable by non-Western people. It would probably also address the RTL problem.

A standards-conforming implementation should be able to handle AT LEAST UTF-8, and should assume UTF-8 input unless specified otherwise.
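
To make that default concrete, here is a minimal Python sketch of what "assume UTF-8 unless told otherwise" could look like on the input side. The function name `decode_source` and its `encoding` parameter are made up for illustration; they do not come from the spec or from any existing implementation.

```python
from typing import Optional


def decode_source(data: bytes, encoding: Optional[str] = None) -> str:
    """Decode raw input bytes into text.

    Hypothetical helper: assumes UTF-8 unless the caller names
    another encoding explicitly.
    """
    codec = encoding if encoding is not None else "utf-8"
    # Strict decoding: invalid bytes raise UnicodeDecodeError
    # instead of being silently replaced.
    return data.decode(codec)


# Default path: the input is treated as UTF-8.
hebrew = decode_source("שלום *עולם*".encode("utf-8"))

# Explicit override for legacy input.
legacy = decode_source(b"caf\xe9", encoding="latin-1")
```

Decoding strictly is a choice in this sketch, not a requirement anyone has stated; an implementation could just as well substitute a replacement character for invalid bytes.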

So much time and energy is wasted by not specifying it; let’s do this right this time.


I just noticed that non-Western test cases should also be added to the test suite. As I am not fluent in any language that uses non-Western characters, would anyone want to contribute them?

Agree. All new web specs mandate UTF-8, unless legacy constraints force them to accept other encodings, in which case they merely recommend UTF-8; see CSS for an example (look at the note starting with “Though UTF-8…”).

Due to legacy constraints, CommonMD probably can’t mandate UTF-8 either, but it should at least strongly recommend it.

I haven’t seen this in the spec yet. Is it still on the radar? Sounds like a good idea to me…