2008-02-06

Justice at last, part two

The Fifth Edition of XML 1.0 is now a Proposed Edited Recommendation.

So what, you say. Ho hum, you say. A bunch of errata folded into a new edition, you say. No real change here, you say.

But no, not at all, but quite otherwise. There's a big change here, assuming this PER gets past the W3C membership vote and becomes a full W3C Recommendation. There's something happening here, and what it is is eminently clear.

Justice is coming at last to XML 1.0.

For a long time, the characters used in the markup of an XML document -- element names, attribute names, processing instruction targets, and so on -- have been limited to those that were allowed in Unicode 2.0, which was issued in July 1996. If you wanted your element names in English, or French, or Arabic, or Hindi, or Mandarin Chinese, all was good. But if you wanted them in the national languages of Sri Lanka, or Eritrea, or Cambodia, or in Cantonese Chinese, to say nothing of lots and lots of minority languages, you were simply out of luck -- forever.

Not fair, people.

I tried fixing this the right way, by pushing the XML Core WG of the W3C to issue XML 1.1. It acquired some additional cruft along the way, some good, some in hindsight bad. It was roundly booed and even more roundly ignored. In particular, at least one 800-pound gorilla voted against it at W3C and refused to implement it.

Now it's being done the wrong way. We are simply extending the set of legal name characters to almost every Unicode character, relying on document authors and schema authors not to be idiots about it. Is that an incompatible change to XML 1.0 well-formedness? Hell yes. Is any existing XML 1.0 document going to become not well-formed? Hell no. We learned our lesson on that one.

Who supports this? I won't name names, but XML parser authors and distributors from gorillas to gibbons have been consulted in advance this time, and there are no screaming objections. Some will probably provide an option to turn Fifth Edition support on, others will turn it on by default. Unlike XML 1.1 support, this is actually a simplification: the big table of legal characters in Appendix B just isn't needed any more.
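To give a feel for how small the new rule is, here is a rough Python sketch of a name check built from the NameStartChar and NameChar productions as they appear in the Fifth Edition grammar. The function and constant names are made up for illustration, and this is not how any particular parser does it; it covers only the name-character rule, nothing else about well-formedness.

```python
# Code point ranges from the NameStartChar production (XML 1.0 Fifth Edition).
NAME_START_RANGES = [
    (0x3A, 0x3A),        # ":"
    (0x41, 0x5A),        # A-Z
    (0x5F, 0x5F),        # "_"
    (0x61, 0x7A),        # a-z
    (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF),
    (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF),
    (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

# Extra ranges that NameChar allows on top of NameStartChar.
NAME_EXTRA_RANGES = [
    (0x2D, 0x2E),        # "-" and "."
    (0x30, 0x39),        # 0-9
    (0xB7, 0xB7),        # middle dot
    (0x300, 0x36F),      # combining diacritical marks
    (0x203F, 0x2040),    # undertie and character tie
]


def _in_ranges(ch, ranges):
    """Return True if the code point of ch falls in any (low, high) range."""
    cp = ord(ch)
    return any(low <= cp <= high for low, high in ranges)


def is_name_start_char(ch):
    """Hypothetical helper: may ch begin an XML name?"""
    return _in_ranges(ch, NAME_START_RANGES)


def is_name_char(ch):
    """Hypothetical helper: may ch appear after the first character of a name?"""
    return is_name_start_char(ch) or _in_ranges(ch, NAME_EXTRA_RANGES)


def is_xml_name(s):
    """Hypothetical helper: is s a legal name under the Fifth Edition rule?"""
    return (
        bool(s)
        and is_name_start_char(s[0])
        and all(is_name_char(ch) for ch in s[1:])
    )
```

With this sketch, is_xml_name("doc") and is_xml_name("\u1230\u120b\u121d") (three Ethiopic letters) both come out true, while is_xml_name("1abc") and is_xml_name("a b") come out false. Two short lists of ranges stand in for the whole of the old Appendix B table.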

"Hot diggity (or however you say that in Amharic). When can I start using this?" Not so fast. First the W3C has to vote it in -- if they don't, all bets are off. Then implementations have to spread through the XML ecosystem, including not only development but deployment. It'll take years. But it only has to be done once, because all the writing systems that aren't in Unicode yet will Just Work once they do get encoded.

Ask not what you can do for XML, but what XML can do for you.

It's morning in the world.

(Oh yes: Send comments before 16 May 2008 to xml-editor@w3.org.)
