ICANN Email List Archives

[At-Large Advisory Committee]



Re: [alac] FW: Review and Recommendations for Internationalized Domain Names (IDNs)

  • To: Vittorio Bertola <vb@xxxxxxxxxxxxxx>
  • Subject: Re: [alac] FW: Review and Recommendations for Internationalized Domain Names (IDNs)
  • From: John L <johnl@xxxxxxxx>
  • Date: Mon, 16 Oct 2006 14:04:24 -0400 (EDT)

Ah, the bottomless swamp of despair that is IDNs. My wife just got back from a weekend course in introductory Tibetan, reminding me yet again how many odd and incompatible ways there are to write things down.

It is important to identify this entire range of problems because users, registrants, and policy makers often do not understand the protocol and other technical issues but only the difference between what they believe happens or should happen and what actually happens.

I take this to mean "IDNs are so complicated and counterintuitive that nobody understands them. And a lot of perfectly reasonable things you would expect to work just don't." People don't misunderstand them because the people are stupid, but because the problem is intractable and the issues are subtle. Indeed, I would say as a good first approximation, anyone who thinks there must be a straightforward way out of the IDN mess has demonstrated that he or she doesn't understand it. I sure don't.


As for the substance, I have a suggestion for what regards 3.1: I understand that this might be a pain in the ass (but still possible at this stage), but why don't they expand punycode to represent also the version number of Unicode that was used in the conversion? Or do I get the issue wrong?

Can't work. When you do a DNS lookup, your client makes a single query to the server, which returns either an exact match or a failure. (We can ignore DNS wildcards, which don't help here.) If you add a version number, there are now N forms of each name if there are N versions. So either a client has to make N lookups to find a name, or a server has to have N copies of its IDN names, or more likely both, with both servers and clients having to go through upgrade backflips whenever N -> N+1. The IDN community has gone through this a million times; if you know a way around it, there are a whole lot of people who would like to hear from you.
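To make the N-lookups point concrete, here is a rough sketch in Python. The version-tagged label format, the version list, and the function names are all made up for illustration; nothing like this exists in IDNA. The zone is modelled as a plain dictionary because an exact-match table is all the DNS gives you.

```python
# Hypothetical scheme: tag each punycode label with a Unicode version.
# The tag format and the version list are illustrative assumptions only.
UNICODE_VERSIONS = ["3.2", "4.0", "4.1", "5.0"]

def versioned_labels(label, versions=UNICODE_VERSIONS):
    """Return one candidate ASCII label per Unicode version."""
    ace = label.encode("punycode").decode("ascii")
    return ["xn--{}-u{}".format(ace, v.replace(".", "")) for v in versions]

def lookup(zone, label, versions=UNICODE_VERSIONS):
    """Exact-match lookup, one query per candidate; returns (answer, queries)."""
    queries = 0
    for name in versioned_labels(label, versions):
        queries += 1
        if name in zone:          # exact match or nothing, as in the DNS
            return zone[name], queries
    return None, queries

# The registrant published the name under the oldest version only.
zone = {versioned_labels("b\u00fccher")[0]: "192.0.2.1"}

# A client that tries the newest version first needs N queries to find it.
answer, queries = lookup(zone, "b\u00fccher", ["5.0", "4.1", "4.0", "3.2"])
print(answer, queries)
```

The only ways to get the query count back down to one are for the server to publish every versioned form, or for every client to agree on a single version, which is the problem we started with.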


In general, if you bear with the constant blaming of the Unicode Consortium,

The problem is that the goals of the Unicode Consortium are to provide an encoding for every written form, but the needs of the DNS are to have a unique encoding for every written form, and the consortium's encodings are not unique.


The consortium has made some (I think unwise) decisions to add codes that provide alternate forms for things that already have an encoding, mixed in with new codes for stuff it couldn't represent at all. The only way I can see out of this is to go back and divide each version of Unicode into a "base" with just the new codes and an "upgrade" with the alternate codes, and have punycode use only the base codes, but that doesn't seem likely to happen.
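A small illustration of the non-uniqueness, using Python's built-in punycode codec and unicodedata module. The example word and the helper name ace() are my own choices. Note that IDNA's nameprep step does apply normalization to collapse this particular case; the sketch only shows why some such step is needed, and that step depends on tables tied to a specific Unicode version.

```python
import unicodedata

# Two code point sequences that display as the same word.
composed = "caf\u00e9"       # e-acute as the single code point U+00E9
decomposed = "cafe\u0301"    # plain e followed by combining acute U+0301

def ace(label):
    """Raw punycode of a label, with no normalization applied first."""
    return label.encode("punycode").decode("ascii")

# Same written form, different code points, different ASCII encodings.
print(len(composed), len(decomposed))
print(ace(composed), ace(decomposed))
print(ace(composed) == ace(decomposed))

# Normalizing to NFC first maps both spellings to a single encoding.
normalized = unicodedata.normalize("NFC", decomposed)
print(ace(normalized) == ace(composed))
```

Without the normalization step the two spellings are simply two different names as far as the DNS is concerned.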

I think that we should just take a breath and jump into some (perhaps controlled) reality... and deal with problems after they actually materialize.

The problems of multiple encodings are real now. It would have been really nice if the DNS allowed slightly fuzzy matching beyond ASCII case folding, but it doesn't, and it's 25 years too late to change it now.
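For what it's worth, the only fuzziness the DNS offers looks roughly like the sketch below. The function name is made up, and the two xn-- labels are built by hand with Python's punycode codec, deliberately skipping nameprep, to stand in for two encodings of the same written word.

```python
def dns_labels_match(a: bytes, b: bytes) -> bool:
    """Compare two labels the way the DNS does: ASCII case-insensitively.

    bytes.lower() only changes the ASCII letters A-Z, so every other
    octet has to match exactly.
    """
    return a.lower() == b.lower()

# ASCII case differences are forgiven.
print(dns_labels_match(b"EXAMPLE", b"example"))

# Two encodings of what a reader sees as one word are not.
ace_composed = b"xn--" + "caf\u00e9".encode("punycode")
ace_decomposed = b"xn--" + "cafe\u0301".encode("punycode")
print(dns_labels_match(ace_composed, ace_decomposed))
```

By the time a name reaches the server it is just ASCII octets, so there is nothing left for the server to be fuzzy about.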


Apart from one Chinese name, and one woman, all the rest seems to be the product of a big bunch of male Western engineers. For something that's expected to understand and serve the long neglected needs of the growing population of Asian, East-European and African Internet users, that does raise my eyebrow.

I think that's because they've been working on it longer. I'm not aware of anyone from places with complex written languages who says that it can all be a lot simpler. It's also not obvious to me that a native speaker of, say, Hindi or Xhosa has any greater insight into the coding problems of Japanese or Indonesian than a speaker of English.


R's,
John


