ICANN Email List Archives

[gnso-idn-wg]



Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label

  • To: Tan Tin Wee <tinwee@xxxxxxxxxxxxxx>
  • Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
  • From: "Tan, William" <William.Tan@xxxxxxxxxxx>
  • Date: Fri, 09 Mar 2007 19:03:43 -0500

I read your email twice. On the second reading I managed to grok it. You have certainly presented some very interesting examples of how one can exploit badly written code to cause confusion or spoofing. This is definitely beneficial, as it shows the intricacies of the algorithm that IDN depends on so heavily.

Before we go on hypothesizing what a bad program can do, is there any empirical evidence of sloppy IDN programming practice in the past or present?


Tan Tin Wee wrote:
So if we took something like an intended label citibankäå just to use
Ram's example and stick xn- after that, like citibankxn-äå
A programming properly written will pick up the whole label,
put a prefix xn-- and pull out the ascii characters, citibankxn-
(including the dash), e.g. "xn--citibankxn-"
and then stick in a hyphen to separate the ascii part from the
unicode encoding part, and then append the the unicode encoding of
äå which in this context, is "b28qq03g" and form the ACE label:

xn--citibankxn--b28qq03g
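
The construction described above can be reproduced with a short Python sketch. It relies on Python's built-in "punycode" codec and uses 中国 (.<CHINA>) as the non-ASCII part of the label. The helper name to_ace is hypothetical, and the sketch deliberately skips the Nameprep and length checks that a full IDNA ToASCII implementation performs, so treat it as an illustration rather than a conforming converter.

```python
ACE_PREFIX = "xn--"  # the reserved prefix discussed in this thread


def to_ace(label: str) -> str:
    """Hypothetical helper: encode ONE whole label into its ACE form.

    Assumption: the label is already normalized (no Nameprep is applied here).
    """
    if label.isascii():
        # Pure ASCII labels are left untouched.
        return label
    # Punycode copies the ASCII characters first, adds a "-" delimiter,
    # then appends the encoded non-ASCII code points.
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


print(to_ace("citibankxn-中国"))  # xn--citibankxn--b28qq03g
```

The point of the sketch is that the whole label goes through the encoder in one pass; the "xn-" inside the label is treated as ordinary ASCII.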

Sloppy programming may pick up just 中国 and convert that to
xn--fiqs8s, which is the famous .zhongguo or .<CHINA>
This is of course a bad case of misreading the IDNA RFC. There are dozens of other programming errors that could occur and we couldn't possibly prevent them all with policy.

which
was deployed for the past three years within China and resolves
cleanly for a whole bunch of domains with .<CHINA> in IDN TLD.
Perhaps it would be more politically correct to say that the .CHINA IDN TLD is currently in a test bed phase, and is resolvable by using client plug-ins that tack on the ".cn" suffix.

So if an application detects punycode by searching for xn-- anywhere
in a string and wrongly picks out the second xn-- instead of the first
xn--, like in the following example
This would be another case of something else that could go awry in the interpretation of the specs.
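
To make the difference concrete, here is a minimal Python sketch contrasting a check anchored at the start of the label with a search for xn-- anywhere in the string. Both function names are hypothetical, and the "sloppy" variant is an invented illustration of the kind of bug being discussed, not code taken from any real product.

```python
ACE_PREFIX = "xn--"


def payload_anchored(label: str):
    """Return the Punycode payload only if the label STARTS with the ACE prefix."""
    lowered = label.lower()
    if lowered.startswith(ACE_PREFIX):
        return lowered[len(ACE_PREFIX):]
    return None  # not an ACE label


def payload_sloppy(label: str):
    """Hypothetical buggy variant: takes whatever follows the LAST 'xn--'."""
    lowered = label.lower()
    idx = lowered.rfind(ACE_PREFIX)
    if idx < 0:
        return None
    return lowered[idx + len(ACE_PREFIX):]


label = "xn--citibankxn--b28qq03g"
print(payload_anchored(label))  # citibankxn--b28qq03g
print(payload_sloppy(label))    # b28qq03g
```

Only the anchored check hands the decoder the payload that the encoder actually produced.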

Reserving the CCHH prefix is really done to protect us from a) protocol changes requiring a prefix change; and b) registries who do not offer IDNs - so that IDNs cannot be inadvertently registered in an otherwise ASCII only space.

I doubt the RNWG had the intention of reserving names for the sake of protecting users from sloppy programming. If that were the case, TLDs with more than 3 characters would never have been created, as so much legacy software hardcoded the list of TLDs or applied length restrictions to the TLD.

Now I have followed the Punycode development half a decade before,
but without pondering over it carefully once again....
wow, my head hurts right now, just thinking about this work
to do.... Can someone help me here!
If a professor's head hurts, imagine the rest of us! ;-)

Will a Unicode string prefixed by an ascii string ending with xn-
ever generate a shorter xn--PUNYCODE string, which when ToUnicode'd,
produces another legitimate Unicode string or will it simply
trigger an error which can be error-trapped? So far in the
simple example with citibank-<CHINA> it looks like error-trapped
in Verisign's Punycode converter. Edmon, Will, do you remember
any of the millions of exchanges about IDNA way back five or
so years ago contained this kind of scenario?
I don't recall such discussions; if there were any, I may have missed them. You demonstrated that in some scenarios, if an application incorrectly picks the second xn-- sequence and calls ToUnicode on it, the result is some sequence of Unicode characters that may not make sense. However, I'm not sure what difference it makes whether the conversion fails, or succeeds with meaningful or meaningless Unicode.
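
For anyone who wants to try the scenario, a small Python sketch like the one below decodes a payload and traps any failure instead of letting it propagate. It uses Python's built-in "punycode" codec, which is only the raw Punycode step and not a full IDNA ToUnicode, and the helper name try_decode is hypothetical. Whether a given mis-extracted payload decodes or raises an error depends on the decoder, so no outcome is claimed for that case here.

```python
def try_decode(payload: str):
    """Hypothetical helper: decode a Punycode payload, trapping failures.

    Returns ("ok", text) on success or ("error", message) on failure.
    """
    try:
        return ("ok", payload.encode("ascii").decode("punycode"))
    except ValueError as exc:  # UnicodeError is a subclass of ValueError
        return ("error", str(exc))


# Payload of the correctly formed label from the example in this thread.
print(try_decode("citibankxn--b28qq03g"))  # ('ok', 'citibankxn-中国')

# Payload a buggy application would extract after the second "xn--".
# The result is printed as-is: it may be an error or some Unicode sequence.
print(try_decode("b28qq03g"))
```

Either way, the application has already gone wrong by the time it decodes the wrong substring, which is why the outcome of the decode matters little.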

Best,

=wil


