ICANN Email Archives: [gnso-idn-wg]

Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label

To: "Tan, William" <William.Tan@xxxxxxxxxxx>
Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
From: Tan Tin Wee <tinwee@xxxxxxxxxxxxxx>
Date: Sat, 10 Mar 2007 02:37:42 +0800

Let's work through the analysis with examples.

Tan, William wrote:

Sophia B wrote:
The decision is really up to the group. For myself, I think it's okay -
some sloppy programmers will have to re-program in the future, that's
all - some, like Tina, felt they would deserve it....
I saw "sloppy programmers" mentioned several times and thought I'd weigh in. As someone who has been involved in the development of IDN support in Mozilla and the ISC EchIDNA plug-in, I feel that it is highly unlikely that any programmer will have trouble with this. The reason for that is most applications will use one of the available IDNA libraries to handle IDNs. There are possibly two main pieces to IDN programming:


Yes, I agree that if you use the library properly, there shouldn't be a problem.
But if somebody doesn't, let's take a look at the potential consequences.

1. Converting input string into ACE for resolution
An application would search for non-ASCII characters anywhere in the string to detect IDN, and then pass the domain name into the IDNA ToASCII operation. The result will be punycode. I don't see a problem there. Should the programmer decide to implement the IDNA operations natively instead of using a library and gets it wrong, banning CCHH would not help anyway.


So if we took something like an intended label citibankäå just to use
Ram's example and stick xn- after that, like citibankxn-äå
A programming properly written will pick up the whole label,
put a prefix xn-- and pull out the ascii characters, citibankxn-
(including the dash), e.g. "xn--citibankxn-"
and then stick in a hyphen to separate the ascii part from the
unicode encoding part, and then append the the unicode encoding of
äå which in this context, is "b28qq03g" and form the ACE label:

xn--citibankxn--b28qq03g

Sloppy programming may pick up just 中国 and convert that to
xn--fiqs8s, which is the famous .zhongguo or .<CHINA> that
has been deployed for the past three years within China and resolves
cleanly for a whole bunch of domains with .<CHINA> as an IDN TLD.
Notice that xn--b28qq03g is not the same as xn--fiqs8s.
The ASCII characters in a mixed ASCII-IDN label are pulled
out first, so the Punycode output for the non-ASCII characters
depends on the context they appear in.
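
The difference between the two behaviours can be sketched in a few lines of Python. This is only an illustration: Python's built-in "punycode" codec stands in for a real IDNA library, nameprep is skipped, and the function names to_ace_whole_label and to_ace_sloppy are made up for this example.

```python
# Sketch only: the stdlib "punycode" codec stands in for a full IDNA
# ToASCII implementation; nameprep and length checks are skipped.
ACE_PREFIX = "xn--"

def to_ace_whole_label(label: str) -> str:
    """Encode the entire label, ASCII and non-ASCII characters together."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def to_ace_sloppy(label: str) -> str:
    """Hypothetical mistake: encode only the non-ASCII characters."""
    non_ascii = "".join(ch for ch in label if ord(ch) > 127)
    return ACE_PREFIX + non_ascii.encode("punycode").decode("ascii")

whole = to_ace_whole_label("citibankxn-中国")
sloppy = to_ace_sloppy("citibankxn-中国")
print(whole)   # xn--citibankxn--b28qq03g
print(sloppy)  # xn--fiqs8s
```

The sloppy variant throws away the ASCII context, which is why it lands on the same ACE string as the bare 中国 label.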

So if you punycode the left domain label, you get the stuff
on the right.
IDN label       Punycode
abcdefghxn-äå  xn--abcdefghxn--b28qq03g
abcd1234xn-äå  xn--abcd1234xn--b28qq03g
12345678xn-äå  xn--12345678xn--b28qq03g
12345678xnxäå  xn--12345678xnx-b28qq03g
äå                         xn--fiqs8s

So anything picking up .<CHINA> on its own will generate
fiqs8s, which is pretty different from the
rest of those, which had .ascii-<CHINA>.
And so long as the ASCII portions are the
same length, the same punycode for <CHINA>,
i.e. "b28qq03g", will appear regardless of what
came in front.

However, if the lengths changed, you can
see, the punycode part for <CHINA> changes
like so.

IDN label          Punycode
2345678xn-中国     xn--2345678xn--4t2p695f
345678xn-中国      xn--345678xn--yl6nq98e
45678xn-中国       xn--45678xn--sd0mx14e
5678xn-中国        xn--5678xn--m43k486d
678xn-中国         xn--678xn--gw7is51d
78xn-中国          xn--78xn--9n1hm04c
8xn-中国           xn--8xn--3f5fw08b
xn-中国            xn--xn--x68dy61b
n-中国             xn--n--ry2c206a
-中国              not allowed
中国               xn--fiqs8s

(with the exception of the hyphen -<CHINA>
where it is disallowed).

2. Converting a domain name into Unicode for display
An application typically detects punycode by searching for a configurable prefix (set to "xn--" by default) anywhere in the string. Once punycode is detected, the string is passed into the IDNA ToUnicode operation. While it is quite possible that a false positive may arise if the programmer is not careful to ensure that the prefix only matches at the beginning of a label, the ToUnicode operation will return the same ASCII label if it is indeed simply a false positive. In any case, this should not cause any damage.


So suppose an application detects punycode by searching for xn-- anywhere
in a string, and wrongly picks out the second xn-- instead of the first
xn--, as in the following example:

IDN label          Punycode
citibankxn-中国    xn--citibankxn--b28qq03g

i.e. instead of picking out the xn-- at the beginning
of the label, the sloppy program picks out the second one,
i.e. xn--b28qq03g, and tries to ToUnicode it....
It gets an error in reply, "Error:Prohibited 3a4a7", in the
case of the Verisign Punycode converter.
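
A rough Python sketch of that failure mode follows. The two helper names are invented for illustration, and the stdlib "punycode" codec performs only the raw decoding step: it does no nameprep check, so unlike the Verisign converter it hands back the code points instead of rejecting them.

```python
ACE_PREFIX = "xn--"

def tail_found_by_sloppy_search(ace_label: str) -> str:
    """Hypothetical mistake: take the LAST 'xn--' found anywhere in the label."""
    return ace_label[ace_label.rfind(ACE_PREFIX):]

def has_ace_prefix(ace_label: str) -> bool:
    """Stricter check: the prefix only counts at the start of a label."""
    return ace_label.lower().startswith(ACE_PREFIX)

ace = "xn--citibankxn--b28qq03g"
tail = tail_found_by_sloppy_search(ace)  # the second xn--, not the first
decoded = tail[len(ACE_PREFIX):].encode("ascii").decode("punycode")
code_points = [f"{ord(ch):x}" for ch in decoded]
print(tail, code_points)
```

The first decoded code point is 3a4a7, the same value the Verisign converter reports as prohibited.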

Therefore, it is my personal opinion that banning double hyphens is unnecessary.


So banning double hyphens is totally unnecessary if the conversion
algorithm does not

IDN label        Punycode                   Second xn-- label   ToUnicode of second label
2345678xn-中国   xn--2345678xn--4t2p695f    xn--4t2p695f        Error:Prohibited 356f9
345678xn-中国    xn--345678xn--yl6nq98e     xn--yl6nq98e        Error:Prohibited 33e4e
45678xn-中国     xn--45678xn--sd0mx14e      xn--sd0mx14e        Error:Prohibited 2bb9d
5678xn-中国      xn--5678xn--m43k486d       xn--m43k486d        ?ð
678xn-中国       xn--678xn--gw7is51d        xn--gw7is51d        ??
78xn-中国        xn--78xn--9n1hm04c         xn--9n1hm04c        Error:Prohibited 1d293
8xn-中国         xn--8xn--3f5fw08b          xn--3f5fw08b        Error:Prohibited 184e5
xn-中国          xn--xn--x68dy61b           xn--x68dy61b        Error:Prohibited 13737
n-中国           xn--n--ry2c206a            no second xn-- detected; n-中国 is regenerated.
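
A survey like the one in this table could be scripted roughly as follows. This is a sketch under assumptions: tail_decoding is a made-up helper, the stdlib "punycode" codec replaces the Verisign converter, and unicodedata.category is used as a crude stand-in for the nameprep check (category "Cn" means unassigned in the Unicode version bundled with the running Python, which is not the same test as nameprep's prohibited and unassigned tables).

```python
import unicodedata

def tail_decoding(label: str):
    """Encode label, cut at the last 'xn--', and Punycode-decode what follows.

    Returns (ace, tail, decoded); tail and decoded are None when the only
    'xn--' is the real prefix at position 0.
    """
    ace = "xn--" + label.encode("punycode").decode("ascii")
    cut = ace.rfind("xn--")
    if cut == 0:
        return ace, None, None
    tail = ace[cut:]
    try:
        decoded = tail[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        decoded = None  # raw decoding itself failed
    return ace, tail, decoded

for label in ["12345678xn-中国"[i:] for i in range(0, 10)]:
    ace, tail, decoded = tail_decoding(label)
    if tail is None:
        print(f"{ace:<28} no second xn-- detected")
    elif decoded is None:
        print(f"{ace:<28} {tail:<16} decode error")
    else:
        info = [(f"U+{ord(c):04X}", unicodedata.category(c)) for c in decoded]
        print(f"{ace:<28} {tail:<16} {info}")
```

Rows whose code points all come back as "Cn" correspond to tails a strict converter would reject; any row with assigned code points would need a closer look.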

So banning double hyphens inside an IDN label is not necessary,
because they will generate a triple hyphen in the Punycode.

And banning a single hyphen inside an IDN label is not necessary
either, even if the hyphen is preceded by "xn", provided we can
show that the shortened xn-- Punycode label never converts to
a Unicode string, meaningful or not.

Without doing an exhaustive survey, and without taking apart
the Punycode algorithm and analysing it with a fine-tooth comb,
I cannot immediately say whether I agree with William or not.

Now I followed the Punycode development half a decade ago,
but without pondering over it carefully once again....
wow, my head hurts right now, just thinking about the work
to do.... Can someone help me here!

Now I vaguely recall the program assuring us something about
this kind of security feature but I honestly haven't tested it out
in this manner as Sophia, Ram or Subbiah or you folks have mentioned.

Will a Unicode string prefixed by an ASCII string ending in xn-
ever generate a shorter xn--PUNYCODE string which, when ToUnicode'd,
produces another legitimate Unicode string? Or will it simply
trigger an error which can be trapped? So far, in the
simple example with citibankxn-<CHINA>, it looks like it is error-trapped
in Verisign's Punycode converter. Edmon, Will, do you remember
whether any of the millions of exchanges about IDNA way back five or
so years ago covered this kind of scenario?
I am sure we must have discussed something like this before.

Best regards,

=wil


bestrgds
tin wee
