Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
Let's work through the analysis with examples.
Tan, William wrote:
Sophia B wrote: I saw "sloppy programmers" mentioned several times and thought I'd weigh in. The decision is really up to the group. For myself, I think it's okay - some sloppy programmers will have to re-program in the future, that's all - some, like Tina, felt they would deserve it....
1. Converting input string into ACE for resolution
Sloppy programming may pick up just 中国 and convert that to xn--fiqs8s, which is the famous .zhongguo or .<CHINA>, which has been deployed for the past three years within China and resolves cleanly for a whole bunch of domains with .<CHINA> as the IDN TLD. Notice that xn--b28qq03g is not the same as xn--fiqs8s. The ASCII characters in a mixed ASCII-IDN label are pulled out first, and the Punycode algorithm then encodes the rest contextually.
So if you punycode the left domain label, you get the stuff on the right. IDN label Punycode abcdefghxn-äå xn--abcdefghxn--b28qq03g abcd1234xn-äå xn--abcd1234xn--b28qq03g 12345678xn-äå xn--12345678xn--b28qq03g 12345678xnxäå xn--12345678xnx-b28qq03g äå xn--fiqs8s
So anything picking up just .<CHINA> will generate fiqs8s, which is pretty different from the rest, which had .ascii-<CHINA>. And so long as the ASCII portions are the same length, the same Punycode for <CHINA>, i.e. "b28qq03g", will appear regardless of what came in front.
However, if the length of the ASCII portion changes, the Punycode part for <CHINA> changes too, like so:
IDN label           Punycode
2345678xn-中国      xn--2345678xn--4t2p695f
345678xn-中国       xn--345678xn--yl6nq98e
45678xn-中国        xn--45678xn--sd0mx14e
5678xn-中国         xn--5678xn--m43k486d
678xn-中国          xn--678xn--gw7is51d
78xn-中国           xn--78xn--9n1hm04c
8xn-中国            xn--8xn--3f5fw08b
xn-中国             xn--xn--x68dy61b
n-中国              xn--n--ry2c206a
-中国               not allowed
中国                xn--fiqs8s
(The exception is the bare-hyphen label -<CHINA>, which is disallowed.)
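The length dependence can be checked the same way. A rough sketch, again using Python's punycode codec and a made-up to_ace helper that skips nameprep: it strips the ASCII part one character at a time and records the encoded part for each length.

```python
def to_ace(label: str) -> str:
    """Hypothetical helper: raw Punycode plus the ACE prefix (no nameprep)."""
    return "xn--" + label.encode("punycode").decode("ascii")

ascii_prefix = "12345678xn-"
suffix_by_length = {}

for start in range(len(ascii_prefix)):
    ascii_part = ascii_prefix[start:]
    if ascii_part == "-":
        # A label starting with a hyphen is not allowed, so skip it.
        continue
    ace = to_ace(ascii_part + "中国")
    # Everything after the last hyphen is the encoded non-ASCII part.
    suffix_by_length[len(ascii_part)] = ace.rsplit("-", 1)[1]

# The bare label has no ASCII part at all.
suffix_by_length[0] = to_ace("中国")[len("xn--"):]

for length in sorted(suffix_by_length, reverse=True):
    print(length, suffix_by_length[length])
```

Each ASCII length should show its own encoded part, because the number of ASCII characters feeds into the deltas that Punycode encodes.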
2. Converting a domain name into Unicode for display
IDN label           Punycode
citibankxn-中国     xn--citibankxn--b28qq03g
I.e., instead of picking out the xn-- at the beginning of the label, the sloppy program picks out the second one, i.e. xn--b28qq03g, and tries to ToUnicode it. It gets an error in reply, "Error:Prohibited 3a4a7", in the case of the Verisign Punycode converter.
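Here is a small sketch of that sloppy decode, for anyone who wants to look at what comes out. It assumes Python's built-in punycode codec, which does the raw decoding only and applies none of the IDNA prohibited or unassigned checks, so it returns code points where a converter like Verisign's reports an error. The unicodedata category lookup is my stand-in for that check, not what the converter actually does.

```python
import unicodedata

ace = "xn--citibankxn--b28qq03g"

# Sloppy parsing: take what follows the LAST "xn--" instead of only
# checking for the prefix at the start of the label.
tail = ace[ace.rfind("xn--") + len("xn--"):]

# Raw Punycode decode of the tail, with no IDNA checks applied.
decoded = tail.encode("ascii").decode("punycode")

for ch in decoded:
    # Category "Cn" means the code point is unassigned in this Python's
    # Unicode database.
    print(f"U+{ord(ch):04X} category={unicodedata.category(ch)}")
```

The first code point this produces is U+3A4A7, which lines up with the hex value in the converter's error message.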
Therefore, it is my personal opinion that banning double hyphens is unnecessary.
So banning double hyphens inside an IDN label is not necessary, because a double hyphen will generate a triple hyphen in the Punycode.
And banning a single hyphen inside an IDN label is not necessary, even if the hyphen is preceded by an "xn" prefix, provided we can show that the shortened xn--Punycode label does not convert to a Unicode string, meaningful or otherwise.
Without doing an exhaustive survey, and without taking apart the Punycode algorithm and analysing it with a fine tooth comb, I cannot immediately say if I can agree with William or not.
Now, I followed the Punycode development half a decade ago, but without pondering over it carefully once again.... wow, my head hurts right now, just thinking about this work to do.... Can someone help me here?
Now I vaguely recall the program assuring us something about this kind of security feature but I honestly haven't tested it out in this manner as Sophia, Ram or Subbiah or you folks have mentioned.
Will a Unicode string prefixed by an ASCII string ending with xn- ever generate a shorter xn--PUNYCODE string which, when ToUnicode'd, produces another legitimate Unicode string, or will it simply trigger an error which can be error-trapped? So far, in the simple example with citibankxn-<CHINA>, it looks like it is error-trapped in Verisign's Punycode converter. Edmon, Will, do you remember whether any of the millions of exchanges about IDNA way back five or so years ago covered this kind of scenario? I am sure we must have discussed something like this before.
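One way to start poking at this without an exhaustive survey is a brute-force check over sample labels. The sketch below is an assumption-laden starting point, not an answer: sloppy_tail_decode and all_assigned are names invented here, Python's punycode codec stands in for a real ToASCII/ToUnicode (no nameprep), and "assigned in Python's Unicode database" is only a rough proxy for "legitimate Unicode string".

```python
import unicodedata

def sloppy_tail_decode(label: str):
    """Encode a label, then decode only what follows the LAST 'xn--',
    the way a sloppy parser might. Returns (ace, decoded or None)."""
    ace = "xn--" + label.encode("punycode").decode("ascii")
    tail = ace[ace.rfind("xn--") + len("xn--"):]
    try:
        return ace, tail.encode("ascii").decode("punycode")
    except UnicodeError:
        # The raw decoder rejected the tail: the error-trapped case.
        return ace, None

def all_assigned(text: str) -> bool:
    """True if no code point is unassigned, private-use, or a surrogate."""
    return all(unicodedata.category(ch) not in ("Cn", "Co", "Cs") for ch in text)

# Sample inputs only; extend both lists to widen the survey.
ascii_parts = ["xn-", "8xn-", "citibankxn-"]
unicode_parts = ["中国", "é", "日本"]

for ascii_part in ascii_parts:
    for unicode_part in unicode_parts:
        ace, decoded = sloppy_tail_decode(ascii_part + unicode_part)
        if decoded is None:
            print(f"{ace:<30} decode error")
        else:
            points = " ".join(f"U+{ord(ch):04X}" for ch in decoded)
            print(f"{ace:<30} assigned={all_assigned(decoded)} {points}")
```

Any row that reports assigned=True would be a case worth a closer look with a real IDNA implementation.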