ICANN Email List Archives

[gnso-idn-wg]

Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label

  • To: Tina Dam <tina.dam@xxxxxxxxx>
  • Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
  • From: subbiah <subbiah@xxxxxxxxx>
  • Date: Wed, 07 Mar 2007 11:04:32 -0800

Dear Tina, Tin Wee, Edmon, Sophia, Shahram, Ram

One point regarding sloppy programming and an internal xn-- in IDN labels being incorrectly picked up as U-strings, leading to spurious speculative registration/activity.

Whether we ban xn-- (and to be super-safe CC--) in the middle of IDN labels is a decision to be made after the pros and cons are considered.

Personally, I think neither decision would be disastrous on the scale of things. Let me emphasize this before I begin.

So, in the interest of a complete examination, starting from Sophia's suggestion, and given that our working group's job is to think things through carefully (previous ones did not, which is why we were faced after the fact with ill-motivated registrations of IDN labels beginning with xn--), here are my pros and cons; others may have more. After almost a decade of thinking about it, it would be unwise to make mistakes again.

The pros:

First, the number of applications that need to be made IDN-aware in coming years is vast. It's not just a few browsers, a few email apps and some server-side software for DNS infrastructure. Virtually every piece of software out there, probably millions of them, will need to be made IDN-aware. For example, even the free software Kodak ships with its digital cameras includes photo album software that understands web links and highlights/hyperlinks them; another example is the business accounting software developed by Bulgarian programmers for local use in Mongolia (:-)) that is in many ways web-enabled and links to domains. The point: there are already millions of software packages, increasingly many of them written locally all over the world at varying levels of ability and sophistication, and the odds that at least one, or even one percent (which may be 10,000 of them), will screw up are reasonably high.

Secondly, some day after the migration is over we can remove this ban.

The cons:

It is theoretically possible that restricting just xn-- alone in the middle of IDN labels could prevent some character in some language from being registrable in an IDN label. Extending it to a cc-- restriction could theoretically prevent as many as 1,000 characters from some languages, out of the universe of millions of Unicode characters, from being registered. A small fraction, but not to those affected.

However, a closer examination (I have not yet done an exhaustive one) suggests that the "--" combination never occurs in the IDN portion of an IDN label (if someone disagrees, please holler). The only time it happens, I think, is when an IDN label is composed of two parts: an ASCII part and a truly IDN part. In this case a direct ASCII registration of xn-- in the ASCII part (whether at the beginning or in the middle) leads to something we would like to ban, if that is what we decide. The other scenario leading to a banning candidate is when an "xn-" is registered at the end of the ASCII part of an IDN label that includes both an ASCII component and a truly IDN component. What happens here, and I think this is what Tin Wee was trying to point out, is that IDNA conversion ends up creating an "xn--" (i.e. an extra "-" appended) in the middle of the final IDN label. (Of course this would apply to cc-- as well in super-safe mode.) But when one realises that in strict ASCII domains (not IDN at all) historical rules prevent "-" from being registered at the end of ASCII domains, this second scenario is philosophically identical to an existing rule and should not bring about any "heartaches" but rather only "consistency".

If these are, as I preliminarily think, the only situations that create the instances we might want to ban for sloppy programming reasons, then the theoretical possibility that some characters in some languages could become unregistrable goes away.

And this leaves the method or rule for enforcing it. This would simply be: after conversion into the final IDN label, extend the current rule of banning CC-- at the beginning only so that it applies throughout the entire IDN label. From a programming/implementation perspective, the additional difficulty is extremely minor.
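As a rough illustration of how small such a check could be, here is a minimal Python sketch. The function names, and the choice to read "CC" as any two letters or digits, are assumptions made for this example and not something the working group has specified.

```python
import re

# "CCHH": two letters or digits followed by two hyphens, e.g. "xn--".
# Reading "CC" as [a-z0-9]{2} is an assumption made for this sketch.
CCHH = re.compile(r"[a-z0-9]{2}--")


def violates_prefix_only_rule(ascii_label: str) -> bool:
    """Current rule: hyphens in the third and fourth positions of the label."""
    return ascii_label[2:4] == "--"


def violates_anywhere_rule(final_label: str) -> bool:
    """Extended rule (hypothetical): no CCHH anywhere in the final label,
    apart from the legitimate leading "xn--" of a converted IDN label."""
    body = final_label.lower()
    if body.startswith("xn--"):
        body = body[4:]  # skip the genuine ACE prefix
    return CCHH.search(body) is not None


if __name__ == "__main__":
    for label in ("xn--citibankxn--b28qq03g", "xn--axn-x68dy61b", "abcxn--xyz"):
        print(label, violates_anywhere_rule(label))
```

The check runs on the final, converted label, so it applies equally to pure ASCII labels and to labels that mix an ASCII part with an IDN part.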

Summary
---------

So, assuming my initial exploration of the consequences is right (lab testers or others can certainly test it), and based on my perhaps incomplete list of pros and cons, one would think the cons are virtually nil. While the pros are not great (i.e. I can see myself going along with Tina's view of "who cares about sloppy programmers, if you are sloppy you deserve it and the market will correct you"), there are pros that can be gained for virtually no cons.

As to whether we should do this or not, that is not my call - I am swayable and, as I said, someone else should also think carefully about what I have said above. Technical things take far more time to think through carefully than just a few "email" chains in Internet-time allow for.

Cheers

Subbiah

Tina Dam wrote:

Tin Wee, All,
While it is naturally impossible to test all applications, the Technical Test
Phase II is focused on the application area. It is in the process of being
defined and planned and will contain elements around communication as well
(communication to application providers... so far we have received quite some
interest in getting this right, which is good).

Further, the revision of the protocol is not expected to change the
prefix, and one of the main reasons for the revision is to be able to
proceed with a protocol that does not depend on the Unicode version, to avoid
continuous revisions, which could create further problems as you mention below.

I am not sure I follow your AXN-- discussion below... but I support Will's and
Edmon's comments on this. The protocol does not operate midway through strings. What
that means is that it is entirely possible to register a string that has "xn--"
midway in it, and I don't see any need for reserving such names. Sloppy
applications that take such strings and convert them to U-strings should quickly
be revised following market complaints. As mentioned, while some application testing
is in place, we need to keep in mind that (i) we can't test all applications
that exist now and in the future, and (ii) even if tested, the providers can
change the implementation at any time.
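To make the distinction concrete, the following Python sketch contrasts a decoder that only honours the prefix at the start of a label with a hypothetical sloppy one that reacts to "xn--" wherever it appears. Both function names are invented for this illustration, and the decoding uses Python's built-in punycode codec without Nameprep or any validity checks, so it is a simplification of the real ToUnicode operation.

```python
ACE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Treat a label as an IDN only when the ACE prefix is at the very start."""
    lowered = label.lower()
    if lowered.startswith(ACE_PREFIX):
        return lowered[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label  # plain ASCII label, left untouched


def sloppy_decode_label(label: str) -> str:
    """Hypothetical buggy decoder: decodes from the last "xn--" found anywhere."""
    lowered = label.lower()
    pos = lowered.rfind(ACE_PREFIX)
    if pos == -1:
        return label
    tail = lowered[pos + len(ACE_PREFIX):]
    return lowered[:pos] + tail.encode("ascii").decode("punycode")


if __name__ == "__main__":
    ascii_label = "abcxn--fiqs8s"  # an ordinary ASCII label, not an IDN
    print(decode_label(ascii_label))         # unchanged
    print(sloppy_decode_label(ascii_label))  # wrongly shown with Chinese characters
```

With the correct decoder the ASCII label abcxn--fiqs8s is left alone; the sloppy one displays it as if part of it were an IDN, which is the kind of behaviour the discussion is about.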

Tina

PS> Sorry I was not on the call last night. I arrived late in the evening from a
long trip into LA and did not manage to stay up for the 3am call. I will listen
to the recording and if I have input I will provide it to the list.




-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Tan Tin Wee
Sent: Tuesday, March 06, 2007 4:06 PM
To: edmon@xxxxxxxxxxx
Cc: 'Shahram Soboutipour'; owner-gnso-idn-wg@xxxxxxxxx; 'Sophia Bekele'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label


Ram Mohan wrote:
> Are you saying - something like <*CITIBANKchina.TLD*> where "china" is
> in local script while CITIBANK is in Latin script should be banned,
> because its Punycode translation would result in an <xn--> midway
> through the string?


I agree with the comments made so far.
xn-- in the case mentioned by Ram won't happen in the current way Punycode works as William and Edmon pointed out.


Having said that, I agree that for the moment we may not want to add more complication by recommending that the IDN label be split with xn-- embedded inside. Even so, xn-- can occur in punycode within a label: using Ram's example and modifying it, citibankxn-<China> will appear as xn--citibankxn--b28qq03g (e.g.
http://mct.verisign-grs.com/conversiontool/convertServlet?input=xn--citibankxn--b28qq03g&type=PUNYCODE
converting from Punycode xn--citibankxn--b28qq03g to Unicode: citibankxn-中国, or use
http://www.afilias.info/cgi-bin/convert_punycode.cgi)
...
which I think was the nub of Shahram's point:
> <CCHH>citibank-<CCHH><encodedCHINA>.tld


Of course, xn-- at the prefix will cause the rest of the label "citibankxn--b28qq03g" to be processed as such, but still
xn-- as mentioned by Sophia will pop up here and there by accident or by deliberate design by non-bonafide registrants.


I think what Sophia meant, which Ram misunderstood, was some mechanism to trap xn-- inside labels. One reason is to ensure that it doesn't confuse sloppily programmed software that picks out an xn-- inside an xn-- prefixed string (a non-greedy algorithm), as in the case I mentioned, and displays the wrong IDN label. Another is the mixed-scripts issue: if we don't look carefully at the xn-- or CCHH issue, and the next Unicode version brings a drastic enough change that we need to migrate and in the process change xn-- to some other CCHH, we may lose that option. By way of illustration, xm-- or xe-- etc. might already be registered, as in axe-中国 with xn--axe--3f5fw08b as the back-end encoding, or AXN-中国 with xn--axn--3f5fw08b, which is a conceivable registration by the AXN satellite channel.
Or there are cases of spoofing or passing off by confusing people with citibankxn-中国 and citibank.xn-中国, which look pretty close and which get punycoded to
http://mct.verisign-grs.com/conversiontool/convertServlet?input=citibankxn-%E4%B8%AD%E5%9B%BD&type=UTF8
xn--citibankxn--b28qq03g
and
http://mct.verisign-grs.com/conversiontool/convertServlet?input=citibank.xn-%E4%B8%AD%E5%9B%BD&type=UTF8
citibank.xn--xn--x68dy61b
respectively.
Try these two labels with the Afilias converter and the second one will be blocked, while http://www.nameisp.com/punycode.asp will work just like the VeriSign converter... So these are programming variations we may need to follow through.


If we recommend against AXN-?? because it generates a potentially confusing xn-- string inside a punycode label, then AXN?? could be an option, as it will generate xn--axn-x68dy61b, which is xn- and not xn--.

Finally,
Edmon Chung wrote:
> Nevertheless, with regards to our discussion at hand, I am quite certain
> we have comprehensive protection with the CCHH reserved as a prefix.


Yes, I suspect this might be the case, but somebody might want to get a team of programmers to run a check on some test cases. Does anyone know if this kind of scenario is being tested at the moment in the ICANN testing contract?

bestrgds

tin wee


Edmon Chung wrote:


Hi Shahram,

There was an extensive discussion in the original IDN protocol development about the use of the prefix (or suffix or other possible identifiers), and finally CCHH was chosen. I highly doubt that we would be choosing a scheme that would split up a label (for many good reasons including bidi and single script considerations) into different chunks with different prefixes, but no one can predict the future I suppose :-)

Nevertheless, with regards to our discussion at hand, I am quite certain we have comprehensive protection with the CCHH reserved as a prefix.

Edmon











From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Shahram Soboutipour
Sent: Tuesday, March 06, 2007 4:50 PM
To: owner-gnso-idn-wg@xxxxxxxxx
Cc: 'Sophia Bekele'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: RE: [gnso-idn-wg] Re: Banning CCHH anywhere in a label




Dear Edmon

Regarding the sample CITIBANKchina.TLD (where china is in the Chinese charset), I think there is a third possibility, which might be Sophia's idea:

<CCHH>citibank-<CCHH><encodedCHINA>.tld

It means that every separate non-ASCII part of a label would be translated with its own CCHH at the front. I am not sure whether there is a rule for this right now, but I myself do not agree with this type. I prefer <CCHH>citibank-<encodedCHINA>.tld because:

1. I think there is enough space in the CC part of CCHH for possible further changes and developments in the IDNA standard, so there should be no worries.

2. The CCHH (at the front) is a good rule to define an IDN, and I think it can be a rule at all levels of a URL (not only the 2nd and 3rd), but it seems levels higher than the 3rd are out of the scope of ICANN's policy, BUT this must be mentioned in their own technical decision making.






Regards,

Shahram Soboutipour <soboutipour@xxxxxxxxxxx>
President and CEO
Karmania Media <http://www.karmania.ir/>
Tel: +98 341 2117844,5
Mobile: +98 913 1416626
Fax: +98 341 2117851

-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Edmon Chung
Sent: Tuesday, March 06, 2007 6:09 AM
To: 'Tan, William'; rmohan@xxxxxxxxxxxx
Cc: 'Sophia B'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: RE: [gnso-idn-wg] Re: Banning CCHH anywhere in a label



I don't think you are missing anything, William.

I was trying to speak up during the call earlier; I don't think the concern Sophia was articulating should be an issue.

If I am not mistaken, Sophia was asking whether it would be necessary to reserve names such as:

abc<CCHH>xyz.tld

These names would NOT be considered IDNs, nor would any part of them be an IDN; they are simply ASCII domains. <CCHH> can best be seen as a prefix denoting that a domain label (i.e. between two dots) has at least one non-LDH (letter-digit-hyphen) character.

Using the example described:

citibank<CHINA>.tld

where <CHINA> is in Chinese, William's explanation is correct; it should become:

<CCHH>citibank-<encodedCHINA>.tld

And NOT

Citibank<CCHH><encodedCHINA>.tld

So, by reserving <CCHH> at the front (i.e. the first 4 characters, or more precisely, hyphens in the third and fourth characters) we cover all cases of intended IDN expressions.

Edmon











-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Tan, William
Sent: Tuesday, March 06, 2007 7:47 AM
To: rmohan@xxxxxxxxxxxx
Cc: 'Sophia B'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label


Hi all,

I believe the motivations behind banning strings with hyphens in the third and fourth positions are:

1. to protect registries who do not offer IDN registrations from unknowingly registering IDNs; and

2. to reserve future revisions to the IDNA standard where a different prefix might be assigned.

Ram Mohan wrote:

> Are you saying - something like <*CITIBANKchina.TLD*> where "china" is
> in local script while CITIBANK is in Latin script should be banned,
> because its Punycode translation would result in an <xn--> midway
> through the string?

I'm not sure I follow this. CITIBANKchina.TLD would translate to xn--citibank-encodedchunk.TLD, so xn-- would not occur midway in the ACE string.

> In general, the rationale for banning "CCHH" at a position other than
> the beginning of a string/label is unclear.

I have not seen any documents that suggest banning CCHH at anything but the beginning of a string. Am I missing something?

Sophia said:

> All registrations should
> be in the IDN label, and that the ACE label should be internal to the
> operations of the registration. *One should not be offering to
> register xn--.... as a label or any ACE label since it is an internal
> encoding, so as to prevent confusion and other malfeasance (phishing)*.

Many registries today use the ACE string at the registration protocol level, so your statement would essentially be advising against that practice. Personally, I don't think it is a problem unless the registry does NOT offer IDN and is accepting xn-- labels (in which case it probably simply treats the registration as ASCII and does not check for IDNA validity.) We may be in agreement here, but I wanted to further qualify your statement.

In table 4.4 of "Recommendation Tables for RN-WG Reports.doc":

> For each IDN gTLD proposed, applicant must provide both the "ASCII
> compatible (ACE) form of an IDNA valid string" ("A-label") and in
> local script form (Unicode) of the top level domain ("U-label").

I would also add that the applicant should provide additional strings that, after applying IDNA ToASCII operation, result in the A-label.

Additionally, there may also be complications where the U-label could be entered into an application using an input method editor ("keyboard") that may produce a sequence of Unicode characters that may not convert to the A-label (either becomes a different A-label or fails conversion.) This may be due to user perception that a character is what one thinks it is, but when entered using the local input software produces a different character due to locale differences. I will try to dig up some examples. This is not a technical / policy issue, but is a usability issue that affects the stability of IDNs.

Best,

=wil














