Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
- To: Tina Dam <tina.dam@xxxxxxxxx>
 
- Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
 
- From: subbiah <subbiah@xxxxxxxxx>
 
- Date: Wed, 07 Mar 2007 11:04:32 -0800
 
 
 
Dear Tina, Tin Wee, Edmon, Sophia, Shahram, Ram

One point regarding sloppy programming and an internal xn-- in IDN
labels being incorrectly picked up as U-strings, leading to spurious
speculative registration/activity.
 
Whether we ban xn--  (and to be super-safe CC--) in the middle of IDN 
labels is a decision to be made after the pros and cons are considered. 
 
Personally, I think neither decision would be disastrous in the scale of
things. Let me emphasize this before I begin.

So, in the interest of a complete examination, starting from Sophia's
suggestion: our working group's job is to think things through
carefully. Previous ones did not, which is why we were faced with
ill-motivated registrations of IDN labels beginning with xn-- after the
fact. After almost a decade of thinking about it, it would be unwise to
make mistakes again. Here are my pros and cons - others may have more.
 
The pros:
First, the number of applications that need to be made IDN-aware in the
coming years is vast. It is not simply a few browsers, a few email apps
and some server-side software for DNS infrastructure. Virtually every
piece of software out there - probably millions of them - needs to be
made IDN-aware. For example, even the free software Kodak ships with
its digital cameras includes photo album software that understands
web links and highlights/hyperlinks them. Another example is the
business accounting software developed by Bulgarian programmers for
local use in Mongolia (:-)) that is in many ways web-enabled and links
to domains. The point: there are millions of software packages already,
and increasingly many of them are being written locally all over the
world at varying levels of ability and sophistication. The odds that at
least one, or one percent (which may be 10 000 of them), will screw up
are reasonably high.

Secondly, some day after the migration is over we can remove this ban.

The cons:
It is theoretically possible that restricting just xn-- alone in the
middle of IDN labels could prevent some character in some language from
being registrable in an IDN label. Extending it to a cc-- restriction
could theoretically prevent as many as 1000 characters in some
languages, out of the universe of millions of Unicode characters, from
being registered. A small fraction, but not to those affected.
 
However, a closer examination (I have not done an exhaustive one yet)
suggests that the "--" combination never occurs in the IDN portion of
an IDN label (if someone disagrees, please holler). The only time it
happens, I think, is when an IDN label is composed of two parts - an
ASCII part and a truly IDN part. In this case, a direct ASCII
registration of xn-- in the ASCII part (whether at the beginning or in
the middle) leads to something we would like to ban, if that is what we
decide. The other scenario leading to a banning candidate is when an
"xn-" is registered at the end of the ASCII part of an IDN label that
includes both an ASCII component and a truly IDN component. I think what
happens here, and what I think Tin Wee was trying to point out, is that
IDNA conversion ends up creating an "xn--" (i.e. an extra "-" appended)
that would be in the middle of the final IDN label. (Of course this
would apply for cc-- as well in super-safe mode.) But when one realises
that in strict ASCII domains (not IDN at all) historical rules prevent
"-" from being registered at the end of ASCII domains, this second
scenario that leads to bannable candidates is philosophically identical
to an existing rule and should not bring about any "heartaches" but
rather only "consistency".
 
If these are, as I preliminarily think, the only situations that create
the instances we might want to ban for sloppy programming reasons, then
the theoretical possibility that some characters in some languages could
become unregistrable goes away.
 
And this leaves the method or rule for enforcing. This would simply be:
after conversion into the final IDN label, extend the current rule of
banning CC-- at the beginning only so that it applies throughout the
entire IDN label. From a programming/implementation perspective, the
additional difficulty is extremely minor.
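
To give a feel for how small such a check would be, here is a rough
sketch in Python. It is only an illustration under my own assumptions:
the function names are made up, "CC" is taken to mean any two letters
or digits, and the leading xn-- of the final IDN label is treated as
the one permitted occurrence.

```python
import re

# "CC--": any two letters/digits followed by two hyphens. The lookahead
# lets overlapping candidates be reported as well.
_CCHH = re.compile(r"(?=[a-z0-9]{2}--)")


def cchh_positions(a_label):
    """Return every index at which a CC-- pattern starts in the label."""
    return [m.start() for m in _CCHH.finditer(a_label.lower())]


def allowed_under_extended_rule(a_label):
    """Accept the label only if CC-- appears nowhere except index 0.

    Assumption: the input is the final converted IDN label, so the
    leading "xn--" prefix is the single occurrence that is tolerated.
    """
    return all(pos == 0 for pos in cchh_positions(a_label))


if __name__ == "__main__":
    for label in ("xn--citibank-b28qq03g", "xn--citibankxn--b28qq03g"):
        print(label, cchh_positions(label), allowed_under_extended_rule(label))
```

The two sample labels are the strings quoted elsewhere in this thread;
the check only looks at the ASCII form and never decodes anything.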
 
Summary
---------
So, assuming my initial exploration of the consequences is right (lab
testers or others can certainly test it), and based on my perhaps
incomplete list of pros/cons, one would think the cons are virtually
nil. While the pros are not great (i.e. I can see myself going along
with Tina's view of "who cares about sloppy programmers, if you are
sloppy you deserve it and the market will correct you"), there are pros
that can be gained for virtually no cons.
 
As to whether we should do this or not, that is not my call - I am
swayable and, as I said, someone else should also think carefully about
what I have said above. Technical things take far more time to think
through carefully than just a few "email" chains in Internet-time allow
for.

Cheers
Subbiah

Tina Dam wrote:
Tin Wee, All,

While it naturally is impossible to test all applications, the Technical
Test Phase II is focused on the application area. It is in the process
of being defined and planned and will contain elements around
communication as well (communication to application providers... so far
we have received quite some interest in getting this right, which is
good).

Further, the revision of the protocol is not expected to change the
prefix, and one of the main reasons for the revision is to be able to
proceed with a protocol that does not depend on the Unicode version, to
avoid continuous revisions, which could create further problems as you
mention below.

I am not sure I follow your AXN-- discussion below... but I support
Will's and Edmon's comments on this. The protocol does not work mid-way
through strings. What that means is that it is entirely possible to
register a string that midway has "xn--" in it, and I don't see any need
for reserving such names. Sloppy applications that take such strings and
convert them to U-strings should quickly be revised in response to
market complaints. As mentioned, while some application testing is in
place, we need to keep in mind that (i) we can't test all applications
that exist now and in the future, and (ii) even if tested, the providers
can change the implementation at any time.

Tina

PS> Sorry I was not on the call last night. I arrived late evening from a
long into LA and did not manage to stay up for the 3am call. I will listen
to the recording and if I have input I will provide it to the list.
   
 
-----Original Message----- 
From: owner-gnso-idn-wg@xxxxxxxxx 
[mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Tan Tin Wee 
Sent: Tuesday, March 06, 2007 4:06 PM 
To: edmon@xxxxxxxxxxx 
Cc: 'Shahram Soboutipour'; owner-gnso-idn-wg@xxxxxxxxx; 
'Sophia Bekele'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx 
Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label 
 
Ram Mohan wrote:
> Are you saying - something like <*CITIBANKchina.TLD*> where "china" is
> in local script while CITIBANK is in Latin script should be banned,
> because its Punycode translation would result in an <xn--> midway
> through the string?
 
I agree with the comments made so far. 
xn-- in the case mentioned by Ram won't happen in the current 
way Punycode works as William and Edmon pointed out. 
 
Having said that, I agree that for the moment we may not want 
to add more complication by recommending to split the IDN 
label with xn-- embedded inside because xn-- can occur in 
punycode within a label like (using Ram's example and 
modifying it...) citibankxn-<China> will appear as
xn--citibankxn--b28qq03g (e.g.
http://mct.verisign-grs.com/conversiontool/convertServlet?input=xn--citibankxn--b28qq03g&type=PUNYCODE
converting from Punycode xn--citibankxn--b28qq03g to Unicode:
citibankxn-中国 or use
http://www.afilias.info/cgi-bin/convert_punycode.cgi)
...
which I think was the nub of Shahram's point: 
> <CCHH>citibank-<CCHH><encodedCHINA>.tld 
 
Of course, xn-- at the prefix will cause the rest of the 
label "citibankxn--b28qq03g" to be processed as such, but still 
xn-- as mentioned by Sophia will pop up here and there by 
accident or by deliberate design by non-bonafide registrants. 
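
The difference between treating xn-- only as a prefix and picking it
out from inside a label can be sketched roughly as follows. This is an
illustration only: it uses Python's built-in "punycode" codec directly
(so no Nameprep step), and decode_sloppy is a made-up example of the
kind of bug being discussed, not code from any real application.

```python
def encode_label(u_label):
    """Punycode-encode a single label and add the xn-- prefix."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


def decode_prefix_only(a_label):
    """Honour xn-- only at the very start of the label."""
    if not a_label.lower().startswith("xn--"):
        return a_label  # plain ASCII label, nothing to decode
    return a_label[4:].encode("ascii").decode("punycode")


def decode_sloppy(a_label):
    """Hypothetical buggy decoder: treats the LAST xn-- as the marker."""
    idx = a_label.lower().rfind("xn--")
    if idx < 0:
        return a_label
    head, tail = a_label[:idx], a_label[idx + 4:]
    try:
        return head + tail.encode("ascii").decode("punycode")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return None     # the fragment was not valid Punycode on its own


if __name__ == "__main__":
    u_label = "citibankxn-\u4e2d\u56fd"   # ASCII part ends in "xn-"
    a_label = encode_label(u_label)
    print(a_label)
    print(decode_prefix_only(a_label) == u_label)
    print(repr(decode_sloppy(a_label)))
```

The prefix-only decoder round-trips the label, while the sloppy one
either fails or shows something other than what was registered.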
 
I think what Sophia meant, which Ram misunderstood, was some mechanism
to trap xn-- inside labels. I can see three reasons for it:

1. So that it does not confuse software written with sloppy programming
   that picks out an xn-- inside an xn-- prefixed string (non-greedy
   algorithm), as in the case I mentioned, and displays the wrong IDN
   label.

2. With the mixed-scripts issue, if we do not look carefully at the xn--
   or CCHH question, and the next Unicode version turns out to be a
   drastic enough change that we need to migrate and, in the process,
   change xn-- to some other CCHH, we may lose the option. By way of
   illustration, xm-- or xe-- etc. could already be registered, as in
   axe-?? with xn--axe--3f5fw08b as the back-end encoding, or AXN-??
   with xn--axn--3f5fw08b, which is a conceivable registration by the
   AXN satellite channel.

3. Spoofing or passing off, by confusing people with citibankxn-中国
   and citibank.xn-中国, which look pretty close. These get punycoded to
   http://mct.verisign-grs.com/conversiontool/convertServlet?input=citibankxn-%E4%B8%AD%E5%9B%BD&type=UTF8
   xn--citibankxn--b28qq03g
   and
   http://mct.verisign-grs.com/conversiontool/convertServlet?input=citibank.xn-%E4%B8%AD%E5%9B%BD&type=UTF8
   citibank.xn--xn--x68dy61b
   respectively.

Try these two labels with the Afilias converter and the second one will
be blocked, while http://www.nameisp.com/punycode.asp will work just
like the Verisign converter... So these are programming variations we
may need to follow through.
 
If we recommend against AXN-?? because it generates a 
potentially confusing xn-- string inside a punycode label, 
then AXN?? could be an option, as it will generate 
xn--axn-x68dy61b, which is xn- and not xn--. 
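
The effect of the trailing hyphen can be checked with a short sketch.
It rests on some assumptions: it uses Python's built-in "punycode"
codec without Nameprep, the labels are written in lowercase, and the
two Chinese characters from the citibank example stand in for the
garbled ones above, so the encoded tail will not match the strings
quoted in this mail. Only the hyphen count before the tail matters.

```python
def to_a_label(u_label):
    """Punycode-encode one label and prepend the xn-- prefix."""
    return "xn--" + u_label.encode("punycode").decode("ascii")


# Punycode copies the ASCII characters first and then adds one "-" as a
# delimiter, so an ASCII part ending in "-" ends up followed by "--".
with_hyphen = to_a_label("axn-\u4e2d\u56fd")
without_hyphen = to_a_label("axn\u4e2d\u56fd")

if __name__ == "__main__":
    print(with_hyphen)      # begins with xn--axn-- (embedded xn--)
    print(without_hyphen)   # begins with xn--axn-  (only xn-)
```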
 
Finally,
Edmon Chung wrote:
> Nevertheless, with regards to our discussion at hand, I am quite
> certain we have comprehensive protection with the CCHH reserved as a
> prefix.
 
Yes, I suspect this might be the case, but somebody might 
want to get a team of programmers to run a check on some test 
cases. Does anyone know if this kind of scenario is being 
tested at the moment in the ICANN testing contract? 
 
bestrgds  
tin wee  

Edmon Chung wrote:

Hi Shahram,

There was an extensive discussion in the original IDN protocol
development about the use of the prefix (or suffix or other possible
identifiers), and finally CCHH was chosen.  I highly doubt that we
would be choosing a scheme that would split up a label (for many good
reasons including bidi and single script considerations) into
different chunks with different prefixes, but no one can predict the
future I suppose :-)

Nevertheless, with regards to our discussion at hand, I am quite
certain we have comprehensive protection with the CCHH reserved as a
prefix.

Edmon

From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Shahram Soboutipour
Sent: Tuesday, March 06, 2007 4:50 PM
To: owner-gnso-idn-wg@xxxxxxxxx
Cc: 'Sophia Bekele'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: RE: [gnso-idn-wg] Re: Banning CCHH anywhere in a label

Dear Edmon

Regarding the sample CITIBANKchina.TLD (where china is in the Chinese
character set), I think there is a 3rd possibility, which might be
Sophia's idea:

<CCHH>citibank-<CCHH><encodedCHINA>.tld

It means that every separate non-ASCII part of a label would be
translated with a CCHH in front of it. I am not sure whether there is a
rule for this right now, but I myself do not agree with this form. I
prefer <CCHH>citibank-<encodedCHINA>.tld because:

1. I think there is enough space in the CC part of CCHH for possible
   further changes and developments in the IDNA standard, so there need
   be no worries.
2. The CCHH (at the front) is a good rule for identifying an IDN, and I
   think it can be a rule at all levels of a URL (not only the 2nd and
   3rd). Levels higher than the 3rd seem to be out of scope of ICANN's
   policy, BUT this must be mentioned in their own technical decision
   making.

Regards,

Shahram Soboutipour
President and CEO
Karmania Media
Tel: +98 341 2117844,5
Mobile: +98 913 1416626
Fax: +98 341 2117851

-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Edmon Chung
Sent: Tuesday, March 06, 2007 6:09 AM
To: 'Tan, William'; rmohan@xxxxxxxxxxxx
Cc: 'Sophia B'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: RE: [gnso-idn-wg] Re: Banning CCHH anywhere in a label

I don't think you are missing anything, William.
I was trying to speak up during the call earlier. I don't think the
concern Sophia was articulating should be an issue.

If I am not mistaken, Sophia was asking whether it would be necessary
to reserve names such as:

abc<CCHH>xyz.tld

These names would NOT be considered IDNs, nor would any part of them,
but are simply ASCII domains.  <CCHH> can best be seen as a prefix to
denote that a domain label (i.e. between two dots) has at least one
non-LDH (letter-digit-hyphen) character.

Using the example described:

citibank<CHINA>.tld

where <CHINA> is in Chinese, William's explanation is correct; it
should become:

<CCHH>citibank-<encodedCHINA>.tld

And NOT

Citibank<CCHH><encodedCHINA>.tld

So, by reserving <CCHH> at the front (i.e. the first 4 characters, or
more precisely, hyphens in the third and fourth characters) we cover
all cases of intended IDN expressions.
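
As a minimal sketch of that prefix rule, a label can be sorted into
three groups by looking only at its first four characters. The
function name and the three group names are my own for illustration;
nothing here is taken from a registry implementation.

```python
def classify_label(label):
    """Classify a single domain label by its first four characters.

    "ace"      - starts with xn--, i.e. the current IDNA prefix
    "reserved" - hyphens in the third and fourth positions (other CCHH)
    "plain"    - anything else, treated as an ordinary ASCII label
    """
    lowered = label.lower()
    if lowered.startswith("xn--"):
        return "ace"
    if lowered[2:4] == "--":
        return "reserved"
    return "plain"


if __name__ == "__main__":
    for label in ("xn--citibank-b28qq03g", "ab--cd", "abcxn--xyz"):
        print(label, classify_label(label))
```

Note that a label with xn-- somewhere after the front, such as
abcxn--xyz, comes out as plain ASCII under this rule.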
 
 
  
Edmon

-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Tan, William
Sent: Tuesday, March 06, 2007 7:47 AM
To: rmohan@xxxxxxxxxxxx
Cc: 'Sophia B'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label

Hi all,

I believe the motivations behind banning strings with hyphens in the
*third* and *fourth* positions are:

1. to protect registries who do not offer IDN registrations from
   unknowingly registering IDNs; and
2. to reserve future revisions to the IDNA standard where a different
   prefix might be assigned.

Ram Mohan wrote:
>
> Are you saying - something like <*CITIBANKchina.TLD*> where "china" is
> in local script while CITIBANK is in Latin script should be banned,
> because its Punycode translation would result in an <xn--> midway
> through the string?
>

I'm not sure I follow this. CITIBANKchina.TLD would translate to
xn--citibank-encodedchunk.TLD, so xn-- would not occur midway in the
ACE string.

> In general, the rationale for banning "CCHH" at a position other than
> the beginning of a string/label is unclear.

I have not seen any documents that suggest banning CCHH at anything but
the beginning of a string. Am I missing something?

Sophia said:
> All registrations should
> be in the IDN label, and that the ACE label should be internal to the
> operations of the registration. *One should not be offering to
> register xn--.... as a label or any ACE label since it is an internal
> encoding, so as to prevent confusion and other malfeasance (phishing)*.

Many registries today use the ACE string at the registration protocol
level, so your statement would essentially be advising against that
practice. Personally, I don't think it is a problem unless the registry
does NOT offer IDN and is accepting xn-- labels (in which case it
probably simply treats the registration as ASCII and does not check for
IDNA validity.) We may be in agreement here, but I wanted to further
qualify your statement.

In table 4.4 of "Recommendation Tables for RN-WG Reports.doc":
> For each IDN gTLD proposed, applicant must provide both the "ASCII
> compatible (ACE) form of an IDNA valid string" ("A-label") and in
> local script form (Unicode) of the top level domain ("U-label").

I would also add that the applicant should provide additional strings
that, after applying IDNA ToASCII operation, result in the A-label.

Additionally, there may also be complications where the U-label could be
entered into an application using an input method editor ("keyboard")
that may produce a sequence of Unicode characters that may not convert
to the A-label (either becomes a different A-label or fails conversion.)
This may be due to user perception that a character is what one thinks
it is, but when entered using the local input software produces a
different character due to locale differences. I will try to dig up some
examples. This is not a technical / policy issue, but is a usability
issue that affects the stability of IDNs.
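
The point that several input strings can end up as one A-label after
ToASCII can be illustrated with a small sketch. It assumes Python's
built-in "idna" codec, which follows the IDNA 2003 ToASCII operation
including Nameprep, and it uses the word "bücher" purely as a stand-in
example that does not come from this thread.

```python
def to_a_label(label):
    """Apply the IDNA 2003 ToASCII operation to a single label."""
    return label.encode("idna").decode("ascii")


# Three spellings of the same word that differ only in letter case.
variants = ["b\u00fccher", "B\u00fccher", "B\u00dcCHER"]
a_labels = {to_a_label(v) for v in variants}

if __name__ == "__main__":
    for v in variants:
        print(v, "->", to_a_label(v))
    print(len(a_labels), "distinct A-label(s)")
```

Here the case folding in Nameprep makes all three spellings collapse
into a single A-label, which is the kind of additional string an
applicant might be asked to list.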
        
 
Best,
=wil