Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
- To: edmon@xxxxxxxxxxx
- Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
- From: Tan Tin Wee <tinwee@xxxxxxxxxxxxxx>
- Date: Wed, 07 Mar 2007 08:06:00 +0800
Ram Mohan wrote:
>> > Are you saying - something like <*CITIBANKchina.TLD*> where "china" is
>> > in local script while CITIBANK is in Latin script should be banned,
>> > because its Punycode translation would result in an <xn--> midway
>> > through the string?
I agree with the comments made so far.
xn-- in the case mentioned by Ram won't happen in the
current way Punycode works as William and Edmon pointed out.
Having said that, I agree that for the moment we may not want to
add more complication by recommending that the IDN label be split
with xn-- embedded inside, because xn-- can already occur in Punycode
within a label. For example (using Ram's example
and modifying it...),
citibankxn-<China> will appear as xn--citibankxn--b28qq03g
(e.g.
http://mct.verisign-grs.com/conversiontool/convertServlet?input=xn--citibankxn--b28qq03g&type=PUNYCODE
converting from Punycode xn--citibankxn--b28qq03g
to Unicode: citibankxn-中国 or use
http://www.afilias.info/cgi-bin/convert_punycode.cgi)
...
which I think was the nub of Shahram's point:
> <CCHH>citibank-<CCHH><encodedCHINA>.tld
Of course, xn-- as the prefix will cause the rest of the label,
"citibankxn--b28qq03g", to be processed as Punycode, but still
xn-- as mentioned by Sophia will pop up here and there, by
accident or by deliberate design of registrants acting in bad faith.
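
As a rough check of the example above, here is a minimal Python sketch that reproduces the conversion with the standard library's punycode codec. The helper name to_ace is only illustrative, and the sketch assumes that lowercasing is an adequate stand-in for the Nameprep step of IDNA ToASCII, so it is an approximation and not a full IDNA implementation.

```python
ACE_PREFIX = "xn--"  # the CCHH currently assigned to IDNA


def to_ace(u_label: str) -> str:
    """Return an ACE form of a single label (hypothetical helper).

    Assumption: lowercasing stands in for Nameprep, which real
    IDNA ToASCII processing applies before Punycode encoding.
    """
    label = u_label.lower()
    if label.isascii():
        return label  # pure ASCII labels are left alone
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


# The prefix appears once at the front; the second xn-- is just the
# literal "xn-" of the label followed by the Punycode delimiter.
print(to_ace("citibankxn-中国"))  # xn--citibankxn--b28qq03g
```

The embedded xn-- here is ordinary label content plus the delimiter hyphen, not a second prefix.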
I think what Sophia meant, and Ram misunderstood, was that we need
some mechanism to trap xn-- inside labels. One reason is to ensure
that it doesn't confuse software written with sloppy
programming that picks out an xn-- inside an xn-- prefixed
string (a non-greedy algorithm), as in the case I mentioned,
and displays the wrong IDN label. Another is the mixed
scripts issue: if we don't look carefully at the xn-- or CCHH
question, and the next Unicode version turns out to be
a drastic enough change that we need to migrate,
changing xn-- to some other CCHH in the process,
we may lose that option. By way of illustration,
xm-- or xe-- etc. may no longer be usable if a label was already
registered as axe-中国, with xn--axe--3f5fw08b as the back end encoding,
or as AXN-中国 with xn--axn--3f5fw08b, which is a conceivable
registration by the AXN satellite channel.
OR there are cases of spoofing or passing off, confusing people with
citibankxn-中国 and citibank.xn-中国, which
look pretty close, and which may get punycoded to
http://mct.verisign-grs.com/conversiontool/convertServlet?input=citibankxn-%E4%B8%AD%E5%9B%BD&type=UTF8
xn--citibankxn--b28qq03g
and
http://mct.verisign-grs.com/conversiontool/convertServlet?input=citibank.xn-%E4%B8%AD%E5%9B%BD&type=UTF8
citibank.xn--xn--x68dy61b
respectively.
Try these two labels with the Afilias converter and the second one
will generate a block,
while http://www.nameisp.com/punycode.asp will work just like the
VeriSign converter... So these are programming variations
we may need to follow up on.
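
To illustrate the "sloppy programming" worry, here is a small Python sketch contrasting a decoder that honours the prefix only at the start of the label with one that latches onto the last xn-- it finds. Both function names are made up for this example, and sloppy_decode is a deliberately buggy strawman, not code taken from any of the converters mentioned.

```python
ACE_PREFIX = "xn--"


def decode_label(a_label: str) -> str:
    """Decode an ACE label, honouring the prefix only at position 0."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label  # no prefix at the front: plain ASCII label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def sloppy_decode(a_label: str):
    """Hypothetical buggy decoder: treats the LAST xn-- as the prefix."""
    tail = a_label.lower().rsplit(ACE_PREFIX, 1)[-1]
    try:
        return tail.encode("ascii").decode("punycode")
    except UnicodeError:
        return None  # the tail need not be valid Punycode at all


label = "xn--citibankxn--b28qq03g"
correct = decode_label(label)
wrong = sloppy_decode(label)
print(correct == "citibankxn-中国")  # True
print(correct == wrong)              # False
```

The anchored version round-trips the label from the thread; the other one drops "citibank" entirely and decodes only the tail.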
If we recommend against AXN-äå because it generates a potentially
confusing xn-- string inside a punycode label, then AXNäå could
be an option, as it will generate xn--axn-x68dy61b, which is xn-
and not xn--.
Finally,
Edmon Chung wrote:
> Nevertheless, with regards to our discussion at hand, I am quite certain
> we have comprehensive protection with the CCHH reserved as a prefix.
Yes, I suspect this might be the case, but somebody might want to get a
team of programmers to run a check on some test cases. Does anyone know if
this kind of scenario is being tested at the moment in
the ICANN testing contract?
bestrgds
tin wee
Edmon Chung wrote:
Hi Shahram,
There was an extensive discussion in the original IDN protocol
development about the use of the prefix (or suffix or other possible
identifiers), and finally CCHH was chosen. I highly doubt that we would
be choosing a scheme that would split up a label (for many good reasons
including bidi and single script considerations) into different chunks
with different prefixes, but no one can predict the future I suppose :-)
Nevertheless, with regards to our discussion at hand, I am quite certain
we have comprehensive protection with the CCHH reserved as a prefix.
Edmon
*From:* owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx]
*On Behalf Of *Shahram Soboutipour
*Sent:* Tuesday, March 06, 2007 4:50 PM
*To:* owner-gnso-idn-wg@xxxxxxxxx
*Cc:* 'Sophia Bekele'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
*Subject:* RE: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
Dear Edmon
Regarding the sample CITIBANKchina.TLD (where china is in Chinese
charset), I think there is a 3rd possibility which might be Sophia's idea:
<CCHH>citibank-<CCHH><encodedCHINA>.tld
It means that every separate part of a label in non-ASCII strings would be
translated with a CCHH in front. I am not sure whether there is a rule for
this right now, but I myself do not agree with this approach. I
prefer <CCHH>citibank-<encodedCHINA>.tld because:
1. I think there is enough space for possible further changes and
developments in the IDNA standard in the CC part of CCHH, so there should be no
worries.
2. The CCHH (in front) is a good rule to define an IDN, and I think it
can be a rule at all levels of a URL (not only 2nd and 3rd), but it
seems levels higher than the 3rd are out of scope of ICANN's policy,
BUT must be mentioned in their own technical decision making.
Regards,
/*Shahram Soboutipour*/ <mailto:soboutipour@xxxxxxxxxxx>
*President and CEO*
*Karmania Media* <http://www.karmania.ir/>
Tel: +98 341 2117844,5
Mobile: +98 913 1416626
Fax: +98 341 2117851
-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx]
On Behalf Of Edmon Chung
Sent: Tuesday, March 06, 2007 6:09 AM
To: 'Tan, William'; rmohan@xxxxxxxxxxxx
Cc: 'Sophia B'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: RE: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
I don't think you are missing anything, William.
I was trying to speak up during the call earlier; I don't think the concern
Sophia was articulating should be an issue.
If I am not mistaken, Sophia was asking whether it would be necessary to
reserve names such as:
abc<CCHH>xyz.tld
These names would NOT be considered IDNs, nor would any part of them, but are
simply ASCII domains. <CCHH> is best seen as a prefix denoting
that a domain label (i.e. between two dots) has at least one non-LDH
(letter-digit-hyphen) character.
Using the example described:
citibank<CHINA>.tld
where <CHINA> is in Chinese, William's explanation is correct, it should
become:
<CCHH>citibank-<encodedCHINA>.tld
And NOT
Citibank<CCHH><encodedCHINA>.tld
So, by reserving <CCHH> at the front (i.e. the first 4 characters, or more
precisely, hyphens in the third and fourth positions) we cover all cases
of intended IDN expressions.
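
The reservation rule described here is simple enough to sketch in a few lines of Python. The function name is hypothetical, and the sample strings are placeholders in the style of the thread, not real registrations.

```python
def has_cchh_prefix(label: str) -> bool:
    """True if the label has hyphens in its third and fourth positions."""
    return len(label) >= 4 and label[2:4] == "--"


samples = (
    "xn--citibank-encodedchunk",  # placeholder for an IDN label: reserved
    "abcxn--xyz",                 # abc<CCHH>xyz: plain ASCII, not reserved
    "xe--example",                # a possible future CCHH: reserved as well
)
for name in samples:
    print(name, has_cchh_prefix(name))
```

Because the test looks only at positions three and four, it covers xn-- and any other CCHH that a later revision might assign, without touching names that merely contain xn-- further along.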
Edmon
-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx [mailto:owner-gnso-idn-wg@xxxxxxxxx] On
Behalf Of Tan, William
Sent: Tuesday, March 06, 2007 7:47 AM
To: rmohan@xxxxxxxxxxxx
Cc: 'Sophia B'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
Hi all,
I believe the motivations behind banning strings with hyphens in the
*third* and *fourth* positions are:
1. to protect registries that do not offer IDN registrations from
unknowingly registering IDNs; and
2. to reserve room for future revisions to the IDNA standard where a
different prefix might be assigned.
Ram Mohan wrote:
>
> Are you saying - something like <*CITIBANKchina.TLD*> where "china" is
> in local script while CITIBANK is in Latin script should be banned,
> because its Punycode translation would result in an <xn--> midway
> through the string?
>
I'm not sure I follow this. CITIBANKchina.TLD would translate to
xn--citibank-encodedchunk.TLD, so xn-- would not occur midway in the ACE
string.
> In general, the rationale for banning "CCHH" at a position other than
> the beginning of a string/label is unclear.
I have not seen any documents that suggest banning CCHH at anything but
the beginning of a string. Am I missing something?
Sophia said:
> All registrations should
> be in the IDN label, and that the ACE label should be internal to the
> operations of the registration. *One should not be offering to
> register xn--.... as a label or any ACE label since it is an internal
> encoding, so as to prevent confusion and other malfeasance (phishing)*.
Many registries today use the ACE string at the registration protocol
level, so your statement would essentially be advising against that
practice. Personally, I don't think it is a problem unless the registry
does NOT offer IDN and is accepting xn-- labels (in which case it
probably simply treats the registration as ASCII and does not check for
IDNA validity.) We may be in agreement here, but I wanted to further
qualify your statement.
In table 4.4 of "Recommendation Tables for RN-WG Reports.doc":
> For each IDN gTLD proposed, applicant must provide both the "ASCII
> compatible (ACE) form of an IDNA valid string" ("A-label") and in
> local script form (Unicode) of the top level domain ("U-label").
I would also add that the applicant should provide additional strings
that, after applying IDNA ToASCII operation, result in the A-label.
Additionally, there may also be complications where the U-label could be
entered into an application using an input method editor ("keyboard")
that may produce a sequence of Unicode characters that may not convert
to the A-label (either becomes a different A-label or fails conversion.)
This may be due to user perception that a character is what one thinks
it is, but when entered using the local input software produces a
different character due to locale differences. I will try to dig up some
examples. This is not a technical / policy issue, but is a usability
issue that affects the stability of IDNs.
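
Pending real examples, the following Python sketch shows the general shape of both points, using the built-in idna codec (which implements the IDNA ToASCII operation with Nameprep). The label "bücher" is a stock illustration chosen for this sketch and does not come from any application.

```python
composed = "b\u00fccher"     # ü entered as a single code point
decomposed = "bu\u0308cher"  # u followed by a combining diaeresis
uppercase = "B\u00dcCHER"    # the same word typed in capitals

# Three different Unicode inputs, one A-label after ToASCII.
for text in (composed, decomposed, uppercase):
    print(ascii(text), text.encode("idna"))

# A keyboard that emits a look-alike from another script gives a
# string that looks the same but converts to a different A-label.
lookalike = "b\u00fc\u0441her"  # U+0441 is Cyrillic, not Latin "c"
print(lookalike.encode("idna") == composed.encode("idna"))  # False
```

The first three inputs are the kind of "additional strings" an applicant could list; the last one is the kind of input that would silently miss the intended A-label.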
Best,
=wil