Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
- To: Tina Dam <tina.dam@xxxxxxxxx>
- Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
- From: subbiah <subbiah@xxxxxxxxx>
- Date: Wed, 07 Mar 2007 11:04:32 -0800
Dear Tina, Tin Wee, Edmon, Sophia, Shahram, Ram
One point regarding sloppy programming and the internal xn-- in IDN
labels being incorrectly picked up as U strings, leading to spurious
speculative registration/activity.
Whether we ban xn-- (and to be super-safe CC--) in the middle of IDN
labels is a decision to be made after the pros and cons are considered.
Personally, either decision is not disastrous on the scale of things.
Let me emphasize this, before I begin.
So, in the interest of a complete examination, starting from Sophia's
suggestion, here are my pros and cons - others may have more. Our
working group's job is to think things through carefully. Previous
ones did not, which is why we were faced after the fact with
ill-motivated registrations of IDN labels beginning with xn--. After
almost a decade of thinking about it, it would be unwise to make
mistakes again.
The pros:
First, the number of applications that need to be made IDN-aware in
coming years is vast. It is not simply a few browsers, a few email
apps and some server-side software for DNS infrastructure. Virtually
every piece of software out there - probably millions of them - needs
to be made IDN-aware. For example, even the free software Kodak ships
with its digital cameras includes photo album software that
understands web links and highlights/hyperlinks them. Another example
is the business accounting software developed by Bulgarian programmers
for local use in Mongolia (:-)) that is in many ways web-enabled and
links to domains. The point: there are already millions of software
packages, and increasingly many of them are being written locally all
over the world at varying levels of ability and sophistication. The
odds that at least one, or one percent (which may be 10 000 or so),
will screw up are reasonably high.
Secondly, some day after the migration is over we can remove this ban.
The cons:
It is theoretically possible that restricting just xn-- alone in the
middle of IDN labels could prevent some character in some language from
being registrable in an IDN label. Extending the restriction to cc--
could theoretically prevent as many as 1000 characters in some
languages, out of the universe of millions of Unicode characters, from
being registered. A small fraction, but not to those affected.
However, a closer examination (I have not done an exhaustive examination
yet) suggests that the "--" combination never occurs in an IDN portion
of an IDN label (if someone disagrees please holler). The only time it
happens I think is when an IDN label is composed of two parts - an ASCII
part and a truly IDN part. In this case a direct ASCII registration of
xn-- in the ASCII part (whether at the beginning or in the middle) leads
to something we would like to ban, if that is what we decide. The other
scenario leading to a banning candidate is when an "xn-" is registered
at the end of the ASCII part of an IDN label that includes both an ASCII
component and a truly IDN component. What happens here, and what I
think Tin Wee was trying to point out, is that IDNA conversion ends up
creating an "xn--" (i.e. an extra "-" is appended) that would be in the
middle of the final IDN label. (Of course this would apply for cc-- as
well in super-safe mode.) But when one realises that in strict ASCII
domains (not IDN at all) historical rules prevent "-" being registered
at the end of ASCII domains, this second scenario that leads to
bannable candidates is philosophically identical to an existing rule and
should not bring about any "heartaches" but rather only "consistency".
If these are, as I preliminarily think, the only situations that create
the instances we might want to ban for sloppy programming reasons, then
the theoretical possibility that some characters in some languages could
become unregistrable goes away.
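The claim that "--" does not come out of the encoded portion itself can be checked against how Punycode builds its output: the encoded digits are limited to a-z and 0-9, and the only hyphen the encoder adds is the single delimiter after the copied ASCII characters. Below is a minimal sketch, assuming Python's built-in `punycode` codec as the encoder and skipping the Nameprep step of real IDNA; the helper names `to_ace` and `has_inner_double_hyphen` are illustrative only and not taken from any standard.

```python
ACE_PREFIX = "xn--"

def to_ace(label: str) -> str:
    """Encode one label with Punycode and add the ACE prefix.

    Simplification (assumption): real IDNA ToASCII runs Nameprep first;
    here the input is only lowercased. Pure-ASCII labels pass through.
    """
    label = label.lower()
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

def has_inner_double_hyphen(ace_label: str) -> bool:
    """True if "--" occurs after the leading ACE prefix.

    Intended for labels that already start with the ACE prefix.
    """
    return "--" in ace_label[len(ACE_PREFIX):]

# ASCII part ends in a letter: only the single delimiter hyphen appears.
plain = to_ace("citibank\u4e2d\u56fd")
# ASCII part ends in "-": the delimiter lands next to it and forms "--".
trailing = to_ace("citibankxn-\u4e2d\u56fd")

print(plain, has_inner_double_hyphen(plain))
print(trailing, has_inner_double_hyphen(trailing))
```

Under these assumptions, an inner "--" shows up only in the second case, where the ASCII part itself ends in a hyphen.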
And this leaves the method or rule for enforcing. This would simply
be: after conversion into the final IDN label, extend the current rule
of banning CC-- at the beginning only so that it applies throughout the
entire IDN label. From a programming/implementation perspective, the
additional difficulty is extremely minor.
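As a rough illustration of how small the check would be, here is a sketch of the extended rule applied to an already converted label. It assumes "CC" means any two letters or digits; the function name `violates_extended_rule` is hypothetical and this is not an agreed registry policy.

```python
import re

# "CC--": any two letters/digits followed by two hyphens (assumed reading of CCHH).
CCHH = re.compile(r"[a-z0-9]{2}--")

def violates_extended_rule(final_label: str) -> bool:
    """Return True if CC-- occurs anywhere other than as the leading xn-- prefix.

    The label is expected in its final (post-conversion) form.
    """
    label = final_label.lower()
    # The legitimate ACE prefix at the very front is allowed; skip past it.
    body = label[4:] if label.startswith("xn--") else label
    return CCHH.search(body) is not None

for candidate in ("xn--citibankxn--b28qq03g",  # xn-- repeated midway
                  "xn--axn--3f5fw08b",         # xn-- formed midway
                  "xn--axn-x68dy61b",          # only xn- midway
                  "citibank"):                 # plain ASCII
    print(candidate, violates_extended_rule(candidate))
```

The check is a single pattern search per label, which is consistent with the point that the extra implementation effort is small.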
Summary
---------
So assuming my initial exploration of the consequences is right (lab
testers or others can certainly test it) and based on my perhaps
incomplete list of pros/cons, one would think the cons are virtually
nil. While the pros are not great (i.e. I can see my own self going
along with Tina's view of "who cares about sloppy programmers, if you
are sloppy you deserve it and the market will correct you"), there are
pros that can be gained for virtually no cons.
As to whether we should do this or not, that is not my call - I am
swayable and, as I said, someone else should also think carefully about
what I have said above. Technical things take far more time to think
through carefully than just a few "email" chains in Internet-time allow
for.
Cheers
Subbiah
Tina Dam wrote:
Tin Wee, All,
While it naturally is impossible to test all applications, the Technical
Test Phase II is focused on the application area. It is in the process of
being defined and planned and will contain elements around communication
as well (communication to application providers... so far we have received
quite some interest in getting this right, which is good).
Further, the revision of the protocol is not expected to change the
prefix, and one of the main reasons for the revision is to be able to
proceed with a protocol that does not depend on the Unicode version, to
avoid continuous revisions, which could create further problems as you
mention below.
I am not sure I follow your AXN-- discussion below... but I support Will's
and Edmon's comments on this. The protocol does not operate midway through
strings. What that means is that it is entirely possible to register a
string that has "xn--" midway in it, and I don't see any need for
reserving such names. Sloppy applications that take such strings and
convert them to U-strings should quickly be revised in response to market
complaints. As mentioned, while some application testing is in place, we
need to keep in mind that (i) we can't test all applications that exist
now and in the future, and (ii) even if tested, the providers can change
the implementation at any time.
Tina
PS> Sorry I was not on the call last night. I arrived late in the evening
from a long trip into LA and did not manage to stay up for the 3am call.
I will listen to the recording and if I have input I will provide it to
the list.
-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx
[mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Tan Tin Wee
Sent: Tuesday, March 06, 2007 4:06 PM
To: edmon@xxxxxxxxxxx
Cc: 'Shahram Soboutipour'; owner-gnso-idn-wg@xxxxxxxxx;
'Sophia Bekele'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
Ram Mohan wrote:
> Are you saying - something like <*CITIBANKchina.TLD*> where "china" is
> in local script while CITIBANK is in Latin script should be banned,
> because its Punycode translation would result in an <xn--> midway
> through the string?
I agree with the comments made so far.
xn-- in the case mentioned by Ram won't happen in the current
way Punycode works as William and Edmon pointed out.
Having said that, I agree that for the moment we may not want
to add more complication by recommending to split the IDN
label with xn-- embedded inside because xn-- can occur in
punycode within a label like (using Ram's example and
modifying it...) citibankxn-<China> will appear as
xn--citibankxn--b28qq03g (e.g.
http://mct.verisign-grs.com/conversiontool/convertServlet?input=xn--citibankxn--b28qq03g&type=PUNYCODE
converting from Punycode xn--citibankxn--b28qq03g to Unicode:
citibankxn-中国 or use
http://www.afilias.info/cgi-bin/convert_punycode.cgi)
...
which I think was the nub of Shahram's point:
> <CCHH>citibank-<CCHH><encodedCHINA>.tld
Of course, xn-- at the prefix will cause the rest of the
label "citibankxn--b28qq03g" to be processed as such, but still
xn-- as mentioned by Sophia will pop up here and there by
accident or by deliberate design by non-bonafide registrants.
I think what Sophia meant, and Ram misunderstood, was to have
some mechanism to trap xn-- inside labels. For instance, it
would keep sloppy programming from picking out an xn-- inside
an xn-- prefixed string (non-greedy algorithm), as in the case
I mentioned, and displaying the wrong IDN label. There is also
the mixed-scripts issue: if we do not look carefully at the
xn-- or CCHH question, and the next Unicode version brings a
drastic enough change that we need to migrate and in the
process change xn-- to some other CCHH, we may lose that
option. By way of illustration, xm-- or xe-- etc. may already
be taken inside a registration such as axe-中国 with
xn--axe--3f5fw08b as the back-end encoding, or AXN-中国 with
xn--axn--3f5fw08b, which is a conceivable registration by the
AXN satellite channel.
Or, in cases of spoofing or passing off, people could be confused
by citibankxn-中国 and citibank.xn-中国, which look pretty
close and get punycoded to
http://mct.verisign-grs.com/conversiontool/convertServlet?input=citibankxn-%E4%B8%AD%E5%9B%BD&type=UTF8
xn--citibankxn--b28qq03g
and
http://mct.verisign-grs.com/conversiontool/convertServlet?input=citibank.xn-%E4%B8%AD%E5%9B%BD&type=UTF8
citibank.xn--xn--x68dy61b
respectively.
Try these two labels with the Afilias converter and the second
one will be blocked, while
http://www.nameisp.com/punycode.asp will work just like the
Verisign converter... So these are programming variations we
may need to follow up on.
If we recommend against AXN-?? because it generates a
potentially confusing xn-- string inside a punycode label,
then AXN?? could be an option, as it will generate
xn--axn-x68dy61b, which is xn- and not xn--.
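To make the "sloppy programming" worry concrete, here is a small sketch contrasting a deliberately wrong parser, which treats the last xn-- it finds as the ACE prefix, with one that only honours xn-- at the very start of the label. Both function names are hypothetical, and decoding uses Python's built-in `punycode` codec without the Nameprep checks of full IDNA.

```python
ACE_PREFIX = "xn--"

def sloppy_payload(label: str):
    """Deliberately WRONG: takes whatever follows the LAST "xn--" as Punycode."""
    idx = label.rfind(ACE_PREFIX)
    return None if idx == -1 else label[idx + len(ACE_PREFIX):]

def strict_payload(label: str):
    """Only an ACE prefix at the very start of the label counts."""
    return label[len(ACE_PREFIX):] if label.startswith(ACE_PREFIX) else None

def to_unicode(label: str) -> str:
    """Decode a label using the strict rule; non-ACE labels pass through."""
    payload = strict_payload(label.lower())
    if payload is None:
        return label
    return payload.encode("ascii").decode("punycode")

label = "xn--citibankxn--b28qq03g"
print(sloppy_payload(label))   # fragment the sloppy parser would try to decode
print(strict_payload(label))   # full Punycode payload
print(to_unicode(label))       # label as it should be displayed
```

The sloppy variant would hand only the trailing fragment to its decoder and so could not display the label that was actually registered.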
Finally,
Edmon Chung wrote:
> Nevertheless, with regards to our discussion at hand, I am quite
> certain we have comprehensive protection with the CCHH reserved as
> a prefix.
Yes, I suspect this might be the case, but somebody might
want to get a team of programmers to run a check on some test
cases. Does anyone know if this kind of scenario is being
tested at the moment in the ICANN testing contract?
bestrgds
tin wee
Edmon Chung wrote:
Hi Shahram,
There was an extensive discussion in the original IDN protocol
development about the use of the prefix (or suffix or other possible
identifiers), and finally CCHH was chosen. I highly doubt that we
would be choosing a scheme that would split up a label (for many good
reasons including bidi and single script considerations) into
different chunks with different prefixes, but no one can predict the
future I suppose :-)
Nevertheless, with regards to our discussion at hand, I am quite
certain we have comprehensive protection with the CCHH reserved as a
prefix.
Edmon
From: owner-gnso-idn-wg@xxxxxxxxx
[mailto:owner-gnso-idn-wg@xxxxxxxxx] On Behalf Of Shahram Soboutipour
Sent: Tuesday, March 06, 2007 4:50 PM
To: owner-gnso-idn-wg@xxxxxxxxx
Cc: 'Sophia Bekele'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: RE: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
Dear Edmon
Regarding the sample CITIBANKchina.TLD (where china is in the Chinese
charset), I think there is a 3rd possibility, which might be Sophia's
idea:
<CCHH>citibank-<CCHH><encodedCHINA>.tld
It means that each separate part of a label containing non-ASCII
strings would be translated with its own leading CCHH. I am not sure
whether there is a rule for this right now, but I myself do not agree
with this type. I prefer <CCHH>citibank-<encodedCHINA>.tld because:
1. I think there is enough space for possible further changes and
developments in the IDNA standard in the CC part of CCHH, so there
need be no worries.
2. The CCHH (at the front) is a good rule for defining an IDN, and I
think it can be a rule at all levels of a URL (not only 2nd and 3rd).
Levels higher than the 3rd seem to be out of scope of ICANN's policy,
BUT this must be mentioned in their own technical decision making.
Regards,
Shahram Soboutipour <soboutipour@xxxxxxxxxxx>
President and CEO
Karmania Media <http://www.karmania.ir/>
Tel: +98 341 2117844,5
Mobile: +98 913 1416626
Fax: +98 341 2117851
-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx
[mailto:owner-gnso-idn-wg@xxxxxxxxx]
On Behalf Of Edmon Chung
Sent: Tuesday, March 06, 2007 6:09 AM
To: 'Tan, William'; rmohan@xxxxxxxxxxxx
Cc: 'Sophia B'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: RE: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
I don't think you are missing anything, William.
I was trying to speak up during the call earlier; I don't think the
concern Sophia was articulating should be an issue.
If I am not mistaken, Sophia was asking whether it would be necessary
to reserve names such as:
abc<CCHH>xyz.tld
These names would NOT be considered IDNs, nor would any part of them;
they are simply ASCII domains. <CCHH> is best seen as a prefix
denoting that a domain label (i.e. between two dots) has at least one
non-LDH (letter-digit-hyphen) character.
Using the example described:
citibank<CHINA>.tld
where <CHINA> is in Chinese, William's explanation is correct, it
should become:
<CCHH>citibank-<encodedCHINA>.tld
And NOT
Citibank<CCHH><encodedCHINA>.tld
So, by reserving <CCHH> at the front (i.e. the first 4 characters, or
more precisely, hyphens in the third and fourth characters) we cover
all cases of intended IDN expressions.
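The prefix-only rule can be sketched as a per-label classification, assuming labels are compared case-insensitively and that only xn-- is currently assigned as an IDN prefix. The category names and the function `classify_label` are illustrative, not taken from any registry specification.

```python
def classify_label(label: str) -> str:
    """Classify one domain label (the text between two dots).

    "idn-ace"  : starts with the assigned ACE prefix xn--
    "reserved" : hyphens in the third and fourth positions, other prefix
    "ascii"    : anything else, including CCHH appearing later in the label
    """
    label = label.lower()
    if label.startswith("xn--"):
        return "idn-ace"
    if label[2:4] == "--":
        return "reserved"
    return "ascii"

for name in ("xn--citibank-abc", "xm--example", "abcxn--xyz", "citibank"):
    print(name, classify_label(name))
```

Note that `abcxn--xyz` falls into the plain ASCII category here, matching the point that CCHH away from the front does not mark a label as an IDN.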
Edmon
-----Original Message-----
From: owner-gnso-idn-wg@xxxxxxxxx
[mailto:owner-gnso-idn-wg@xxxxxxxxx] On
Behalf Of Tan, William
Sent: Tuesday, March 06, 2007 7:47 AM
To: rmohan@xxxxxxxxxxxx
Cc: 'Sophia B'; gnso-idn-wg@xxxxxxxxx; gnso-rn-wg@xxxxxxxxx
Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
Hi all,
I believe the motivations behind banning strings with hyphens in the
*third* and *fourth* positions are:
1. to protect registries who do not offer IDN registrations from
unknowingly registering IDNs; and
2. to reserve room for future revisions to the IDNA standard where a
different prefix might be assigned.
Ram Mohan wrote:
>
> Are you saying - something like <*CITIBANKchina.TLD*> where "china" is
> in local script while CITIBANK is in Latin script should be banned,
> because its Punycode translation would result in an <xn--> midway
> through the string?
>
I'm not sure I follow this. CITIBANKchina.TLD would translate to
xn--citibank-encodedchunk.TLD, so xn-- would not occur midway in the
ACE string.
> In general, the rationale for banning "CCHH" at a position other than
> the beginning of a string/label is unclear.
I have not seen any documents that suggest banning CCHH at anything but
the beginning of a string. Am I missing something?
Sophia said:
> All registrations should
> be in the IDN label, and that the ACE label should be internal to the
> operations of the registration. *One should not be offering to
> register xn--.... as a label or any ACE label since it is an internal
> encoding, so as to prevent confusion and other malfeasance (phishing)*.
Many registries today use the ACE string at the registration protocol
level, so your statement would essentially be advising against that
practice. Personally, I don't think it is a problem unless the
registry does NOT offer IDN and is accepting xn-- labels (in which case
it probably simply treats the registration as ASCII and does not check
for IDNA validity.) We may be in agreement here, but I wanted to
further qualify your statement.
In table 4.4 of "Recommendation Tables for RN-WG Reports.doc":
> For each IDN gTLD proposed, applicant must provide both the "ASCII
> compatible (ACE) form of an IDNA valid string" ("A-label") and in
> local script form (Unicode) of the top level domain ("U-label").
I would also add that the applicant should provide additional strings
that, after applying the IDNA ToASCII operation, result in the A-label.
Additionally, there may also be complications where the U-label is
entered into an application using an input method editor ("keyboard")
that produces a sequence of Unicode characters that does not convert
to the A-label (it either becomes a different A-label or fails
conversion). This may be due to the user's perception that a character
is what one thinks it is, when entering it with the local input
software actually produces a different character due to locale
differences. I will try to dig up some examples. This is not a
technical / policy issue, but is a usability issue that affects the
stability of IDNs.
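Both effects can be illustrated with Python's built-in `idna` codec, which implements the IDNA 2003 ToASCII operation including Nameprep. The label "bücher" and its Cyrillic lookalike are examples chosen here purely for illustration; they are not the examples promised above, and the helper name `a_label` is hypothetical.

```python
def a_label(u_label: str) -> str:
    """Apply ToASCII to a single label via Python's "idna" codec (IDNA 2003)."""
    return u_label.encode("idna").decode("ascii")

# Different input sequences that Nameprep maps to the same A-label.
variants = [
    "b\u00fccher",   # precomposed u-umlaut, lowercase
    "B\u00fccher",   # capital B
    "B\u00dcCHER",   # all uppercase
    "bu\u0308cher",  # "u" followed by a combining diaeresis
]
converging = {v: a_label(v) for v in variants}

# A visually similar input: Cyrillic "е" (U+0435) instead of Latin "e".
lookalike = a_label("b\u00fcch\u0435r")

for source, ace in converging.items():
    print(repr(source), "->", ace)
print("lookalike ->", lookalike)
```

The first four inputs converge on one A-label, while the lookalike yields a different one even though it may look the same on screen.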
Best,
=wil