ICANN Email List Archives

[gnso-idn-wg]



Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label

  • To: Avri Doria <avri@xxxxxxx>
  • Subject: Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
  • From: "Tan, William" <William.Tan@xxxxxxxxxxx>
  • Date: Fri, 09 Mar 2007 15:18:51 -0500


Avri Doria wrote:
On 9 mar 2007, at 13.37, Tan Tin Wee wrote:
citibank[Han characters]

a quick question, doesn't the example involve mixing scripts inside a single label? and isn't this prohibited in all cases even if mixing scripts is allowed within multi label names?

Yes, this is a mix of Latin (ASCII subset) and Han scripts. However, as many on the list have said, ASCII can be mixed into many scripts without causing much confusion. Such mixing is allowed at the second level in most (careful) IDN implementations, as shown by the language tables published in the IANA IDN Language Table Registry. It is also addressed in the ICANN IDN Guidelines:


   3. (a) In implementing the IDN standards, top-level domain
   registries will *associate each label* in a registered
   internationalized domain name, as it appears in their registry *with
   a single script*. This restriction is intended to limit the set of
   permitted characters within a label. If greater specificity is
   needed, the association may be made by combining descriptors for
   both language and script. Alternatively, a label may be associated
   with a set of languages, or with more than one designator under the
   conditions described below.

   (b) A registry will publish the aggregate set of code points that it
   makes available in clearly identified IDN-specific character tables,
   and will define equivalent character variants if registration
   policies are established on their basis. Any such table will be
   designated in a manner that indicates the script(s) and/or
   language(s) it is intended to support.

   (c) All code points in a single label will be taken from the same
   script as determined by the Unicode Standard Annex #24: Script Names
   at http://www.unicode.org/reports/tr24. *Exception to this is
   permissible for languages with established orthographies and
   conventions that require the commingled use of multiple scripts*. In
   such cases, visually confusable characters from different scripts
   will not be allowed to co-exist in a single set of permissible
   codepoints unless a corresponding policy and character table is
   clearly defined.

   (d) All registry policies based on these considerations will be
   documented and publicly available, including a character table for
   each permissible set of code points, before the registration of any
   IDN associated with such an aggregate may be accepted.


A well-known example of a language where such mixing is permissible is Japanese, in which one can combine Han, Hiragana, Katakana, and the ASCII subset of Latin.
A well-known example of bad practice would be to allow Cyrillic and Latin to be combined within a single label.
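
As an illustration only of why the Cyrillic/Latin combination is considered bad practice, here is a minimal Python sketch. The label "paypal" and the helper name `describe` are hypothetical choices for this example and do not come from the Guidelines. It lists the code points of two labels that render almost identically in many fonts:

```python
import unicodedata

def describe(label):
    """Return (character, code point, Unicode name) for each character."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]

latin_only = "paypal"
# Hypothetical spoof: the second character is U+0430, a Cyrillic letter
# that looks like the Latin "a".
mixed = "p\u0430ypal"

for ch, code_point, name in describe(mixed):
    print(ch, code_point, name)

# The two strings look alike on screen but are different labels.
print(latin_only == mixed)  # False
```

The second entry printed for the mixed label is CYRILLIC SMALL LETTER A, while every other character is a Latin small letter.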



I think there are two issues whenever we discuss the topic of "single-script adherence", and I asked for clarification on the last teleconference. However, I suspect we still have not grounded the discussion on one or the other. To be clear, there are two possible ways to interpret "single-script adherence across all labels":


1. Every label in a domain name string is composed of characters from a single script. However, one label may belong to a different script than another. E.g. [Katakana label].españa - there are two labels, one containing only Katakana and the other containing only Latin.

2. All characters in all labels of a domain name string come from one and the same script. The example above, [Katakana label].españa, would violate this policy. OTOH, [Katakana label].[Katakana label] would be OK since both labels are Katakana.

We need to make it clear in our recommendations whether we mean 1 or 2 above.


#1 above has already been somewhat covered by the ICANN IDN Guidelines. I don't think anyone would argue against this. Whether it could/should be enforced as a contractual requirement for new TLDs is up for discussion.


#2 is what I believe we have been discussing on the call and the list. I am of the view that this restriction should be expressed using "SHOULD" language, so as to discourage abuse. I'm on the fence as to whether we should enforce it.


Best regards,

=wil
