ICANN Email List Archives

[gtld-guide]



RE: Feedback from Unicode Technical Committee (UTC) on gTLD Guidebook

  • To: Mark Davis <mark@xxxxxxxxxxxxx>, "gtld-guide@xxxxxxxxx" <gtld-guide@xxxxxxxxx>, Vint Cerf <vint@xxxxxxxxxx>
  • Subject: RE: Feedback from Unicode Technical Committee (UTC) on gTLD Guidebook
  • From: Tina Dam <tina.dam@xxxxxxxxx>
  • Date: Tue, 18 Nov 2008 10:06:06 -0800

Hi Mark, Thanks for the review details. The UTC comments will be taken into 
consideration in the next review of the string criteria. Meanwhile, please note 
that the listed string criteria are not yet complete because of the ongoing 
IDNA protocol revision, so additional adjustments are anticipated. That said, 
should the protocol revision not be completed in time for the launch of the 
gTLD Program, additional restrictions may be placed on what ICANN will allow in 
TLDs. What you see below is a first draft of some of these restrictions, which 
in some cases you find do not match the IDNA protocol proposal as it stands 
today.


On the topic of IDN Tables, more work is clearly necessary, but this is a 
reference to the character table that the IDN Guidelines require. It is 
basically a table of the characters that a registry will allow for 
registration at the second level (SLD). The format for it can be found at the 
IANA repository for such tables, although this might be updated prior to the 
launch of new gTLDs.

On the topic of phonetic alphabet, I agree with you. I also find it hard to see 
how this can be done accurately.

On the topic of string confusion: the algorithm is a tool to guide the panel 
that will be making the determination. The intent is to continue building on 
the algorithm to see whether it can be made stronger and of more value going 
forward. Initially, however, it is used for guidance. It is built by 
linguistic experts, with input from the various script/language communities 
when such scripts/languages are being added. A beta has been made available 
online.

As to the remainder of the UTC comments, I will leave those to my colleagues to 
get back to you on.

Best Regards,
Tina


Tina Dam
Director, IDN Program
ICANN

Cell: +1-310-862-2026
Office: +1-310-301-5838


From: mark.edward.davis@xxxxxxxxx [mailto:mark.edward.davis@xxxxxxxxx] On 
Behalf Of Mark Davis
Sent: Thursday, November 13, 2008 5:16 PM
To: gtld-guide@xxxxxxxxx; Tina Dam; Vint Cerf
Subject: Feedback from Unicode Technical Committee (UTC) on gTLD Guidebook

To: gtld-guide@xxxxxxxxx
From: Unicode Technical Committee (UTC)
Date: 2008-11-14


The following is approved feedback from the Unicode Technical Committee on the 
ICANN document "New gTLD Program: Draft Applicant Guidebook (Draft RFP)" 
(http://icann.org/en/topics/new-gtld-draft-rfp-24oct08-en.pdf).
There is a copy of this email at 
http://docs.google.com/Doc?id=dfqr8rd5_358ffvqqhf9.

The structure of the feedback below includes a citation of text from the 
document, suggested replacement text or other changes to remedy the problem, 
and a rationale for the change.

________________________________
2.1.1.3.2 String Requirements
________________________________

The label must be a valid internationalized domain
name, as specified in the technical standard
Internationalizing Domain Names in Applications
(RFC 3490). This includes the following
nonexhaustive list of limitations:
=>
The label must be a valid internationalized domain name, as specified in the 
latest version of the IDNA specifications (see XXX). This includes, but is not 
limited to, the following constraints. Note that these are in no way a complete 
statement of the requirements of the IDNA specifications.

Rationale. Clearer wording, and you *really* don't want the reader to think 
that what is listed here is in any way complete.
________________________________

- Must consist entirely of characters
directional property.
[DELETE]

Rationale. This is completely false. It would disallow many IDNs that are 
needed, and allowed by idna-bis-bidi. Note: it is questionable how much of 
IDNA2008 this text should repeat, especially in the case of complex provisions 
like BIDI. Moreover, "directional property" is undefined.
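
As a rough illustration of why a literal reading of the deleted rule is too 
strict, the Python sketch below treats "directional property" as the Unicode 
Bidi_Class value. That reading is an assumption, since the draft does not 
define the term, and the helper name bidi_classes is hypothetical. Under that 
reading, even ordinary labels mix several values.

```python
import unicodedata


def bidi_classes(label):
    """Return the set of Bidi_Class values used by the characters in label."""
    return {unicodedata.bidirectional(ch) for ch in label}


# Latin letters with a hyphen: the letters are L, HYPHEN-MINUS is ES.
latin_with_hyphen = bidi_classes("a-b")

# Hebrew letters followed by a digit: the letters are R, the digit is EN.
hebrew_with_digit = bidi_classes("\u05d0\u05d1" + "2")

# Arabic letter with a vowel mark: the letter is AL, the mark is NSM.
arabic_with_vowel = bidi_classes("\u0628\u064e")

for name, classes in [
    ("Latin with hyphen", latin_with_hyphen),
    ("Hebrew with digit", hebrew_with_digit),
    ("Arabic with vowel mark", arabic_with_vowel),
]:
    print(name, sorted(classes))
```

Each of these labels uses two different values, so a rule demanding a single 
value would reject all of them.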
________________________________

All code points in a single label must be taken
from the same script as determined by the
Unicode Standard Annex #24: Unicode Script
Property.

=>

Labels are subject to a constraint based on the script value of their 
characters. All characters in the label that do not have the Common script 
value or the Inherited script value must share a single script value. Script 
values are determined as specified in the Unicode Standard: see  Unicode 
Standard Annex #24: Unicode Script Property.

Rationale. The constraint to single scripts is far too narrow. The script 
values Common and Inherited are given to characters that are used with multiple 
scripts, such as "-" or "2", or Arabic vowels. Forcing such obvious characters 
to go through the exception process is needless overhead, and obscures the 
exceptional cases.
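
The proposed wording can be sketched in Python as follows. This is only a 
minimal illustration: the Python standard library does not expose the Script 
property, so SCRIPT_RANGES is a small hand-picked subset of the UAX #24 data 
chosen for this example, and the function names are hypothetical. A real 
check would load the complete Scripts.txt data file.

```python
# Illustrative subset of Script property values (assumption: a real
# implementation would read the full Unicode Scripts.txt data file).
SCRIPT_RANGES = [
    (0x002D, 0x002D, "Common"),     # HYPHEN-MINUS
    (0x0030, 0x0039, "Common"),     # ASCII digits
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x00C0, 0x00D6, "Latin"),
    (0x00D8, 0x00F6, "Latin"),
    (0x00F8, 0x00FF, "Latin"),
    (0x0300, 0x0341, "Inherited"),  # combining diacritical marks
    (0x0391, 0x03A1, "Greek"),
    (0x03A3, 0x03C9, "Greek"),
    (0x0410, 0x044F, "Cyrillic"),
    (0x0621, 0x063A, "Arabic"),
    (0x0641, 0x064A, "Arabic"),
    (0x064B, 0x0655, "Inherited"),  # Arabic vowel marks
]


def script_of(ch):
    """Look up the script value of ch in the illustrative table."""
    cp = ord(ch)
    for low, high, script in SCRIPT_RANGES:
        if low <= cp <= high:
            return script
    raise ValueError(f"U+{cp:04X} is not covered by the illustrative table")


def satisfies_script_constraint(label):
    """True if all characters that are not Common or Inherited share one script."""
    scripts = {script_of(ch) for ch in label} - {"Common", "Inherited"}
    return len(scripts) <= 1


print(satisfies_script_constraint("abc-2"))         # Latin plus Common
print(satisfies_script_constraint("\u0628\u064e"))  # Arabic plus Inherited
print(satisfies_script_constraint("p\u0430ypal"))   # Latin mixed with Cyrillic
```

The first two labels pass without any exception process, while the mixed 
Latin and Cyrillic label is rejected.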
________________________________
2.1.1.4.1 Requirements for Strings Intended to Represent Geographical Entities
________________________________

This includes a representation of the
country or territory name in any of the six official
United Nations languages (French, Spanish,
Chinese, Arabic, Russian and English) and the
country or territory's local language.

=>

This includes a representation of the country or territory name in any of the 
six official United Nations languages (French, Spanish, Chinese, Arabic, 
Russian and English) and any of the country or territory's local languages.

Rationale. It is quite common for a country or territory to have more than one 
language, so that needs to be accounted for.
________________________________

Applications for any string that represents a subnational
place name, such as a county, province,
or state, listed in the ISO 3166-2 standard.

=>

Applications for any string that represents a subnational place name, such as a 
county, province, or state. These could be, for example, as listed in the ISO 
3166-2 standard.

Rationale. The ISO 3166-2 standard is not complete, and is not freely 
available. The comma before "listed" may lead the reader to think that listing 
in the standard is required, that is, that the sentence is to be read as: 
"Applications for any string that represents a subnational place name (such as 
a county, province, or state) listed in the ISO 3166-2 standard."
________________________________

Applications for a city name, where the applicant
clearly intends to use the gTLD to leverage from the
city name.

Issue. City names are *very* ambiguous - look at the number of "Paris" cities 
that exist. If Paris, Texas gets there first, what happens? Should there be 
some qualification necessary to disambiguate city names instead?
________________________________

1.3 Information for Internationalized Domain Name Applicants

If an applicant applies for such a string, it must provide
accompanying information indicating compliance with
the IDNA protocol and other requirements. The IDNA
protocol is currently under revision and its documentation
can be found at
http://www.icann.org/en/topics/idn/rfcs.htm.

[ADD AFTERWARDS]

This document presumes that the IDNA protocol has been revised in accordance 
with the description at http://www.icann.org/en/topics/idn/rfcs.htm, and makes 
use of terminology defined in the draft revisions. That revision may change 
before approval, and such changes could require corresponding modifications of 
the following text.

Rationale. It must be made clear to the reader that while we expect the 
revision to succeed, the text following this in the document is subject to 
change.
________________________________

2. Language of label (ISO 639-1). The applicant will
specify the language of the applied-for TLD string, both
according to the ISO's codes for the representation of
names of languages, and in English.

=>

Language tag of label (according to IETF BCP 47 Tags for Identifying 
Languages). The applicant will specify the language tag of the applied-for TLD 
string, both according to the IETF BCP 47 Tags for Identifying Languages, and 
in English.

Rationale: ISO 639-1 only covers a small fraction of the world's languages. The 
correct reference, used in HTML, XML, and all modern software, is BCP 47.
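
To make the difference concrete, here is a small Python sketch that checks 
the shape of a few language tags. It is a deliberately simplified subset of 
the BCP 47 syntax (language, optional script, optional region); the pattern 
and the function name looks_like_simple_tag are assumptions made for this 
illustration, and the sketch does not validate subtags against the registry.

```python
import re

# Simplified subset of the BCP 47 grammar (assumption: no variants,
# extensions, private use, or longer registered language subtags).
SIMPLE_TAG = re.compile(
    r"[a-z]{2,3}"                  # language: 2 or 3 letters
    r"(-[a-z]{4})?"                # optional script: 4 letters
    r"(-(?:[a-z]{2}|[0-9]{3}))?",  # optional region: 2 letters or 3 digits
    re.IGNORECASE,
)


def looks_like_simple_tag(tag):
    """True if tag has the language[-script][-region] shape."""
    return SIMPLE_TAG.fullmatch(tag) is not None


for tag in ["de", "haw", "zh-Hant", "sr-Cyrl-RS", "es-419", "de_DE"]:
    print(tag, looks_like_simple_tag(tag))
```

A tag such as "haw" (Hawaiian) is acceptable here even though the language 
has no two-letter ISO 639-1 code, and tags such as "sr-Cyrl-RS" can also carry 
script and region information.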
________________________________

3. Script of label (ISO 15924). The applicant will specify the
script of the applied-for gTLD string, both according to
the ISO code for the presentation of names of scripts,
and in English.

=>

Main script of label (see 2.1.1.3.2 String Requirements). The applicant will 
specify the scripts of the applied-for gTLD string, both according to the 
Unicode Script property, and in English.

Rationale. This brings the text in line with the use of script in 2.1.1.3.2 
String Requirements. It also prevents bogus information such as script variants 
(Latin Fraktur), which are not properties of characters. The term "scripts" 
takes account of the fact that some cases of multiple scripts are allowed. 
(Note that this information is completely derivable from the U-Label.)
________________________________

4. Unicode code points. The applicant will list all the code
points contained in the U-label according to its
Unicode form.

=>

4. Unicode code points. The applicant will list all the codepoints contained in 
the U-label using the U+ notation. For example, for the label "öbb", the list 
would be: "U+00F6 U+0062 U+0062".

Rationale. This makes the intent clear.
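
For illustration, such a list can be produced mechanically from the U-label. 
The Python sketch below is one way to do so; the function name 
code_point_list is hypothetical, and the sketch assumes the label is already 
in the form the applicant intends to register (for example, with "ö" as the 
single code point U+00F6).

```python
def code_point_list(u_label):
    """Format each code point of u_label in U+ notation, space separated."""
    return " ".join(f"U+{ord(ch):04X}" for ch in u_label)


# "\u00f6" is the precomposed letter o with diaeresis.
print(code_point_list("\u00f6bb"))  # U+00F6 U+0062 U+0062
```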

________________________________
5. Representation of label in phonetic alphabet. The
applicant will provide its applied-for gTLD string notated
according to the International Phonetic Alphabet
(http://www.arts.gla.ac.uk/IPA/ipachart.html ).

[DELETE]

Rationale. First, it is questionable what the purpose of this is -- how is it 
to be used? How would it make a difference in the registration what the IPA 
was? Secondly, the same word could have many different IPA readings, narrow vs 
broad, or vary greatly by speaker (the same word spoken by a Scot vs a 
Chicagoan). Third, very few registrants will be able to supply correct IPA 
representations.
________________________________

6. Its IDN table. This table provides the list of characters
eligible for registration in domain names according to
registry policy. It will contain any multiple characters
that can be considered "the same" for the purposes of
registrations at the second level. For examples, see
http://iana.org/domains/idn-tables/.

Question: we think this means a reference to a table rather than a complete 
copy. If so, what format should such a reference take, is a link sufficient? It 
should be clear exactly what a registrant needs to supply.
________________________________

7. Applicants must further demonstrate that they have
made reasonable efforts to ensure that the encoded
IDN string does not cause any rendering or operational
problems. For example, problems have been identified
in strings with characters of mixed right-to-left and left-to-right
directionality when numerals are adjacent to
the path separator. If an applicant were applying for a
string with known issues, it should document steps that
will be taken to mitigate these issues in applications.

Question. It sounds like this is asking the applicant to change all the program 
applications that use the domain name, which is clearly impossible. What would 
be an example of "reasonable efforts"?
________________________________
2.1.1.1 String Confusion Review
________________________________
...
The similarity review will be conducted by a panel of String
Similarity Examiners. This examination will be informed by an
algorithmic score for the visual similarity between each
applied-for string and each of other existing and
applied-for TLDs. The score will provide one objective measure for
consideration by the panel.
...
The algorithm uses proprietary software to perform a series of mathematical 
calculations to assess the visual similarity between strings based upon the 
following parameters:
...

Issue. It is inappropriate for ICANN to use an algorithm which is not public, 
and not based on public data.
________________________________

If the evaluators determine that a string poses stability
issues that require further investigation, the applicant must
either confirm that it intends to move forward with the
application process or withdraw its application.

Issue. What is an example of "stability issues" in a string? Should this be 
"technical issue"? How is an applicant supposed to know what "stability issue" 
means? All terms need definition, either before usage or in a glossary. 
Currently there is a definition of stability of a "registry service" later, at 
the end of 2.1.3, but no definition or indication of what "stability issues" 
are for a string.

