ICANN Email List Archives

[gtld-guide]



Feedback from Unicode Technical Committee (UTC) on gTLD Guidebook

  • To: gtld-guide@xxxxxxxxx, "Tina Dam" <tina.dam@xxxxxxxxx>, "Vint Cerf" <vint@xxxxxxxxxx>
  • Subject: Feedback from Unicode Technical Committee (UTC) on gTLD Guidebook
  • From: "Mark Davis" <mark@xxxxxxxxxxxxx>
  • Date: Thu, 13 Nov 2008 17:16:12 -0800

To: gtld-guide@icann.org
From: Unicode Technical Committee (UTC)
Date: 2008-11-14

The following is approved feedback from the Unicode Technical Committee on
the ICANN document "New gTLD Program: Draft Applicant Guidebook (Draft RFP)"
(http://www.icann.org/en/topics/new-gtld-draft-rfp-24oct08-en.pdf).
There is a copy of this email at:
http://docs.google.com/Doc?id=dfqr8rd5_358ffvqqhf9.

The structure of the feedback below
includes a citation of text from the document, suggested replacement
text or other changes to remedy the problem, and a rationale for the
change.

------------------------------
2.1.1.3.2 String Requirements
------------------------------

The label must be a valid internationalized domain
name, as specified in the technical standard
Internationalizing Domain Names in Applications
(RFC 3490). This includes the following
nonexhaustive list of limitations:
=>
The label must be a valid internationalized domain name, as specified in the
latest version of the IDNA specifications (see XXX). This includes, but is
not limited to, the following constraints. Note that these are in no way a
complete statement of the requirements of the IDNA specifications.

*Rationale.* Clearer wording, and you *really* don't want the reader to
think that what is listed here is in any way complete.
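To show what the encoded form of a label looks like, here is a minimal sketch that converts a U-label to its A-label and back. Note the assumption: it relies on Python's built-in "idna" codec, which follows the older RFC 3490 rules rather than the revised IDNA specifications recommended above, so it illustrates the mechanics only and is not a validity check against the latest specifications. The function names are illustrative.

```python
def to_a_label(u_label: str) -> str:
    """Convert a single U-label to its ASCII (A-label) form.

    Assumption: uses Python's built-in "idna" codec, which follows
    RFC 3490 (IDNA2003), not the revised IDNA specifications.
    Raises UnicodeError if the codec rejects the label.
    """
    return u_label.encode("idna").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Convert an A-label back to its Unicode form with the same codec."""
    return a_label.encode("ascii").decode("idna")


# "\u00f6bb" is the label "öbb" written with an explicit escape.
print(to_a_label("\u00f6bb"))  # xn--bb-eka
```

A registry would need a library that tracks the current IDNA specifications; the built-in codec is used here only because it is available everywhere.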
------------------------------

- Must consist entirely of characters with the same
directional property.
[DELETE]

*Rationale.* This is completely false. It would disallow many IDNs that are
needed, and allowed by idna-bis-bidi. Note: it is questionable how much of
IDNA2008 this text should repeat, especially in the case of complex
provisions like BIDI. Moreover, "directional property" is undefined.
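As a rough illustration, suppose "directional property" is read as the Unicode Bidi_Class (an assumption, since the guidebook does not define the term). The sketch below lists the Bidi_Class of each character in a label using Python's standard unicodedata module. Even a short right-to-left label containing a hyphen and a digit mixes several classes, which is the kind of label a uniform-direction rule would exclude.

```python
import unicodedata


def bidi_classes(label: str) -> list:
    """Return the Unicode Bidi_Class abbreviation of each character."""
    return [unicodedata.bidirectional(ch) for ch in label]


# Two Hebrew letters, a hyphen-minus, and a European digit.
hebrew_label = "\u05d0\u05d1-1"
print(bidi_classes(hebrew_label))  # ['R', 'R', 'ES', 'EN']

# A plain Latin label has a single class throughout.
print(bidi_classes("abc"))  # ['L', 'L', 'L']
```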
------------------------------

All code points in a single label must be taken
from the same script as determined by the
Unicode Standard Annex #24: Unicode Script
Property.

=>

Labels are subject to a constraint based on the script value of their
characters. All characters in the label that do not have the Common script
value or the Inherited script value must share a single script value. Script
values are determined as specified in the Unicode Standard: see  Unicode
Standard Annex #24: Unicode Script Property.

*Rationale.* The constraint to single scripts is far too narrow. The script
values Common and Inherited are given to characters that are used with
multiple scripts, such as "-" or "2", or Arabic vowels. Forcing such obvious
characters to go through the exception process is needless overhead, and
obscures the exceptional cases.
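The proposed constraint can be sketched as follows. Python's standard library does not expose the Script property, so the table here is a small, hand-copied, deliberately incomplete excerpt of script ranges, included only as an assumption for illustration; a real check would load the full Scripts.txt data from the Unicode Character Database. The function names are likewise illustrative.

```python
# Illustrative excerpt only (assumption): NOT the full Script property data.
SCRIPT_RANGES = [
    (0x002D, 0x002D, "Common"),     # hyphen-minus
    (0x0030, 0x0039, "Common"),     # ASCII digits
    (0x0061, 0x007A, "Latin"),      # a..z
    (0x00DF, 0x00F6, "Latin"),      # some Latin-1 lowercase letters
    (0x00F8, 0x00FF, "Latin"),
    (0x0300, 0x036F, "Inherited"),  # combining diacritical marks
    (0x03B1, 0x03C9, "Greek"),      # lowercase alpha..omega
    (0x0430, 0x044F, "Cyrillic"),   # lowercase a..ya
    (0x0621, 0x063A, "Arabic"),
    (0x0641, 0x064A, "Arabic"),
    (0x064B, 0x0655, "Inherited"),  # Arabic vowel marks
]


def script_of(ch: str) -> str:
    """Look up the script value of one character in the excerpt table."""
    cp = ord(ch)
    for low, high, script in SCRIPT_RANGES:
        if low <= cp <= high:
            return script
    raise ValueError(f"U+{cp:04X} is not covered by this excerpt table")


def satisfies_script_constraint(label: str) -> bool:
    """True if all characters other than Common/Inherited share one script."""
    scripts = {script_of(ch) for ch in label} - {"Common", "Inherited"}
    return len(scripts) <= 1


print(satisfies_script_constraint("abc-2"))         # Latin plus Common
print(satisfies_script_constraint("p\u0430ypal"))   # Latin mixed with Cyrillic
```

Under the original single-script wording, the first label would already need the exception process because of "-" and "2"; under the proposed wording it passes, while the mixed Latin and Cyrillic label is still rejected.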
------------------------------
2.1.1.4.1 Requirements for Strings Intended to Represent Geographical
Entities
------------------------------

This includes a representation of the
country or territory name in any of the six official
United Nations languages (French, Spanish,
Chinese, Arabic, Russian and English) and the
country or territory's local language.

=>

This includes a representation of the country or territory name in any of
the six official United Nations languages (French, Spanish, Chinese, Arabic,
Russian and English) and *any of* the country or territory's local
language*s*.

*Rationale.* It is quite common for a country or territory to have more
than one language, so that needs to be accounted for.
------------------------------

Applications for any string that represents a subnational
place name, such as a county, province,
or state, listed in the ISO 3166-2 standard.

=>

Applications for any string that represents a subnational place name, such
as a county, province, or state. These could be, for example, as listed in
the ISO 3166-2 standard.

*Rationale.* The ISO 3166-2 standard is not complete, and is not freely
available. The comma before "listed" may also lead the reader to think that
listing in ISO 3166-2 is required, that is, that the sentence is to be read
as: "Applications for any string that represents a subnational place name
(such as a county, province, or state) listed in the ISO 3166-2 standard."
------------------------------

Applications for a city name, where the applicant
clearly intends to use the gTLD to leverage from the
city name.

*Issue.* City names are *very* ambiguous: look at the number of "Paris"
cities that exist. If Paris, Texas gets there first, what happens? Should
there be some qualification necessary to disambiguate city names instead?
------------------------------

1.3 Information for Internationalized Domain Name Applicants

If an applicant applies for such a string, it must provide
accompanying information indicating compliance with
the IDNA protocol and other requirements. The IDNA
protocol is currently under revision and its documentation
can be found at
http://www.icann.org/en/topics/idn/rfcs.htm.

[ADD AFTERWARDS]

This document presumes that the IDNA protocol has been revised in accordance
with the description at http://www.icann.org/en/topics/idn/rfcs.htm, and
makes use of terminology defined in the draft revisions. That revision may
change before approval, and such changes could require corresponding
modifications of the following text.

*Rationale.* It must be made clear to the reader that while we expect the
revision to succeed, the text following this in the document is subject to
change.
------------------------------

2. Language of label (ISO 639-1). The applicant will
specify the language of the applied-for TLD string, both
according to the ISO's codes for the representation of
names of languages, and in English.

=>

Language tag of label (according to IETF BCP 47 *Tags for Identifying
Languages*). The applicant will specify the language tag of the applied-for
TLD string, both according to IETF BCP 47 *Tags for Identifying Languages*,
and in English.

*Rationale.* ISO 639-1 only covers a small fraction of the world's
languages. The correct reference, used in HTML, XML, and all modern
software, is BCP 47.
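For a sense of what a BCP 47 tag looks like in practice, here is a minimal sketch that checks only the most common shape, language[-Script][-REGION]. It is a simplification made for illustration: full BCP 47 also allows variants, extensions, private-use subtags, and longer language subtags, and a real check would also validate subtags against the IANA Language Subtag Registry. The function name is hypothetical.

```python
import re

# Simplified shape only (assumption): 2-3 letter language, optional
# 4-letter script, optional region (2 letters or 3 digits).
_SIMPLE_TAG = re.compile(
    r"[A-Za-z]{2,3}"                     # primary language subtag
    r"(?:-[A-Za-z]{4})?"                 # optional script subtag
    r"(?:-(?:[A-Za-z]{2}|[0-9]{3}))?"    # optional region subtag
)


def looks_like_simple_language_tag(tag: str) -> bool:
    """Well-formedness check for the simplified shape; not full BCP 47."""
    return _SIMPLE_TAG.fullmatch(tag) is not None


for tag in ["en", "haw", "zh-Hant-TW", "es-419", "english"]:
    print(tag, looks_like_simple_language_tag(tag))
```

The tag "haw" (Hawaiian) is an example of a language that has a three-letter code but no two-letter ISO 639-1 code.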
------------------------------

3. Script of label (ISO 15924). The applicant will specify the
script of the applied-for gTLD string, both according to
the ISO code for the presentation of names of scripts,
and in English.

=>

Main script of label (see *2.1.1.3.2 String Requirements*). The applicant
will specify the scripts of the applied-for gTLD string, both according to
the Unicode Script property, and in English.

*Rationale.* This brings the text in line with the use of script in 2.1.1.3.2
String Requirements. It also prevents bogus information such as script
variants (Latin Fraktur), which are not properties of characters. The term
"scripts" takes account of the fact that some cases of multiple scripts are
allowed. (Note that this information is completely derivable from the
U-Label.)
------------------------------

4. Unicode code points. The applicant will list all the code
points contained in the U-label according to its
Unicode form.
=>

4. Unicode code points. The applicant will list all the code points contained
in the U-label using the U+ notation. For example, for the label "öbb",
the list would be: "U+00F6 U+0062 U+0062".

*Rationale.* This makes the intent clear.
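The list in U+ notation can be derived mechanically from the U-label. A minimal sketch, with an illustrative function name, assuming the label is supplied as a Python string in the form the applicant intends to register:

```python
def code_points(label: str) -> str:
    """List the code points of a label in U+ notation, space-separated."""
    # U+ notation uses at least four uppercase hexadecimal digits.
    return " ".join(f"U+{ord(ch):04X}" for ch in label)


# "\u00f6bb" is the label "öbb" written with an explicit escape.
print(code_points("\u00f6bb"))  # U+00F6 U+0062 U+0062
```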

------------------------------
5. Representation of label in phonetic alphabet. The
applicant will provide its applied-for gTLD string notated
according to the International Phonetic Alphabet
(http://www.arts.gla.ac.uk/IPA/ipachart.html ).

[DELETE]

*Rationale.* First, it is questionable what the purpose of this is -- how
is it to be used? How would it make a difference in the registration what
the IPA was? Second, the same word could have many different IPA readings,
narrow vs broad, or vary greatly by speaker (the same word spoken by a Scot
vs a Chicagoan). Third, very few registrants will be able to supply correct
IPA representations.

------------------------------

6. Its IDN table. This table provides the list of characters
eligible for registration in domain names according to
registry policy. It will contain any multiple characters
that can be considered "the same" for the purposes of
registrations at the second level. For examples, see
http://iana.org/domains/idn-tables/.

*Question.* We think this means a reference to a table rather than a complete
copy. If so, what format should such a reference take? Is a link sufficient?
It should be clear exactly what a registrant needs to supply.
------------------------------

7. Applicants must further demonstrate that they have
made reasonable efforts to ensure that the encoded
IDN string does not cause any rendering or operational
problems. For example, problems have been identified
in strings with characters of mixed right-to-left and left-to-right
directionality when numerals are adjacent to
the path separator. If an applicant were applying for a
string with known issues, it should document steps that
will be taken to mitigate these issues in applications.

*Question.* It sounds like this is asking the applicant to change all the
program applications that use the domain name, which is clearly impossible.
What would be an example of "reasonable efforts"?
------------------------------
2.1.1.1 String Confusion Review
------------------------------
...
The similarity review will be conducted by a panel of String
Similarity Examiners. This examination will be informed by an
algorithmic score for the visual similarity between each
applied-for string and each of other existing and applied-
for TLDs. The score will provide one objective measure for
consideration by the panel.
...
The algorithm uses proprietary software to perform a series of mathematical
calculations to assess the visual similarity between strings based upon the
following parameters:
...

*Issue.* It is inappropriate for ICANN to use an algorithm which is not
public, and not based on public data.

------------------------------

If the evaluators determine that a string poses stability
issues that require further investigation, the applicant must
either confirm that it intends to move forward with the
application process or withdraw its application.

*Issue.* What is an example of "stability issues" in a string? Should this
be "technical issue"? How is an applicant supposed to know what "stability
issue" means? All terms need definition, either before usage or in a
glossary. Currently there is a definition of stability of a "registry
service" later, at the end of 2.1.3, but no definition or indication of
what "stability issues" are for a string.

