ICANN Email List Archives

[ft-implementation]


Re: Public Comment: Fast Track Proposed Solutions

  • To: ft-implementation@xxxxxxxxx
  • Subject: Re: Public Comment: Fast Track Proposed Solutions
  • From: Eric Brunner-Williams <ebw@xxxxxxxxxxxxxxxxxxxx>
  • Date: Fri, 20 Feb 2009 13:59:11 -0500

The following are some text change suggestions and supporting rhetoric (or rationale if the reader is feeling generous) for the suggested changes to the recent Proposed Solutions.

Page 2, Section I

The first bullet item could be qualified: the IDN Table is not a tabular listing of all characters available for some purpose, but a listing of all characters, other than the ASCII LDH set, from some character repertoire other than the ASCII set. Mentioning Unicode as the character repertoire would be useful too.

The fourth bullet is incorrect. It places the utility for definition of variant characters in typographic similarities. The set of typographically dissimilar characters 'that have "the same meaning" when used in domain name registrations' is vastly greater than the set of typographically similar characters. The utility for recognizing equivalent meanings, e.g., between typographically dissimilar Simplified Chinese and Traditional Chinese characters, and for the typographically dissimilar Abenaki orthographic convention equivalence class {8, w, ou, and U+0222, U+0223}, is not to prevent confusion arising from dissimilar meaning associated with visually similar characters, but to prevent confusion arising from dissimilar meaning associated with visually dissimilar characters.

This error is repeated in the expanded discussion in page 4, Section III, which states the benefits of equivalence only in terms of visual similarity. This is equivalent to claiming that the only motivation for exercising any care with any subset of the character repertoire(s), or for IDNs in the first place, is typo-squatting on famous marks.

The test for true meaning is either to state that SC/TC and similar character equivalencies are not "variants" and ICANN will discard any SC/TC and similar table data submitted as IDN table data.
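
To make the equivalence-class point concrete, here is a minimal sketch of treating the Abenaki class {8, w, ou, U+0222, U+0223} as one unit when comparing labels. The function names, the choice of U+0223 as the class representative, and the sample labels are all assumptions made for illustration; nothing here is taken from the Proposed Solutions or from any submitted IDN table, and a real table would need context rules rather than blanket substitution.

```python
# Members of the Abenaki "ou" equivalence class named in the comment above.
# "ou" is listed first so the two-character sequence is folded before the
# single characters are examined.
ABENAKI_OU_CLASS = ("ou", "8", "w", "\u0222", "\u0223")

# Hypothetical choice of class representative (U+0223).
REPRESENTATIVE = "\u0223"


def canonical_label(label):
    """Fold every member of the equivalence class to one representative."""
    folded = label.lower()
    for variant in ABENAKI_OU_CLASS:
        folded = folded.replace(variant, REPRESENTATIVE)
    return folded


def same_variant_set(a, b):
    """True when two labels differ only by members of the class."""
    return canonical_label(a) == canonical_label(b)


# Made-up labels: "8" and "w" look nothing alike, yet compare as equivalent.
print(same_variant_set("8li", "wli"))
print(same_variant_set("8li", "ali"))
```

The comparison never asks whether two characters look alike; membership in the class is the only test, which is the distinction the fourth bullet misses.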

Page 3, Section I, continued

The sixth bullet is unclear. What support will ICANN provide to applicants when requested?

The seventh bullet could be restated as "If a sequence of characters applied for by a Fast Track Program applicant ccTLD, or any subsequent applicant, contains any characters which are defined as being in a variation set in any IDN table previously submitted to ICANN, the application would result in an entry into the IANA root only if an undefined "user confusion" test has an under-defined outcome."
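
The mechanical half of that restated bullet, checking an applied-for string against every previously submitted table, can be sketched as follows. The table names, their contents, and the function names are hypothetical; the "user confusion" test itself remains undefined and is not modelled here.

```python
# Hypothetical, hand-made variant data. Real IDN tables are far larger.
SUBMITTED_TABLES = {
    "example-zh-table": [{"\u56fd", "\u570b"}],  # SC / TC forms of "guo"
    "example-abenaki-table": [{"8", "w", "\u0222", "\u0223"}],
}


def variant_hits(label, tables):
    """Map each character of `label` that sits in a variation set
    to the names of the tables defining that set."""
    hits = {}
    for ch in dict.fromkeys(label):  # each distinct character once, in order
        for name, variant_sets in tables.items():
            if any(ch in vs for vs in variant_sets):
                hits.setdefault(ch, []).append(name)
    return hits


def needs_confusion_review(label, tables):
    """True when any character of the label is in any variation set."""
    return bool(variant_hits(label, tables))


# "mei guo" in Simplified Chinese: only the second character is in a set.
print(variant_hits("\u7f8e\u56fd", SUBMITTED_TABLES))
```

Note that the check is keyed on characters, not on the applicant's language, so a table submitted for one language affects applications made for another.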

Page 4, Section IV, para 1 refers to a "speech community". This should be "writing community", or simply "language community", and "language authority or authorities" is even better. The plural is probably the better choice, I couldn't abide by George W. Bush's "English" and I'm sure the US isn't the only odd-stakeholder that is intellectually broken from time to time, or permanently on language authority issues.

For instance, in Khmer there are several orthographic conventions (see also Abenaki, supra) and the character variation issue has nothing to do with spoken Khmer, only the orthographic choice problem.

Para 3 mentions the ASIWG in a context similar to the CDNC/JET work that resulted in rfc3743. I participate in the ASIWG and unfortunately there is no IETF draft to point to which incorporates the script-development-based policy serving the Arabic, Farsi, Urdu, ... language authorities (or "language communities").

The final para of page 4, continued on page 5, makes no mention of the Arabic Script use by the diaspora communities in Europe and the Americas. While the Yiddish language (Hebrew Script) is not mentioned, it is an example of a language authority (YIVO) existing in diaspora.

The final para before the unnumbered subsection "Usage of IDN Tables and variant characters in domain name registrations" has the following:

"Regardless of the language or script basis, domain names do not always represent [words] ..."

This fails to state that domain names are persistent identifiers associated via resolvers using the DNS (rfc1034/35 et seq) to transient resources, in particular an IPv4 or IPv6 address of a network attached device which may have many IPv4 or IPv6 addresses, in sequence or simultaneously, or both. There is a problem that many, for instance, the group organized three years ago by the Arab League, view "domain names" as "names" existing in some external universe of "meaningful names", rather than as unique LDH (and now over a larger character repertoire) character sequences slightly more memorable and potentially more persistent than IPv4 and/or IPv6 dotted decimal addresses.
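
As an illustration of the "character sequence, not word" point, the sketch below shows an internationalized name reduced to the LDH sequence the DNS actually carries. It assumes Python's built-in "idna" codec, which implements the older IDNA 2003 rules (RFC 3490); the function names and the sample name are made up for the example.

```python
LDH = set("abcdefghijklmnopqrstuvwxyz0123456789-")


def to_wire_form(name):
    """Encode each label of a domain name to its ASCII (LDH) form."""
    return name.encode("idna").decode("ascii")


def is_ldh(label):
    """True when a label uses only letters, digits, and hyphen."""
    return bool(label) and set(label.lower()) <= LDH


# A made-up name with one non-ASCII character.
wire = to_wire_form("b\u00fccher.example")
print(wire)
print(all(is_ldh(label) for label in wire.split(".")))
```

Whatever the original label meant to a reader, the resolver only ever compares the encoded sequence.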

This misconception leads to over-specification attempts and suggesting that identifiers nearly always represent words simply props up a misunderstanding that never should have existed, and that is the level of misunderstanding that exists in some of the groups attempting to inform or capture ICANN and the IETF, though not the UTC which has a higher entry clue barrier than either ICANN or the IETF's IDNAv2 WG.

Section IV, continued on page 6, provides a "primary goal" that "... all language communities have an equal opportunity ... " for the proposal that follows. This is misleading in two parts. First, the Fast Track is not open to "all language communities", it is only open to those (a) which are state sponsored, by some state, and (b) which use a script other than Latin. Second, the goal is not that some language community has formal equity of opportunity with the early-adopter Latin-centric user community of the United States, the 53 member states of the Commonwealth and the Western Europe NATO states, that is, with ASCII, rather it is that the non-Latin scripts are preferentially available to language communities whose orthographic conventions have been included in the current version of the reference character repertoire. We're trying to make it so that Malay writers using Jawi Script need never again use Latin characters to form identifiers mediated by the DNS, unless they choose to, whether the namespace for Malay writers using Jawi Script is managed by a national operator, or a non-national operator.


The Proposed IDN Table usage for TLD Registrations, beginning on page 6, penultimate paragraph, refers to "variant strings" which is not defined or is a roundabout way to refer to the SC/TC mapping problem, and continues to mingle a technical issue, the lack of a standard mechanism for aliasing delegations at the root of a DNS tree, and the policy choice to limit the number of delegations to "one per language per script per IDN ccTLD". Note that the "one per" policy is not deleterious to users, in the unlikely possibility that the state associated with .il requests both Hebrew and Yiddish (two languages using the same script), but is deleterious to users in the more likely possibility that the states using Chinese and "unified Han" script seek both an SC and a TC entry in the IANA root.

Continuing on page 7, the lettered recommendations for proposed IDN table usage at (c) differs from the IDNC Final Report recommendation of “one string per territory per official language” in the area of variant strings. This is an improvement over the IDNC Final Report, which was marred by an excessively narrow definition of linguistic diversity by members from linguistically homogeneous (in fact or in official fiction) stakeholders. The IDNC recommendation is simply incomprehensible in South Asia and should not have survived without improvement the ICANN Delhi meeting.

Recommendation (d) is perplexing. No standard mechanism exists for aliasing delegations in the IANA root, see the preamble to the lettered recommendations, supra, yet the applicant for a variant string must agree to (somehow) effect an identical zone to the zone(s) associated with all other strings in the "variant string set".

As policy, this is also remarkable. As a hypothetical, suppose that (somehow) there were applications for "US" in both Traditional Chinese and Simplified Chinese. While most multi-character sequences in written Chinese have equivalent meaning despite being visually dissimilar when rendered in SC or TC, the meaning of "mei guo" to multi-generational TC users is unlikely to be indistinguishable to recent immigrant SC users. The requirement that SC and TC be (mechanism ignored) recursively indistinguishable zones is an over-reaching application of the principle that character-by-character there exist equivalences between the SC and TC character repertoires.

Recommendation (e) is a restatement of the reality that there is no alias mechanism available, but it misses a larger opportunity, that more than the incumbent ASCII ccTLD operator, and more than one policy model, are possible for countries with multiple scripts, languages, or even just variations with a difference such as the US-based SC/TC example. It is unfortunate but there are many ccTLDs in which destruction of minority languages is state policy. The US will not allow an SC or a TC variant string for "mei guo" to be entered into the IANA root, though Chinese is the 3rd largest language community in the United States. The same situation exists in Canada. Further, common to both, neither state has applied for the only non-Latin scripts which exist as the official scripts of government, nor is likely to -- this refers to Northern and Cherokee Syllabics.

While the problem of language suppression by governments using Latin and non-Latin Scripts may not easily fall within the white picket fence of the IDN ccTLD Fast Track Program, it is a problem that ICANN's IDN program must address, or consent to assisting nation states in extinguishing existing languages.

I remain concerned that the Twomey invite letter allowed responding ASCII ccTLD operators and governments to disclose their script and string preferences, and alternatively, to keep their preferences confidential. The letter did not state that all responses would be held confidential, regardless of the desires of the responding ASCII ccTLD operators and governments, and that the disclosure of strings sought, and the resulting issues, such as the possibility of zero-width joiners or zero-width non-joiners, is still just guesswork.

Eric Brunner-Williams
in a personal capacity








