Some comments on the ccTLD Fast Track model and specifications
- To: fast-track-review-2010@xxxxxxxxx
- Subject: Some comments on the ccTLD Fast Track model and specifications
- From: John C Klensin <klensin@xxxxxxx>
- Date: Fri, 17 Dec 2010 15:39:13 -0500
Subject: Some comments on the Fast Track program
Hi.
One could comment about many aspects of this program. In the
interest of relative brevity and in recognition of the fact
that it was designed as an "interim" and relatively short-term
mechanism, I'm going to try to confine myself to high points.
With the exception of the first (which has applicability to
most of the others) and last (which is the most difficult
operationally), these are in no particular order.
While these comments are addressed to the Fast Track, aspects
of several of them also apply to the IDN components of the
proposed gTLD application process and should be read in that
light.
(1) Duration.
The Fast Track was designed as an interim mechanism to cover
the cases that are presumably easy. Many choices that are
plausible given those assumptions become much less so as it
extends into its second year and beyond. I suggest that many
(perhaps most) of the issues raised below could be better
resolved by swiftly shutting down the Fast Track and replacing
it by a permanent mechanism in which those issues (and others)
are addressed. If that cannot be accomplished, these issues
will probably need to be addressed twice -- once for the Fast
Track and once for the more permanent mechanisms. Failure to
address them for a continuing Fast Track will, in my opinion,
lead to a rising number of decisions that are questionable from
one point of view or another, potentially leading to conflicts
among stakeholders that are hard to resolve in a
generally-satisfactory way.
(2) Latin exclusion.
The extended "Latin" (Roman-derived) script is used to write
more different languages than any other. For some of those
languages, including the European ones that may be most
familiar to ICANN leadership, a typical string consists of a
majority of Basic Latin ("undecorated") characters and a much
smaller number of Extended Latin ones (both "decorated"
characters and characters that don't obviously derive from anything
written in Latin during the Roman Republic or Empire).
Many short, mnemonic, strings can be written without decorated
characters at all. But there are other languages whose writing
systems are based on Latin characters but for which almost no
strings that might be used as plausible mnemonics can be
written without some decorated characters. The "no
Latin-script characters" restriction of the Fast Track may have
been reasonable for some short period of time. At this point,
however, it is becoming a significant bar to equitable and
balanced internationalization of the DNS.
Unfortunately, it is not plausible to simply permit strings
containing extended Latin characters without addressing other
issues. In the writing system of some languages, the decorated
forms of characters are optional. If the practical definition
of "confusingly similar" is that users of the relevant language
will expect two strings to match, then for users of that
language, any label containing decorated characters immediately
becomes confusingly similar to a string without decoration.
Conversely, the writing systems of other languages treat
decorated characters as fully distinct from the undecorated
forms. For users of those languages, a decorated and
undecorated pair of strings are not likely to be assumed to
match and there is no confusion. But the characters themselves
may be the same for those two language groups, so we would have
strings that are confusingly similar for one language
population and not similar at all for another. A model for
handling this should be sorted out before Latin-character
strings (or strings in any other extended script with similar
issues; such scripts do exist) are permitted, rather than
leaving the matter to case-by-case decisions that are likely to
be inconsistent and thereby create confusion simply as a
consequence of that inconsistency.
(2) Application quality.
As we increase the number of non-ASCII domain names, the
potential for confusion by people not familiar with the
relevant scripts will rise. That is inevitable and nothing can
be done to avoid it. On the other hand, it is critical that
applications for TLDs (and, by extension, applications for SLDs
within new or old TLDs) be absolutely unambiguous with regard
to what is being requested, and that relevant portions of the
information provided can be transposed directly into IANA
registries and other databases without fear of
information loss or information distortion. In theory, the
Fast Track application process should assure that level of
clarity.
In practice, creating applications that adhere to both the
letter and intent of the process has sometimes proven
difficult. To prevent applicants with slightly less technical
skill from being put at a disadvantage, ICANN
may need to provide improved tools and/or tutorials to permit
creating and verifying conforming applications or may need
clear and transparent rules about the degree to which staff are
permitted or encouraged to help perfect applications. However,
having any part of the evaluation process, including ICANN
Staff, try to figure out what applications are really intended
to mean should be acceptable to no one, if only because it will
sooner or later lead to disasters when the evaluators guess
wrong or to claims of unfair treatment when some applicants are
helped more than others. Applications that are incomplete or
inadequate in any way should be returned to the applicant for
updating.
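As one example of the kind of verification tool mentioned above, a minimal Python sketch could check that the forms of a single requested string agree with one another before anything is transposed into a registry. It assumes a single, non-ASCII label and uses only the bare Punycode codec (RFC 3492); it does not perform the full IDNA2008 validity checks, and the function names and the "U+XXXX" listing format are hypothetical.

```python
import unicodedata


def a_label_for(u_label: str) -> str:
    """Derive an A-label with the bare Punycode codec (RFC 3492).

    Assumption: u_label is one already-valid, non-ASCII label.
    No IDNA2008 validity rules are applied here.
    """
    return "xn--" + u_label.encode("punycode").decode("ascii")


def application_problems(u_label: str, a_label: str, code_points: list) -> list:
    """Report places where the forms of one requested string disagree.

    code_points is a hypothetical list like ["U+0062", "U+00FC", ...]
    as an applicant might type it into a form.
    """
    problems = []
    if unicodedata.normalize("NFC", u_label) != u_label:
        problems.append("U-label is not in Normalization Form C")
    if a_label.lower() != a_label_for(u_label):
        problems.append(f"A-label {a_label!r} does not encode the U-label")
    listed = [int(cp.upper().removeprefix("U+"), 16) for cp in code_points]
    if listed != [ord(ch) for ch in u_label]:
        problems.append("listed code points do not match the U-label")
    return problems


points = ["U+0062", "U+00FC", "U+0063", "U+0068", "U+0065", "U+0072"]
print(application_problems("bücher", "xn--bcher-kva", points))  # []
print(application_problems("bücher", "xn--bcher-kvb", points))  # reports the A-label mismatch
```

A check like this only returns the application with a list of problems; it never tries to guess which of the disagreeing forms the applicant "really" meant.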
(3) IDN tables.
These tables were originally intended simply as an announcement
by a TLD about what Unicode characters it intended to accept in
registrations and, later (and primarily for registries
accepting registrations in more than one script, or more than
one in addition to Basic Latin), about what Unicode characters
it intended to accept in combination. The JET work described
in RFC 3743 and the subsequent CDNC work described in RFC 4713
expanded that principle to include tables of characters and
their variants (see below), but the principle remained that the
listing was simply a statement by a registry of its rules
about what it intended to accept. It is perhaps useful to
think of that model as being more about transparency of
registry requirements than about normative decisions that
should affect or bind other registries or represent
broadly authoritative statements about languages, writing
systems, or script usage.
While it is unwise to talk about IDNs except in terms of
characters and perhaps scripts (see below), these tables
typically require knowledge of the characters used to write
specific languages (often a subset of the characters considered
part of a script) in order to avoid confusion and attacks
within the language or script.
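The "statement by a registry" model can be sketched in a few lines of Python. The registry names and character sets below are hypothetical; the point is only that each table belongs to one registry, so the same label can be acceptable under one table and not another without either table being "wrong" about the language.

```python
from dataclasses import dataclass
from string import ascii_lowercase


@dataclass(frozen=True)
class IdnTable:
    """One registry's published statement of what it will accept.

    Deliberately per-registry: nothing here claims to describe how a
    language "is" written everywhere.
    """
    registry: str           # hypothetical registry name
    code_points: frozenset  # Unicode code points (ints) the registry accepts

    def accepts(self, label: str) -> bool:
        return all(ord(ch) in self.code_points for ch in label)


def table(registry: str, extra: str) -> IdnTable:
    """Build a table from a-z, digits, and hyphen plus some extra characters."""
    base = ascii_lowercase + "0123456789-"
    return IdnTable(registry, frozenset(ord(ch) for ch in base + extra))


# Two hypothetical registries, both with Latin-script tables.
registry_a = table("registry-a", "äöüß")
registry_b = table("registry-b", "æøå")

label = "smørbrød"
print(registry_a.accepts(label))  # False
print(registry_b.accepts(label))  # True
```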
Almost since the registry of these tables was established, there have been
efforts to make it normative or treat it as normative. While
the temptation and desire are obvious, ICANN needs to remember
that it is common for languages to be written slightly
differently (even with different mixes of characters) in
different locations. An attempt to determine authoritatively
how languages are written on a global basis is, at best, an
invitation to delay and embarrassment. Of course, if language
or script groups want to work out their own systems or lists and
convince registries to use them, that could be very helpful. But
it is not an activity in which ICANN should engage or even, in
my opinion, actively try to foster.
(4) Confusability.
The Fast Track was defined in a way that seemed intended to
make evaluation for confusability of characters relatively
simple for the first several applications. In particular,
comparisons for similar characters were to consider only
relationships to existing TLDs and to ASCII. As the number of
approved and delegated domains increases, comparisons with all
such domains will inevitably become both more complicated and
more subjective: for example, it is worth remembering that
almost every script has characters that consist exclusively of
vertical, horizontal, or slanted lines: not having such
characters would be an archaeological surprise. Those who want
to understand this issue better should consider the ASCII "l"
(lower-case L) and "/" (forward slash) characters. The latter
is not permitted in an LDH or IDN domain name, but whether the
two characters can be confused or not depends on what one knows
(or thinks one knows) about the importance of angles and serifs
or other decorations (if they are present). If the Fast Track
process is to continue for an extended period, unless the
evaluation process has additional clear advice from the
community that leads to consistent rules about how to handle
these cases, disagreements are almost inevitable.
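A minimal Python sketch shows where the subjectivity enters. It assumes a "skeleton" comparison in the style of Unicode's confusables data (UTS #39): each candidate is mapped through a lookalike table and compared against every delegated label. Both lookalike tables below are tiny, hypothetical stand-ins; whether, say, digit one should count as lower-case L is exactly the policy decision that has to come from the community.

```python
# Hypothetical lookalike tables; real confusables data is far larger.
STRICT = {
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
}
# A looser (hypothetical) policy that also treats digit one as lower-case L.
LOOSE = {**STRICT, "1": "l"}


def skeleton(label: str, lookalikes: dict) -> str:
    """Replace each character by its designated lookalike, if any."""
    return "".join(lookalikes.get(ch, ch) for ch in label)


def conflicts(candidate: str, delegated: list, lookalikes: dict) -> list:
    """Every delegated label whose skeleton equals the candidate's."""
    key = skeleton(candidate, lookalikes)
    return [d for d in delegated if skeleton(d, lookalikes) == key]


delegated = ["org", "well"]
print(conflicts("\u043erg", delegated, STRICT))  # ['org']
print(conflicts("we11", delegated, STRICT))      # []
print(conflicts("we11", delegated, LOOSE))       # ['well']
```

Every candidate is compared against every delegated label, so the work grows with the number of delegations, but the answer depends entirely on which lookalike table is chosen.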
(5) Promises.
Especially for the case of "variants" (see below), the Fast
Track model seems to make a commitment that allocations of
"variant" names will eventually occur and will occur according
to some new technical DNS-based mechanism. That commitment is
simply not realistic given what we know about how the DNS does
lookups and handles caching, propagation, and aliases.
Certainly special cases with additional script-specific
constraints may be feasible and appropriate, but ICANN
procedures like the Fast Track should not be making promises
that no one knows how to realize or implement, especially since
applicants may make decisions based on those promises.
(6) Scripts and families of scripts -- micro-level.
The Unicode list of scripts and the characters that go with
them work well in many Unicode-related contexts. Given that
IDNA is based on Unicode, any attempt to develop or work with a
different list would be bound to be controversial and might
even be problematic technically. However, if one needs to
deal with the practicalities of user expectations and human
perceptions, it is important to note that other classifications
are possible and, indeed, that the Unicode divisions are fairly
arbitrary in some areas and that this arbitrariness should be
considered more carefully with the Fast Track and its
successors than has been the case so far. Some examples may be
helpful:
6.1 Are Chinese, Japanese, and Korean one script or three?
Considerations of Kana and Hangul as important components
of the Japanese and Korean writing systems may strongly suggest
three. Even the Han-derived subset may be problematic for
some purposes because a given character may be written in
different ways for the different languages. Independent of
the advantages of "Han unification" for other purposes, it
may not be appropriate for IDNs where the relevant language
cannot be conclusively known to a rendering engine.
6.2 Are Greek, Latin, and Cyrillic three scripts or one?
Most Latin characters are clearly derived from Greek
predecessors; some of the character shapes are identical.
Cyrillic was also derived from Greek (via Old Church
Slavonic). It shares even more characters with Greek than
Latin does and, at least transitively, shares several
characters with Latin (the much-discussed "paypal" example
demonstrates that relationship). Unicode treats the
scripts as separate, at least in part because of the
history of national character coding systems. But for IDN
purposes, we might have been much better off had identical
characters in two (or all three) of those writing systems
not been assigned to different code points.
6.3 Are Western Arabic and Eastern Arabic the same script?
The Arabic script as used to write the Arabic language is
somewhat different from the Arabic script used to write
Persian, Urdu, and some other languages. Some
conceptually-identical characters that are visually
identical in some contexts (but not all) are assigned
separate code points and there are two separate sets of
code points for digits. The differences are arguably no
less significant than the differences among Greek, Latin,
and Cyrillic, yet those three are treated as separate
scripts while the Arabic collection are treated as a single
script with some additional, language-group-specific,
characters. As with Han script, there are some advantages
to unifying the two collections, but they may not be
optimal for IDN use.
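The three questions above can be seen directly in Unicode data. The following Python sketch is only a heuristic: the standard `unicodedata` module does not expose the Unicode Script property (UAX #24), so the hypothetical `rough_script` helper guesses a script from the first word of each character's Unicode name.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Guess a character's script from the first word of its Unicode name.

    Heuristic only: the real Script property is not available in
    unicodedata, and some names (digits, marks) do not start with a
    script name at all.
    """
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def scripts_in(label: str) -> set:
    return {rough_script(ch) for ch in label}


# 6.1: an ordinary Japanese string spans three Unicode "scripts".
print(sorted(scripts_in("日本語のドメイン")))  # ['CJK', 'HIRAGANA', 'KATAKANA']

# 6.2: a mixed Latin/Cyrillic "paypal" is easy to flag as mixed-script...
print(sorted(scripts_in("p\u0430yp\u0430l")))  # ['CYRILLIC', 'LATIN']
# ...but an all-Cyrillic lookalike is a single "script" and is not.
print(sorted(scripts_in("\u0440\u0430\u0443\u0440\u0430\u04cf")))  # ['CYRILLIC']

# 6.3: two distinct code points both carry the digit value one.
print(unicodedata.digit("\u0661"), unicodedata.digit("\u06f1"))  # 1 1
print(unicodedata.name("\u0661"), "/", unicodedata.name("\u06f1"))
```

A rule such as "no mixed-script labels" therefore forbids ordinary Japanese unless Han, Hiragana, and Katakana are grouped by hand, while still missing whole-script lookalikes, which is one concrete form of the arbitrariness described above.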
These distinctions, or the lack of them, do not represent
problems that are easy to solve. I have noted in other
contexts that many religions claim that divine action was
involved in creating differences among languages and consequent
incomprehensibility problems among population groups. At least
to the degree to which one accepts those traditions, ICANN's
believing that it can either "solve" these problems or
successfully claim that they do not exist would be a supreme
act of hubris. But the Fast Track procedure, in treating all
of the relevant issues as examples of "visually confusing
characters", ignores many of the issues and pretends that
a clear and globally-consistent solution to others will be
developed soon.
(7) Scripts and families of scripts -- macro-level
Independent of the grouping issues discussed above, almost all
modern scholarship about writing systems divides the writing
systems (and scripts) in use in the world today into two (or
maybe three) groups based on historical development. One group
is characterized by a very loose binding (or no binding at all,
depending on the character) between a character and the
associated pronunciation. The same character may be used in
different languages to represent similar concepts but not the
same sounds. The other group is characterized by a phonetic
interpretation of characters: while there are exceptions and
variations, characters are used to represent phonemes or
slightly larger sound units that are more or less consistent
across the languages written using those characters.
The first group has only one contemporary member (certainly
only one that is coded into Unicode) -- Han-derived characters.
The second group includes everything else (with the possible
exception of Hangul), with all other writing systems and
scripts sharing common ancestry and at least some ways of
working.
The Fast Track model ignores that distinction in favor of
assuming that a single set of rules, derived largely from
alphabetic or phonetic conditions, can apply to all writing
systems. That is a very attractive idea and would be
convenient if true... but it isn't. The difference has already
been a problem for ICANN. The Fast Track procedures should not
attempt to continue to ignore it going forward.
(8) The "variant" mess(es).
The term "variant" was introduced into the IDN discussion by
RFC 3743 to refer to two characters with identical, or
nearly-identical, "meaning" but different shapes (glyphs) and
code points within a single script. Its use in that context
illustrates one of the differences between Chinese characters
and alphabetic-phonetic ones mentioned above. If we were to
transpose "meaning" into "sound", every character representing
the same phoneme as another one in any alphabetic script would
be a variant of every other such character, independent of how
the characters were written.
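In the RFC 3743 sense, a variant table maps each character to the set of characters treated as having the same "meaning", and a registered label implies every label reachable by swapping characters for their variants. The Python sketch below uses a hypothetical two-entry table of simplified/traditional pairs (not taken from any real registry's table) to show how that set is computed and how it grows multiplicatively with label length.

```python
from itertools import product

# Hypothetical, deliberately tiny variant table in the RFC 3743 spirit.
VARIANTS = {
    "\u56fd": {"\u56fd", "\u570b"},  # 国 / 國
    "\u570b": {"\u56fd", "\u570b"},
    "\u9645": {"\u9645", "\u969b"},  # 际 / 際
    "\u969b": {"\u9645", "\u969b"},
}


def variant_labels(label: str) -> set:
    """Every label reachable by replacing each character with one of its variants.

    Characters absent from the table are treated as their own only variant.
    """
    choices = [sorted(VARIANTS.get(ch, {ch})) for ch in label]
    return {"".join(combo) for combo in product(*choices)}


print(len(variant_labels("国际")))  # 4
```

With two variants per character, an n-character label implies 2**n labels, every one of which must be either delegated, aliased, or reserved.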
At some point, parts of the ICANN community started using the
term "variant" to describe the relationship among characters
that were visually similar (or "visually confusing") and, by
extension, several other types of relationships. The Fast
Track procedure seems to assume simultaneously that all types
of related characters could be described as "variants", that
variants were associated only with visually similar characters
(an obvious contradiction), and that visual similarity was the
only likely source of user confusion. This sloppy use of
terminology and concepts created confusion that did not exist
before and that may have been wholly unnecessary.
It is also worth noting that there have always been
alternatives to trying to establish aliases or synonyms within
the DNS. They are not very attractive to some communities and
involve complex tradeoffs with other options and
considerations, but so do aliases and synonyms. For some
purposes, the latter may involve waiting until we redesign the
DNS and deploy a new version, which is a fairly drastic
constraint.
I strongly recommend that you clean this mess up in the Fast
Track procedure. In particular:
(i) Stop talking about "variants" entirely unless you are
prepared to supply one or more clear and precise
definition(s).
(ii) If you continue to let people specify "variants" as
part of their applications, require that they explain the
relationships (i.e., what sort of "variant" is involved and
why) and explicitly indicate which form they are applying
for to be immediately delegated, which form they are hoping
to have delegated eventually, and which ones they are
merely trying to have reserved forever.
(iii) Require applicants who are requesting delegation of
"variants" to explicitly indicate whether they are looking
for aliases/synonyms with a single delegation tree or whether
they are looking for multiple delegation trees that they
intend to manage in some more or less linked way.
(iv) Make it very clear to all concerned that deferring
delegation of variants until all of the relevant technical
issues are resolved (e.g., until general-purpose aliases are
available in the DNS) is likely to be equivalent to "never",
and that short-term plans should not be made on that basis.
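Recommendations (ii) and (iii) amount to a small amount of required structure in each application. The Python sketch below is a hypothetical schema, not an ICANN form: every class, field, and enum value is invented for illustration. It rejects an application that does not name exactly one label for immediate delegation, that lists a variant without stating and explaining the relationship, or that hopes for later delegation without saying whether aliases or separate trees are intended.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Relationship(Enum):
    """What kind of "variant" is claimed (recommendation ii)."""
    SAME_MEANING = "same meaning, different code point (RFC 3743 sense)"
    VISUALLY_SIMILAR = "visually similar"
    OTHER = "other relationship, explained in text"


class Disposition(Enum):
    """What the applicant is asking for, per label (recommendation ii)."""
    DELEGATE_NOW = "delegate immediately"
    DELEGATE_LATER = "hoped-for eventual delegation"
    RESERVE_ONLY = "reserve, never delegate"


class TreeModel(Enum):
    """How eventual delegations would relate (recommendation iii)."""
    ALIASES_ONE_TREE = "aliases/synonyms with a single delegation tree"
    LINKED_TREES = "multiple delegation trees, managed in a linked way"


@dataclass
class RequestedLabel:
    u_label: str
    disposition: Disposition
    relationship: Optional[Relationship] = None  # None only for the primary label
    explanation: str = ""


@dataclass
class VariantApplication:
    labels: List[RequestedLabel] = field(default_factory=list)
    tree_model: Optional[TreeModel] = None

    def problems(self) -> List[str]:
        found = []
        primary = [x for x in self.labels if x.disposition is Disposition.DELEGATE_NOW]
        if len(primary) != 1:
            found.append(f"expected one label for immediate delegation, found {len(primary)}")
        for x in self.labels:
            if x.disposition is Disposition.DELEGATE_NOW:
                continue
            if x.relationship is None or not x.explanation.strip():
                found.append(f"{x.u_label}: relationship not stated and explained")
        wants_later = any(x.disposition is Disposition.DELEGATE_LATER for x in self.labels)
        if wants_later and self.tree_model is None:
            found.append("alias vs. separate-tree model not stated")
        return found


app = VariantApplication([
    RequestedLabel("国际", Disposition.DELEGATE_NOW),
    RequestedLabel("國際", Disposition.DELEGATE_LATER,
                   Relationship.SAME_MEANING, "traditional form of the same name"),
])
print(app.problems())  # ['alias vs. separate-tree model not stated']
app.tree_model = TreeModel.ALIASES_ONE_TREE
print(app.problems())  # []
```

Recommendation (iv) is deliberately not encoded: no schema can make a hoped-for later delegation technically feasible.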
Let's stop trying to fool ourselves and each other. In the
long run, pretending that this is easy, or that general
solutions are right around the corner, hurts ICANN, applicants,
and the Internet in general.
Thanks for listening.
John Klensin
(speaking for myself only)