ICANN Email List Archives

[gnso-idn-wg]


Re: [gnso-idn-wg] One string per application

  • To: "Tan, William" <William.Tan@xxxxxxxxxxx>, Bruce Tonkin <Bruce.Tonkin@xxxxxxxxxxxxxxxxxx>
  • Subject: Re: [gnso-idn-wg] One string per application
  • From: Tan Tin Wee <tinwee@xxxxxxxxxxxxxx>
  • Date: Tue, 20 Feb 2007 20:57:56 +0800

RE: One string per application

This is one example of why it is absolutely crucial to involve language/script
groups in the deployment of IDN gTLDs. Why?

Chinese-language operators working with the unified Han character set in
Unicode have already sat around the table to work out how to reconcile
traditional and simplified characters, and how to avoid conflicts with
Japanese Kanji and Korean Hanja users, who share the same unified Han
character set. The JET (Joint Engineering Team) discussed this many years
ago and documented its findings. Mainland China has had rules in place for
deploying .com, .china and .org in Chinese Han characters since 2003.

The way Simplified Chinese users in Mainland China and Singapore,
Traditional Chinese users in Taiwan, Japanese Kanji users and Korean
Hanja users deal with Han character variants, equivalents or whatever
you want to call them does not simply equate to how Arabic script users
square off with Persian script users in Iran, or to how the Urdu folks
in Pakistan and India use Arabic-based scripts.

So for folks like us, sitting in front of an ASCII keyboard or meeting
in Lisbon, to invent a policy on IDN rollout that is meant to fit the
Han characters, the Arabic-based scripts and every other non-ASCII IDN
string, and then to expect it to be accepted as legitimate by those who
would be using the IDNs or by those who would be providing the services,
is simply asking for trouble, not to mention being seen as naive.

A single one-size-fits-all IDN policy will incur the ire of multiple
groups of people and may cause outrage and international furore if
we are not careful. And we can't hide behind the often-cited excuse
"we cannot please them all, so let's just adopt our policy
and see how that goes". To be sure, it will not go down well.

I wish to apologise in advance if this sounds a bit harsh, but
comments like "I don't see that a registry operator should be
compelled to apply for, or be granted, every possible typographic
variation of their chosen string." do not help, but cause more
problems. It is almost equivalent (I stress almost and not
exactly for technical reasons) to saying that
one registry can register .seattle,
but another competing registry can register .Seattle and
yet another could try for .SEATTLE for English. For some,
it is even more sensitive than this simple example.

One solution to reconcile this with not compelling
all registries to apply for and be granted every possible
"typographic variation of their chosen string" would be
for such variations to be reserved. In other words, no
competing registry should be trying to pass off as a
.ÃÃÃÃËD as .ÃÃÃÃÃÂ, or using a more dramatic but
hypothetical example, if PR China
gets .zhongguo(CHINA) in simplified characters, there may be
an international diplomatic situation if Taiwan
were to subsequently get .zhongguo in traditional
characters.

And since Iran is so topical, take the example of .IRAN in
Farsi (Parsi or Persian): the TLD deployment has to
cover both the Arabic and the Persian lookalikes,
[xn--mgba3a4fra] and [xn--mgba3a4f16], so the IRNIC colleagues
at the Institute for Studies in Theoretical Physics &
Mathematics (IPM) say.
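
As a rough illustration of why two lookalike strings end up as two
different labels in the DNS, here is a minimal Python sketch. It decodes
the first label quoted above with the standard library's punycode codec,
then swaps the Arabic yeh (U+064A) for the Farsi yeh (U+06CC) and
re-encodes. The helper names are made up for this example, and the
assumption that the two forms differ only in the yeh is mine for the sake
of illustration; the second label is computed here, not copied from above.

```python
ACE_PREFIX = "xn--"

def decode_label(ace_label: str) -> str:
    """Decode one ACE ("xn--") label to Unicode using the stdlib punycode codec."""
    if not ace_label.lower().startswith(ACE_PREFIX):
        return ace_label  # plain ASCII label, nothing to decode
    return ace_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

def encode_label(label: str) -> str:
    """Encode one Unicode label to its ACE form (no nameprep, sketch only)."""
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")

arabic_form = decode_label("xn--mgba3a4fra")

# Assumption for illustration: the Persian form differs only in the yeh.
persian_form = arabic_form.replace("\u064a", "\u06cc")

# The two strings look alike on screen but are different code point sequences,
# so they encode to different ACE labels.
print([f"U+{ord(ch):04X}" for ch in arabic_form])
print(encode_label(arabic_form))
print(encode_label(persian_form))
```

To the DNS these are simply two unrelated ASCII labels; any equivalence
between them has to come from registry policy.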

So it is equally worth saying:

It is worth combining the issue of being granted a particular TLD string, and
the issue of strings that may be typographically similar *because they
are one and the same thing in the case of .zhongguo or .iran.*

And one would be equally comfortable saying:

In the example quoted by William Tan, a registry operator could apply for one
or both strings, and both will be equivalent, as equivalent as
.net is the same as .NET or .NeT etc etc etc in the case of ASCII
uppercase-lowercase equivalence.

If the operator were granted both strings then the
operator would not need to decide whether example.ÃÃÃÃËD
and example.ÃÃÃÃÃÂ would map to the same nameserver or not -
they would be the same.

And I agree with Bruce Tonkin that it would be worth saying too, that:

If the registry operator applied for only one string and was granted the string,
then it would be impossible for another registry operator to gain approval for
the other string given its typographical similarity/linguistic and
semantic equivalence, and in fact, the registry operator should automatically
get the other equivalent strings or combinations.
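
The reservation idea above can be sketched in a few lines of Python. This
is only a sketch under stated assumptions: the one-entry variant table
(traditional 國 mapped to simplified 国) and the function names are
hypothetical, and a real table would have to come from the relevant
language/script community, not from this list.

```python
from typing import Iterable

# Hypothetical, deliberately tiny variant table (traditional -> simplified).
# A real table would be supplied by the language/script community concerned.
VARIANT_TABLE = {"國": "国"}

def canonical(label: str) -> str:
    """Fold ASCII case and map each character to its preferred variant."""
    return "".join(VARIANT_TABLE.get(ch, ch) for ch in label.lower())

def is_blocked(requested: str, granted: Iterable[str]) -> bool:
    """True if the requested string is equivalent to one already granted."""
    taken = {canonical(g) for g in granted}
    return canonical(requested) in taken

granted = ["net", "中国"]
print(is_blocked("NeT", granted))   # True: ASCII case equivalence
print(is_blocked("中國", granted))  # True: variant of a granted string
print(is_blocked("org", granted))   # False: no equivalent string granted
```

The point of the sketch is that the same check covers both cases: .NeT
is blocked by .net for the same reason that the traditional form is
blocked by the simplified one.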

So given all the diversity of languages and scripts, how is it
possible for a single committee, meeting in Lisbon or exchanging
emails in English, to determine a basic set of
universal principles without consulting the users/experts
or gaining an appreciation of the languages and scripts
that the policy is meant to regulate successfully?

If something that doesn't make sense in a language/script is
going to be rolled out, like the current Unicode for Tamil, which
none of the Tamil-speaking software developers could
really use or implement, users will just go and look somewhere else.

Since 1998, when I implemented IDNs out of Singapore,
through the long drawn-out, record-breaking IETF standards
process ending in 2003, co-chaired by my former student
James Seng, to the policy-making committees
from 2003 to date, the tardiness of the Internet process,
exacerbated by the expectation of Internet speed in the global
community, has already led to Internet fragmentation,
which has already been reported and well documented in
ITU/UNESCO/MINC meetings
http://www.ngp.org.sg/events/2nd_SEAGF/SEAGF-TanTinWee.pdf
http://www.itu.int/ITU-T/worksem/multilingual/index.html
and reported in Newsweek International.
http://www.msnbc.msn.com/id/12666393/site/newsweek/

Imagine what would happen if we try to recommend and
impose IDN deployment policies on communities that will
question the legitimacy or even doubt the competence of
this process, one that did not involve their language users,
script experts or authorities,
or worse, a process that makes it a long drawn-out procedure to
have their representatives join the policy-making process?

bestrgds
tin wee

Tan, William wrote:
Bruce Tonkin wrote:
It is worth separating the issue of being granted a particular TLD
string, and the issue of strings that may be typographically similar.
I agree that they are separate issues.

In the example you quote above, a registry operator could apply for one or both strings. If the operator was granted both strings then the operator would need to decide whether example.èéå and example.èéå would map to the same nameserver.
Right, this is an aliasing issue.

If the registry operator applied for only one string and was granted the string, then it would be difficult for another registry operator to gain approval for the other string given its typographical similarity.
Agreed.

So I don't think the registry operator "needs" to apply for both, but there may be a commercial advantage in doing so with two applications.
I'm not suggesting a MUST in the policy. Rather, I'm questioning whether
we should restrict to "ONE string per application" as this is a case
where MORE THAN ONE string is a legitimate use case. So, this is the
part that is relevant to the "Introduction of New gTLD" issue that we
discussed last week.

Note this issue is not specific to IDNs - e.g. consider in ASCII com, c0m etc. I don't see that a registry operator should be compelled to apply for, or be granted, every possible typographic variation of their chosen string.
Please see my reply to Mike. I would argue that in ASCII, the DNS has
"moved along" with regard to "1" and "l", and "0" and "o", confusability.
Even in IDN in the context of Latin languages, it is generally accepted
that users should learn how to distinguish accented characters from
the plain letters.

The Han case is fairly unique in the sense that the CJK communities have
recommended that certain characters be mapped together due to their
semantic similarities. The predominant variants are between the
Simplified Chinese (SC) and Traditional Chinese (TC) characters. SC
characters were modified from the traditional form and are used in
mostly the same way by people in different countries. However, they have
different codepoints in Unicode/ISO-10646.

While a registry operator does not *have* to apply for variants, if
they're applying for a Han string that has variants, chances are they
would want to apply for them. In fact, I would go as far as arguing for
a recommendation to apply for them in order to avoid user confusion.
This is, again, a narrow use case for the Han script, and one can find
these policies mirrored in existing registries that offer Chinese IDNs
on the second level (as compared to registries that offer IDNs in Latin
scripts).

Cheers,
=wil


