ICANN Email List Archives



[idn-discuss] Character tables for language sets

  • To: idn-discuss@xxxxxxxxx
  • Subject: [idn-discuss] Character tables for language sets
  • From: Cary Karp <ck@xxxxxxxxxx>
  • Date: Tue, 21 Sep 2004 12:06:26 +0200 (CEST)

A generally implementable statement about the IDN requirements for a given
language can only be made by an agency that is familiar in detail with that
language's orthography and with the IDNA protocols, and that would be
recognized without question as an authoritative source of information about
the way the two interrelate. Agencies responsible for ccTLDs and for gTLDs
are equally likely to be able to make knowledgeable statements about the
requirements of the languages in which they conduct their daily business.
A gTLD is, however, more likely to need to accommodate registration in
languages that are distant, both geographically and linguistically, from its
base of operation. Obtaining the requisite assistance with such languages
will rarely be insurmountably difficult. If the resulting tables were placed
in the IANA Registry for IDN Character Tables, however, their credibility
would more likely be questioned than that of equivalent tables contributed
by ccTLDs with an obvious association with those languages.

The IANA registry has clear utility as a platform for sharing the language
expertise possessed by all of the TLD registries, both supporting the
development of individual registry statements and providing a credible
means for their publication. That potential is, however, being realized at
what may not be an adequate rate. In the absence of ideally suitable
reference material in the central repository, gTLDs are crafting their own
character tables as the needs of their target communities require. Even if
all such tables are equally sound, they may differ in detail, thus
increasing the risk of general confusion. It is perhaps in recognition of
this that some tables in actual use have not been placed in the registry.

Whether or not it is realistic to expect the ccTLDs in countries that share
major language concerns to draft unified character tables collaboratively,
any such action would obviate the need for some separate gTLD action. In
any case, a ccTLD normally has a clearer language nexus and can thus more
readily make generally authoritative statements about such things as IDN
character tables.

Anything that can be done to elicit more extensive ccTLD involvement in the
development of the IANA registry would therefore be likely to hasten the
development of IDN. One factor that may be slowing this, and which is also
directly relevant to gTLD participation, is the apparent lack of provision
in the registry for tables based on language groups. The ICANN Guidelines
for the Implementation of Internationalized Domain Names allow a TLD
registry to "associate each registered internationalized domain name with
one language or set of languages."
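Associating a registration with a set of languages can be illustrated as a simple validity check. The minimal Python sketch below assumes that the repertoire permitted for a language set is the union of its members' tables, and that a label passes if every character falls within it. The language codes, the table contents and the names `TABLES`, `allowed_repertoire` and `label_allowed` are hypothetical illustrations, not registry data or any published interface. Labels are assumed to be lowercase already; the sketch only normalizes them to NFC.

```python
import string
import unicodedata

# Hypothetical per-language tables, lowercase only. These are illustrative
# assumptions, not the contents of any table in the IANA registry.
BASE = set(string.ascii_lowercase) | set(string.digits) | {"-"}
TABLES = {
    "da": BASE | set("æøå"),
    "no": BASE | set("æøå"),
    "sv": BASE | set("åäö"),
}


def allowed_repertoire(language_set):
    """Union of the tables of every language in the set (an assumed policy)."""
    repertoire = set()
    for lang in language_set:
        repertoire |= TABLES[lang]
    return repertoire


def label_allowed(label, language_set):
    """True if every character of the NFC-normalized label is permitted for the set."""
    repertoire = allowed_repertoire(language_set)
    normalized = unicodedata.normalize("NFC", label)
    return all(ch in repertoire for ch in normalized)


if __name__ == "__main__":
    print(label_allowed("smørrebrød", {"da"}))     # True
    print(label_allowed("smörgås", {"da"}))        # False: ö is not in the Danish table
    print(label_allowed("smörgås", {"da", "sv"}))  # True
```

Whether a language set should really be treated as a plain union, or should carry further rules about mixing characters from different members, is a policy question this sketch does not settle.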

Since languages outnumber countries by about thirty to one, national
domains will commonly need to accommodate more than one language. The
lexical base of many languages also includes a substantial number of words
borrowed from other languages, represented using characters outside the
repertoire native to the receiving language. Such overlap between languages
used by adjacent communities can be significant, and many potential sources
of confusion in their respective IDN support can be revealed (and thus more
easily avoided) by including them in a single aggregated character table.

This highlights the need for extreme care in distinguishing between
language and script. Strict focus on language is fundamental to the
policies relating to certain language groups. In other situations, a shared
script may be both a natural and a satisfactory basis for establishing a
homogeneous language set. The alternative of maintaining separate tables
for each language belonging to such a group, and indicating the
associations between them via some external device (another table?), can
conceal rather than reveal details that require particular note.

Danish, Norwegian and Swedish provide one example of languages that might
beneficially be presented in an aggregated table. All three are extremely
closely related and use almost identical alphabets. The differences between
them are of potential relevance to IDN policies and would readily be
revealed in a shared table. This reasoning could usefully be extended to a
shared table for, say, all of the languages used in the EU member states
that are written with characters from the Latin Unicode tables.
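A shared table of this kind can be sketched as a mapping from each character to the languages that use it, from which the characters not common to the whole set fall out directly. The Python sketch below assumes each repertoire is only basic Latin a-z plus that language's additional letters; real tables would include more, such as letters found in names and loanwords. `REPERTOIRES`, `aggregate` and `not_shared` are hypothetical names chosen for illustration.

```python
import string
from collections import defaultdict

# Assumed repertoires: basic Latin a-z plus each language's additional letters.
# Letters used only in names and loanwords (e.g. é, ü) are omitted for brevity.
REPERTOIRES = {
    "Danish": set(string.ascii_lowercase) | set("æøå"),
    "Norwegian": set(string.ascii_lowercase) | set("æøå"),
    "Swedish": set(string.ascii_lowercase) | set("åäö"),
}


def aggregate(repertoires):
    """Map each character, in code point order, to the sorted languages that use it."""
    table = defaultdict(set)
    for lang, chars in repertoires.items():
        for ch in chars:
            table[ch].add(lang)
    return {ch: sorted(langs) for ch, langs in sorted(table.items())}


def not_shared(table, languages):
    """Characters in the aggregated table that are not used by every language."""
    return {ch: langs for ch, langs in table.items() if len(langs) < len(languages)}


if __name__ == "__main__":
    table = aggregate(REPERTOIRES)
    for ch, langs in not_shared(table, REPERTOIRES).items():
        print(f"U+{ord(ch):04X} {ch}: {', '.join(langs)}")
```

Run as a script, it reports ä and ö as Swedish only and æ and ø as Danish and Norwegian only, while å, used by all three, drops out. The correspondence between ä and æ, and between ö and ø, is the kind of detail that separate per-language tables would leave implicit.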

Perhaps someone more familiar with the policies underlying the IANA 
registry could comment on its extensibility as sketched above, explaining 
either why this is not a path along which we can proceed, or how we might 
pick up the pace at which we can move onward.

