ICANN Email List Archives

[jig]



[jig] Discussion of the VIP integrated issues report

  • To: "jig@xxxxxxxxx" <jig@xxxxxxxxx>
  • Subject: [jig] Discussion of the VIP integrated issues report
  • From: "Dillon, Chris" <c.dillon@xxxxxxxxx>
  • Date: Fri, 13 Jan 2012 09:45:22 +0000

Dear colleagues,

During Tuesday's conference call, Edmon asked me to have a look at the VIP 
integrated report, especially section 4, with the aim of encouraging discussion 
of it on this list, which could form the basis for JIG comments on the report. I 
have summarised parts of two sections (4. and 7.) and added some comments and 
suggestions (both indicated with **).

4.1- Label generation rules
In this case the term applies to labels which may be allowed in the root zone. 
The rules would identify variant characters and how they may be used in labels. Once 
variant labels are identified, actions could be taken to put them into states 
such as 'active use' and 'prevention'. Work is required in this area to move 
from the parameters ([Unicode] comprehensiveness, expertise and [code point 
property] qualification) of the rules in the report to detailed proposals for 
their creation.

The report gets as far as sketching out five options for the proposals:
1: Complete generation for every script table to be used, by an expert panel
The scripts are selected in advance by ICANN, and ICANN assembles the relevant 
panels to develop the rules for the script.
It is not clear what would happen "if a script expert panel were unable to come 
to consensus on the necessary rules for every code point in that script".
** This approach has the disadvantage that time would be wasted on code points 
that would never be used.
2: Assemble an expert panel for scripts likely to be desired, and include code 
points on a "best efforts" basis
The scripts are again selected in advance by ICANN, and ICANN assembles the 
relevant panels to develop the rules for the script.
The zone repertoire for the root cannot be determined in advance, and can be 
derived only after all the expert panels have reported.
** This sounds as if it could be very slow.
3: Create policies for script-relative lists of code points
Zone repertoires may be built to extend the Unicode Script Tables.
A small number of code points would be allowed across scripts.
It is not clear what might be done if two panels were to set conflicting 
representation variant rules for the same code point.
4: Evaluate community proposals for label generation rules
The expert panel would merely review submitted proposals from the community 
instead of developing the representation label rules itself.
** See my comments below (after 5.).
5: Build up zone repertoires ad hoc
Instead of ICANN selecting an expert group, representation repertoires and 
associated variant rules could be created by interested parties.
In the event of a conflicting rule, the tendency would be for the variant label 
not to be activated.
There may be instability in the label generation policy.
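
The conflict handling mentioned under option 5 can be sketched roughly as 
follows. This is only an illustration under stated assumptions: the function 
name resolve_rules, the state names and the two sample submissions are 
hypothetical and do not come from the report. A variant rule is activated 
only when every party that mentions a code point agrees on it.

```python
def resolve_rules(submissions):
    """Combine variant rules submitted by several interested parties.

    Each submission maps a code point to the set of code points it treats
    as variants. A rule is activated only if all parties that mention the
    code point submitted the same set; otherwise it is not activated.
    """
    resolved = {}
    for cp in set().union(*submissions):
        rules = {frozenset(s[cp]) for s in submissions if cp in s}
        if len(rules) == 1:
            resolved[cp] = ("activated", next(iter(rules)))
        else:
            # Conflicting rules: err on the side of not activating.
            resolved[cp] = ("not activated", frozenset())
    return resolved


# Hypothetical submissions for two Arabic-script code points.
party_a = {"\u064a": {"\u06cc"}, "\u0643": {"\u06a9"}}
party_b = {"\u064a": {"\u06cc"}, "\u0643": {"\u06a9", "\u06aa"}}

result = resolve_rules([party_a, party_b])
```

Here the two parties agree on U+064A, so that rule is activated, while 
their differing rules for U+0643 leave it not activated.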

"Identification of an appropriate authority for the code point repertoire for 
the root zone is a difficult undertaking. To the maximum extent possible, the 
relevant language communities need to agree on a shared set of code points for 
the zone repertoire."
** ICANN therefore needs to approach them, but how? One single committee would 
be huge and slow. Perhaps several committees, along the lines of the VIP, but 
potentially covering all languages, would be a better approach. Where should 
the committees sit? NICs? IETF? IANA? ICANN? Let each community decide as long 
as it defines only one table per language?

7.1 Developing a Label Generation Ruleset specification
"Based on the analysis, a general requirement for all approaches considered is 
the need to use a tool to machine-generate sets of variants in accordance with 
formal label generation rules."
** It is important to emphasize the large effort necessary to create tables 
that could be read by such a tool, but it is not impossible (see the Chinese 
example below).
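
A minimal sketch of what such a tool might do, assuming a table that maps 
each allowed code point to its set of variants. The table contents, the name 
generate_variants and the rule that a label is rejected if any code point is 
missing from the table are assumptions made for illustration; they are not 
taken from the report or from any registry's published table.

```python
from itertools import product

# Hypothetical variant table: each allowed code point maps to the set of
# code points treated as its variants (itself included).
VARIANTS = {
    "\u56fd": {"\u56fd", "\u570b"},  # 国 / 國
    "\u570b": {"\u56fd", "\u570b"},
    "\u9645": {"\u9645", "\u969b"},  # 际 / 際
    "\u969b": {"\u9645", "\u969b"},
}


def generate_variants(label, table=VARIANTS):
    """Return every variant of `label`, including the label itself.

    Returns None if the label contains a code point that is not in the
    table, i.e. one that is blocked by omission.
    """
    choices = []
    for ch in label:
        if ch not in table:
            return None
        choices.append(sorted(table[ch]))
    # One variant label per combination of per-character choices.
    return {"".join(combo) for combo in product(*choices)}


variants = generate_variants("\u56fd\u9645")  # 国际
```

With this toy table a two-character label yields four variant labels. The 
number of variants grows multiplicatively with label length, which is one 
reason a tool would need to assign states to them rather than list them by 
hand.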

"ICANN currently manages a voluntary repository of "IDN Tables," of which some 
contain instructions on computing variants. While some language communities 
have formalized the formatting of their tables, there is no single established 
format that can accommodate the various rulesets in existence today.
Recognizing that deployable solutions will require such tables, it is clear 
that the effort would benefit from the standardization of a table format that 
would allow software implementers to easily and predictably generate variants."
"ICANN could facilitate a reference implementation of software"
** This certainly sounds like a quick way of making progress.

** The approach in the IANA IDN tables, e.g. .ASIA Chinese 
(www.iana.org/domains/idn-tables), looks practical. Could this be built on? 
That table defines which code points are allowed for Chinese and variant 
characters for Traditional Chinese and Simplified Chinese (many characters are 
the same in both). By omission it also defines which code points are blocked. 
However, these are language tables, rather than script tables. Japanese, for 
example, is in a separate code table and moreover there are several code tables 
for each language, depending on the registry. In this case would a Han script 
table be effectively the sum of the Chinese and Japanese tables? (If my 
understanding is correct, neither the Korean nor the Vietnamese community 
intends to have Han script TLDs.)
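
To make the "sum of tables" question concrete, here is a rough sketch that 
merges two language tables into one script table. The two small tables and 
the function name merge_tables are hypothetical; in particular, the sample 
data is not copied from the .ASIA tables or any other registry's tables.

```python
def merge_tables(*tables):
    """Merge language tables into one script table.

    Each table maps an allowed code point to the set of its variants.
    The merged table allows any code point allowed by at least one input
    table and unions the variant sets. Code points absent from every
    input table remain blocked by omission.
    """
    merged = {}
    for table in tables:
        for cp, variants in table.items():
            merged.setdefault(cp, set()).update(variants)
    return merged


# Hypothetical Chinese table: 国 and 國 are variants of each other.
chinese = {
    "\u56fd": {"\u56fd", "\u570b"},
    "\u570b": {"\u56fd", "\u570b"},
}
# Hypothetical Japanese table: no variants are defined.
japanese = {
    "\u56fd": {"\u56fd"},  # 国
    "\u99c5": {"\u99c5"},  # 駅
}

han = merge_tables(chinese, japanese)
```

In this toy example the merged table carries the Chinese variant relation 
for U+56FD even though the Japanese table defines none, which is the kind 
of interaction a Han script table would need a policy for.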
** Would this approach be scalable to other scripts, for example, Arabic (for 
Arabic, Farsi, Urdu etc.) and Cyrillic (for Bulgarian, Russian, Serbian, 
Ukrainian etc.)?

Incidentally, I have turned the comment I made during this week's phone call 
about scripts not covered by the case studies into a more substantial comment 
that I am intending to post directly to 
www.icann.org/en/announcements/announcement-2-23dec11-en.htm .

Chris.
==
Research Associate in Linguistic Computing
Department of Information Studies
University College London, Foster Court
Gower Street, London WC1E 6BT
Tel +44 20 7679 1599 (inside UCL: 31599) 
www.ucl.ac.uk/dis/people/chrisdillon