[jig] Discussion of the VIP integrated issues report
- To: "jig@xxxxxxxxx" <jig@xxxxxxxxx>
- Subject: [jig] Discussion of the VIP integrated issues report
- From: "Dillon, Chris" <c.dillon@xxxxxxxxx>
- Date: Fri, 13 Jan 2012 09:45:22 +0000
Dear colleagues,
During Tuesday's conference call, Edmon asked me to have a look at the VIP
integrated report, especially section 4, with the aim of encouraging discussion
of it on this list, which could form the basis for JIG comments on the report. I
have summarised parts of two sections (4. and 7.) and added some comments and
suggestions (both indicated with **).
4.1 Label generation rules
In this case the term applies to the rules for labels which may be allowed in
the root zone. The rules would identify variant characters and how they may be
used in labels. Once
variant labels are identified, actions could be taken to put them into states
such as 'active use' and 'prevention'. Work is required in this area to move
from the parameters ([Unicode] comprehensiveness, expertise and [code point
property] qualification) of the rules in the report to detailed proposals for
their creation.
The report gets as far as sketching out five options for the proposals:
1: Complete generation for every script table to be used, by an expert panel
The scripts are selected in advance by ICANN, and ICANN assembles the relevant
panels to develop the rules for the script.
It is not clear what would happen "if a script expert panel were unable to come
to consensus on the necessary rules for every code point in that script".
** This approach has the disadvantage that time would be wasted on code points
that would never be used.
2: Assemble an expert panel for scripts likely to be desired, and include code
points on a "best efforts" basis
The scripts are again selected in advance by ICANN, and ICANN assembles the
relevant panels to develop the rules for the script.
The zone repertoire for the root cannot be determined in advance, and can be
derived only after all the expert panels have reported.
** This sounds as if it could be very slow.
3: Create policies for script-relative lists of code points
Zone repertoires may be built to extend the Unicode Script Tables.
A small number of code points would be allowed across scripts.
It is not clear what might be done if two panels were to set conflicting
representation variant rules for the same code point.
4: Evaluate community proposals for label generation rules
The expert panel would merely review submitted proposals from the community
instead of developing the representation label rules itself.
** See my comments below (after 5.).
5: Build up zone repertoires ad hoc
Instead of ICANN selecting an expert group, representation repertoires and
associated variant rules could be created by interested parties.
In the event of a conflicting rule, the tendency would be for the variant label
not to be activated.
There may be instability in the label generation policy.
"Identification of an appropriate authority for the code point repertoire for
the root zone is a difficult undertaking. To the maximum extent possible, the
relevant language communities need to agree on a shared set of code points for
the zone repertoire."
** ICANN therefore needs to approach them, but how? One single committee would
be huge and slow. Perhaps several committees, along the lines of the VIP, but
potentially covering all languages, would be a better approach. Where should
the committees sit? NICs? IETF? IANA? ICANN? Let each community decide as long
as it defines only one table per language?
7.1 Developing a Label Generation Ruleset specification
"Based on the analysis, a general requirement for all approaches considered is
the need to use a tool to machine-generate sets of variants in accordance with
formal label generation rules."
** It is important to emphasise the large effort needed to create tables that
such a tool could read, but it is not impossible (see the Chinese example
below).
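
As a rough illustration of what "machine-generating sets of variants" could
involve, here is a minimal Python sketch. The table layout, the names
VARIANT_TABLE, REPERTOIRE and generate_variants, and the sample code points are
all assumptions made for this example; none of them is taken from the report or
from an actual IANA table.

```python
from itertools import product

# Hypothetical variant table: each code point maps to the code points treated
# as its variants. Illustrative entries only, not copied from any real table.
VARIANT_TABLE = {
    "\u56fd": {"\u570b"},  # Simplified "country" -> Traditional form
    "\u570b": {"\u56fd"},  # Traditional "country" -> Simplified form
}

# Hypothetical repertoire: anything not listed here is blocked by omission.
REPERTOIRE = {"\u4e2d", "\u56fd", "\u570b"}


def generate_variants(label, table=VARIANT_TABLE, repertoire=REPERTOIRE):
    """Return every variant label of `label`, excluding `label` itself."""
    blocked = [ch for ch in label if ch not in repertoire]
    if blocked:
        raise ValueError(f"code points outside the repertoire: {blocked!r}")
    # For each position, the choices are the character itself plus its variants.
    choices = [sorted({ch} | table.get(ch, set())) for ch in label]
    return {"".join(combo) for combo in product(*choices)} - {label}
```

Calling generate_variants("\u4e2d\u56fd") with these sample tables yields the
single variant label "\u4e2d\u570b". The number of generated labels grows
multiplicatively with the number of characters that have variants, so a real
tool would presumably also need rules for which variants to activate.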
"ICANN currently manages a voluntary repository of "IDN Tables," of which some
contain instructions on computing variants. While some language communities
have formalized the formatting of their tables, there is no single established
format that can accommodate the various rulesets in existence today.
Recognizing that deployable solutions will require such tables, it is clear
that the effort would benefit from the standardization of a table format that
would allow software implementers to easily and predictably generate variants."
"ICANN could facilitate a reference implementation of software"
** This certainly sounds like a quick way of making progress.
** The approach in the IANA IDN tables e.g. .ASIA Chinese (
www.iana.org/domains/idn-tables ) looks practical. Could this be built on?
That table defines which code points are allowed for Chinese and variant
characters for Traditional Chinese and Simplified Chinese (many characters are
the same in both). By omission it also defines which code points are blocked.
However, these are language tables, rather than script tables. Japanese, for
example, is in a separate code table and moreover there are several code tables
for each language, depending on the registry. In this case would a Han script
table be effectively the sum of the Chinese and Japanese tables? (If my
understanding is correct, neither the Korean nor the Vietnamese community
intends to have Han script TLDs.)
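
One possible reading of "the sum of the Chinese and Japanese tables" is a union
of the repertoires and variant mappings, with a check for code points on which
the language tables disagree. The Python sketch below works under that
assumption. The dictionary layout, the function names merge_tables and
disagreements, and the sample ZH and JA data are hypothetical and do not
reproduce any registry's actual table.

```python
def merge_tables(*tables):
    """Union the repertoires and variant mappings of several language tables."""
    allowed = set()
    variants = {}
    for table in tables:
        allowed |= table["allowed"]
        for cp, vs in table["variants"].items():
            # Build fresh sets so the input tables are never modified.
            variants.setdefault(cp, set()).update(vs)
    return {"allowed": allowed, "variants": variants}


def disagreements(*tables):
    """Code points allowed in two or more tables whose variant sets differ."""
    conflicts = set()
    for cp in set().union(*(t["allowed"] for t in tables)):
        seen = {
            frozenset(t["variants"].get(cp, ()))
            for t in tables
            if cp in t["allowed"]
        }
        if len(seen) > 1:
            conflicts.add(cp)
    return conflicts


# Hypothetical sample data: the first table pairs two forms of one character
# as variants, the second allows one of the forms but defines no variants.
ZH = {
    "allowed": {"\u4e2d", "\u56fd", "\u570b"},
    "variants": {"\u56fd": {"\u570b"}, "\u570b": {"\u56fd"}},
}
JA = {"allowed": {"\u4e2d", "\u56fd"}, "variants": {}}

HAN = merge_tables(ZH, JA)
CONFLICTS = disagreements(ZH, JA)
```

With this sample data CONFLICTS contains only U+56FD, the code point that one
table gives a variant and the other does not. A plain union hides such cases,
which is why a script table would probably need more than a simple sum.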
** Would this approach be scalable to other scripts, for example, Arabic (for
Arabic, Farsi, Urdu etc.) and Cyrillic (for Bulgarian, Russian, Serbian,
Ukrainian etc.)?
Incidentally, I have turned the comment I made during this week's phone call
about scripts not covered by the case studies into a more substantial comment
that I am intending to post directly to
www.icann.org/en/announcements/announcement-2-23dec11-en.htm .
Chris.
==
Research Associate in Linguistic Computing
Department of Information Studies
University College London, Foster Court
Gower Street, London WC1E 6BT
Tel +44 20 7679 1599 (inside UCL: 31599)
www.ucl.ac.uk/dis/people/chrisdillon