[gnso-idng] Draft on String Similarity
- To: gnso-idng@xxxxxxxxx
- Subject: [gnso-idng] Draft on String Similarity
- From: Eric Brunner-Williams <ebw@xxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 11 Dec 2009 09:11:23 -0500
Councilors,
During the past weeks the participants in the gnso-idng@xxxxxxxxx
mailing list (IDNG) have discussed, on the mailing list and in
conference calls, aspects of the situation which exists following the
Board's vote at Seoul.
One area of discussion that raises a policy issue is confusingly
similar strings. Because this seems to be an area where the obvious right
thing has already been done, we need to draw attention to two aspects
that have been overlooked.
First, the current meaning of "similar" is now broader than "visual
similarity", and appears to include "meaning".
Second, the underlying assumption in the evaluation process is that
each evaluation is independent of all other evaluations.
Together, these two points, a rule (about a string in an application) and
a meta-rule (about all applications), have a consequence that we suggest
is not desirable.
In the following example we use "china" and "中国" (chung guo) simply
as a well-known example of two strings. For the purpose of illustrating
the policy problem, we ignore the fact that both belong to the universe of
strings more likely to pertain to ccTLDs and their policy body than to the
GNSO.
The strings "china" and "中国" (chung guo) are "similar" in meaning;
therefore they form a contention set. Under the current rules in DAGv3,
only one application whose string is a member of a contention set may
proceed towards delegation. Whether the choice is made by order of
creation or, amongst contemporaries, by community evaluation and/or
auction, the result is the same. One member of an (extended, in the
sense of including existing registries) contention set thrives. All
others fail.
This is the proper and correct outcome, except for one case which is more
likely to arise for applications for IDN strings than for restricted
ASCII (letters, digits, hyphen) strings. That case is where two or
more applications for similar strings are advanced by a single
applicant, or by two or more cooperating applicants.
Returning to our "china" and "中国" (chung guo) example, if XYZ Co.
applied for both "china" (application #1) and "中国" (application #2),
the current rules cannot allow both strings to exist in the root,
even though both are brought by the same applicant.
The fundamental rationale is that confusion is harmful. This rationale
is not universally correct. There are instances where confusion
results in no harm and, more importantly, where "confusion" creates
benefit.
Because "beneficial confusion" is not obvious to users of the Latin
script, we offer an example: the original case of cooperation among
"applicants" to benefit their registrants and users through "similarity".
In 2001, the registries for China, Taiwan, Hong Kong and Macao
discussed cooperation so that mixing Simplified Chinese, prevalent
in China, and Traditional Chinese, prevalent in Taiwan, which are
interchangeable without loss of meaning, would not result in user
confusion. These "applicants" cooperated to create "beneficial
confusion", so that "similar strings" actually had similar meaning,
that is, resolved as expected by their user community.
No user "confusion" resulted from this multi-applicant cooperation,
except perhaps in Marina del Rey.
Coordination to create "beneficial confusion" may exist where one
applicant submits two or more applications, as in the "china" and
"中国" (chung guo) example, or where two or more applicants submit two or
more applications, as the four cooperating Chinese registries did
almost a decade ago.
It is possible that applicants for two or more similar strings could,
upon failure, resort to extended evaluation, where the cause of the
failure is similarity with an existing TLD. Present registries seeking
similar IDN delegations could simply factor the cost of extended
evaluation into the cost of the application. This is inelegant, but not
fatally so.
Unfortunately, for applicants simply seeking two or more delegations
with similar meaning, independent of script, as in the "china" and
"中国" (chung guo) example, initial evaluation failure and extended
evaluation are not available. The contention set, consisting of two
strings and one actual applicant, goes to auction, with an absurd outcome
from the business perspective and a tragic outcome from the language
perspective, as one script choice eliminates all others, for some
meaning-defined construction of "similarity".
We suggest the Council consider the following to cure this defect.
1. that the meta-rule that all applications are independently
evaluated be modified so that cooperating applications may, if the
contention set they form contains no non-cooperating applications,
proceed in the evaluation process.
2. that the rule that applications for strings which are "confusingly
similar" to the strings of existing registries fail, where the
application is brought by the very existing registry to which the
"confusing similarity" exists, be modified so that these applications do
not fail the initial evaluation and require extended evaluation, or some
other heroic measure.
Both of these recommendations have generalizations.
The independent applications presumption overlooks the certainty that
the legacy operator and application authors of 2000 and 2003-2006, and
new authors, will each author multiple 2010 applications, some of
which have common characteristics, such as "similarity" and
"cooperation", and absent a mechanism to "signal" the similarity to
the evaluation process, adverse outcomes and process inefficiencies
are certain.
A property common to two or more applications should be discoverable
by the evaluation process, especially when the applicants desire this
common property to be known to the evaluation process.
The presumption of user confusion and harm where similarity exists
overlooks the utility of similarity and, more profoundly, substitutes
a Marina del Rey-centric meaning and utility test for the meanings and
uses that exist at large.
Correct use by two or more applications, that is, a property common
only to the strings sought by two or more applications, can, and should,
be discoverable by the evaluation process.
The Council need not necessarily look to the generalizations of the
modifications of rule and meta-rule we suggest to cure the anticipated
problem of similar strings in IDN applications, but should it do so,
the IDNG participants are prepared to further inform the Council.
As it stands, the IDNG participants now understand the unintended
effect of the "similarity" rule, and the "independent application"
meta-rule, to be that only one of the six UN languages may be used for
any identifier in the root, with adverse consequences for cooperation
and harmonization of operational practice and user expectations.
The IDNG participants thank the Council for its time and attention
in considering their initial work product.
This ends the draft. Chuck, Edmond, Avri, edit to your heart's content.
Eric