RE: [gnso-contactinfo-pdp-wg] Wednesday 12 November 23:59 UTC soft deadline for comments
- To: Emily Taylor <emily.taylor@xxxxxxxxxxxxx>
- Subject: RE: [gnso-contactinfo-pdp-wg] Wednesday 12 November 23:59 UTC soft deadline for comments
- From: "Dillon, Chris" <c.dillon@xxxxxxxxx>
- Date: Tue, 11 Nov 2014 11:57:29 +0000
Dear Emily,
I would like to thank you on behalf of the Group for this large amount of work
both in summarizing colleagues’ comments and providing your own.
I hope both that it will make the non-mandatory arguments stronger and
stimulate more discussion of the mandatory arguments on this list and in the
meetings.
With all best wishes,
Chris.
--
Research Associate in Linguistic Computing, Centre for Digital Humanities, UCL,
Gower St, London WC1E 6BT Tel +44 20 7679 1599 (int 31599)
http://www.ucl.ac.uk/dis/people/chrisdillon
From: Emily Taylor [mailto:emily.taylor@xxxxxxxxxxxxx]
Sent: 11 November 2014 11:20
To: Dillon, Chris
Cc: Lars Hoffmann; gnso-contactinfo-pdp-wg@xxxxxxxxx
Subject: Re: [gnso-contactinfo-pdp-wg] Wednesday 12 November 23:59 UTC soft
deadline for comments
Dear Chris
Thank you for this timely reminder. Over the past few days, I have been
gathering input from colleagues in the Registrar Stakeholder group. There was
a rich discussion on the list, with many participants. These are less comments
on the paper itself than contributions to the general discussion of the issues.
Here is a synthesis of the comments. I hope that they will be useful in
cross-checking against the "arguments opposing mandatory transformation" on
pages 11-12:
1. Costs: This proposal essentially externalises translation costs from LEA/IP
to Registrars, and none of the commentators were convinced that the costs for
contracted parties are justified by benefits to others. Those requesting the
data can pay for the translation.
2. Scale: Why translate/transliterate all WHOIS data, rather than simply those
names that are of interest on-the-fly? The status quo is several orders of
magnitude more efficient.
3. Accuracy and responsibility: If the premise of WHOIS data is that it is
provided (and declared accurate) by the Registrant, then who accepts
responsibility if Registrars are required to alter that data? How would the
proposals impact whois data accuracy complaints and whois verification
requirements?
4. Data integrity: The whois should be displaying what the client entered.
Our trying to interpret that only leads to more data errors, and less accurate
data. If we change what the client enters it will only lead to errors:
a. Will there be rules on how to transliterate non-ASCII characters so that
it can be done programmatically? Is there some standard system to be used, or
are we all just counting on Google Translate?
b. If human judgment is required, who is responsible for doing it?
c. If the registrant is responsible, what if they do not know what it
should be?
d. What if a third-party disagrees with the accuracy of a transliteration?
e. Is the registrant’s consent required before a transliteration is
published in the whois?
f. Can a registrant withhold consent?
g. What if a registrant wants to change an “approved” transliteration?
h. Is a whois verification required every time one of these
transliterated fields is updated?
i. Where does the requirement for data transformation end? Could Chinese LEA
require a contracted party to translate/transliterate existing English contact
details into Mandarin? Or, what if the original registration was in a third
language/script (Russian Cyrillic), would that skip English and go directly to
Chinese?
5. Compliance: "who will and how will this be policed?" If ICANN are making
cutbacks in their budget, how are they going to afford the human resources to
check that every Whois transliteration is correct? It doesn’t make much operational
sense, and will likely end up with the registrant paying higher fees for
something that they never asked for.
6. Internationalisation: The concept starts to erode the “my language, my
Internet” / IDN principle of ICANN, by compelling the use of
English/Latin/ASCII by people and locations not using those language/script
combinations. One commentator put it as "Sadly, it is North American thinking
I suspect. 'We must translate everything into English'."
7. Competition: If a contracted party does not want to support a language, that
should be their prerogative. They can turn away business if they decide that
they won’t be able to service that customer appropriately.
---------------
General comments
Taking into account the above input, I have the following observations to make
on the draft paper.
First, thank you Chris and the ICANN team for your work in the unenviable task
of fairly summarising the arguments on both sides. I appreciate that it is an
important step in the process to try and understand the arguments on both sides.
A general point: I have no sense from the paper, or from the discussions in the
group, of the scale of the problem we are addressing here. Do we have any
stats for the following:
(1) a breakdown of WHOIS data by country of registrant - and can we infer what
language WHOIS data is likely to be in? The nearest I can get to is this map
from OII which shows the predominance of Latin script / English language
countries in the current domain market
(http://geography.oii.ox.ac.uk/?page=geography-of-top-level-domain-names) .
However, if you look at growth potential, clearly that is not the case. And
IDN registrations by country show a different pattern (see page 17 at
http://www.eurid.eu/files/publ/IDNWorldReport2014_Interactive.pdf).
(2) an estimate of what is likely to be the language of WHOIS data if multiple
languages were enabled in these fields. For example, we could perhaps draw
some inferences from the IDN registrations in ASCII TLDs. Approximately 1% of
.com and .net registrations are IDNs, and the majority of those are Latin
script. This may not be representative in that the Latin script ending for
.com is more likely to be attractive to Latin script IDNs than, say, right to
left scripts or pictograms. There are currently just shy of 900,000 Russian
ccTLD IDNs. Of these, over 800,000 have a registrant based in Russia, and uptake
in other countries is low (even former Soviet Union). See
http://statdom.ru/tld/%D1%80%D1%84/report/summary/. There are approximately
12,000 IDNs in Arabic script ccTLDs. Uptake of IDN new gTLDs has been fairly
limited. I don't think that anyone is claiming that the IDN market has even
nearly fulfilled its market potential, but can we have some statement of the
scale of the problem?
(3) Do we have a sense of how many WHOIS look-ups are performed by law
enforcement and IP interests, what percentage that represents of all WHOIS
look-ups, and how many prove to be problematic in terms of language of contact? On
the other hand, what problems are currently created by not having the ability
to record contact details in the script of the domain name (eg for IDNs)?
(4) There have been a number of studies on different aspects of WHOIS data in
the last couple of years - do any of these help to guide us?
Specific comments
Page 11 - as you say there is disagreement on "ease" of search. If your
mother tongue is English, then it might be "easier" to understand the output of a
search, but any string is searchable, and you can interpret the search results
whatever their script/language.
I find the first bullet point unconvincing - it's like saying "why doesn't
everyone just learn English? It's such a mess having all these languages".
On the second bullet point, p11 - I appreciate that a counter argument is
stated to the "transformation will to some extent facilitate communication"
argument. The communication argument is a difficult one. On one level - as
demonstrated within this working group and many others - we default to English
in order to communicate with one another across different languages. However,
this is also (to some extent) a factor that deters input from those who are not
confident in English as a second language - who may be able to give valuable
insights into the debate. I believe that this is captured in "to some extent"
but would welcome more acknowledgement that this cuts both ways.
The third bullet point does not explain why it is also necessary to
transliterate/translate *all* data for this benefit to be felt. We need some
consideration of proportionality here.
Fourth bullet - define "least translatable" - for whom? Is this truly posed as
a barrier to law enforcement and others?
To balance the "cyberflight" argument in the fourth bullet point, could we also
point out that in general people tend to register and host locally. This is
perhaps a surprising phenomenon given the strength of some registrars
internationally. For example, on page 5 at
http://www.eurid.eu/files/publ/IDNWorldReport2014_Interactive.pdf we have an
analysis of country of hosting for gTLD IDNs plus .eu IDNs. This was done
based on the IP ranges associated with the domain names. You can see that
countries and regions with strong international registrars (eg North America,
UK) don't really show any "winner" script. In contrast, Chinese script,
Cyrillic, Han (plus Katakana, Hiragana), Thai, Hangul, Arabic script domains
tend to be hosted in countries where associated languages are spoken.
Could I also add that you can see within large IDN namespaces which offer
multiple scripts (eg .com and .net) that registrations cluster strongly around
popular scripts. There are very small numbers indeed outside of them. I can
produce some more analysis on that point if people would like.
I hope these inputs are helpful to the working group in its deliberations, and
I look forward to joining the discussions.
Best wishes,
Emily