ICANN Email List Archives

[ssac-gnso-irdwg]



Re: [ssac-gnso-irdwg] Three more questions on transliteration

  • To: Steve Sheng <steve.sheng@xxxxxxxxx>, Andrei Kolesnikov <andrei@xxxxxxxx>, Ird <ssac-gnso-irdwg@xxxxxxxxx>
  • Subject: Re: [ssac-gnso-irdwg] Three more questions on transliteration
  • From: Dave Piscitello <dave.piscitello@xxxxxxxxx>
  • Date: Thu, 6 May 2010 08:29:52 -0700

I found Google's AJAX Language API interesting, see
http://code.google.com/apis/ajaxlanguage/documentation/

Also, Microsoft has TU (Transliteration Utility), described as a "tool for
transliterating one natural language script to another (like Serbian Latin
to Serbian Cyrillic or Latin to Inuktitut)."
See
http://www.downv.com/Windows/install-Transliteration-Utility-10240210.htm

There are several Java applets and Windows applications for transliteration
at http://www.downv.com/Windows-software-download/Transliteration

- jtranslit
- Xlit
- Kemet API

This is a partial list. I am curious whether the state of the art
in automated transliteration is adequate for the task of transliterating
(primarily) contact information, or will be in the foreseeable future. I also
wonder whether the core functionality of any of these programs could be
separated out into an API that would integrate with backend systems.

Lastly, and not to trivialize transliteration, but this essentially seems to
be a problem that a rules engine is naturally suited to solve. I'll ask
a colleague who is CEO of such a company.
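
To make the rules-engine idea concrete, here is a minimal sketch of a table-driven Cyrillic-to-ASCII transliterator. The mapping table (`RULES`) and the function name `transliterate` are illustrative assumptions, not taken from any of the tools listed above, and the table follows no particular romanization standard. It covers only Russian letters; a production system would need per-language, per-script rule sets.

```python
# Hypothetical rule table: a simplified Russian Cyrillic-to-Latin mapping.
# It is an assumption for illustration, not an official romanization standard.
RULES = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Apply the per-character rules; leave unknown characters untouched."""
    out = []
    for ch in text:
        latin = RULES.get(ch.lower())
        if latin is None:
            out.append(ch)                  # digits, spaces, ASCII pass through
        elif ch.isupper():
            out.append(latin.capitalize())  # "Ш" becomes "Sh", not "SH"
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Фабрика Шоколада имени Ленина"))
```

With this table, the address discussed in the quoted thread comes out as "Fabrika Shokolada imeni Lenina". Note that the rules operate on letters only and know nothing about meaning, which is exactly what separates transliteration from translation.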


On 5/6/10 9:47 AM, "Steve Sheng" <steve.sheng@xxxxxxxxx> wrote:

> Dear all,
>
>   I have three questions along the lines of transliteration.
>
>  1. Are you aware of any programming libraries that can transliterate most
> scripts into ASCII?
>
>  2. If so, can these libraries and APIs be integrated in the registrar’s
> backend easily?
>
>  3. Would the transliterated results (along with the local script) be of
> satisfactory use for registrants and other users of WHOIS?
>
>  If all of the answers are yes, then this will revive option 3.
>
>
> Warmly,
> Steve
>
>
> On 5/1/10 6:56 AM, "Andrei Kolesnikov" <andrei@xxxxxxxx> wrote:
>
>>
>>
>> Dave, don't get me wrong. I'm not saying that you
>> screw things up, but the translation does :)
>>
>> As Steven mentioned, "fabrika shokolada" is exactly the transliteration
>> of the Russian фабрика шоколада. If the address is
>> "Фабрика Шоколада имени Ленина", the whois ASCII output on port
>> 43 should be "Fabrika Shokolada imeni Lenina" and not
>> "Lenin's Chocolate Fabric".
>>
>> Translation may work only for historical / well-known
>> places, like Red Square, Gagarin Street, Mr. Putin, etc. But
>> if one needs to send a letter to my home, the transliterated
>> address must be used: Moscow, Grayvoronovskaya st., 8-1-80.
>>
>> However, Russia may not be a good example, because
>> a lot of English has been used historically. And it's only
>> ASCII for IDNs. We do not yet plan any other option.
>>
>> Some samples:
>>
>> Домен blacksabbath.ru
>> domain: BLACKSABBATH.RU
>> nserver: parking1.nic.ru.
>> nserver: parking2.nic.ru.
>> state: REGISTERED, DELEGATED, VERIFIED
>> person: Yanus P Nevstruev
>> phone: +7 495 2502030
>> phone: +7 903 6663322
>> fax-no: +7 495 2518104
>> e-mail: domain.for.sale.icq.335520.prices.at@xxxxxxxxxxxxx
>> e-mail: domen.prodayotsya.icq.335520.ceny.na@xxxxxxxxxxxxx
>> e-mail: holding.co.united.sites.of.web-adresa@nevstruev.r
>> e-mail: holding.soedinyonnye.sayty.web-adresa@xxxxxxxxxxxx
>> e-mail: nevstruev@xxxxxxx
>> registrar: RUCENTER-REG-RIPN
>> created: 2006.09.05
>> paid-till: 2010.09.05
>> source: TCI
>> Last updated on 2010.05.01 14:43:42 MSK/MSD
>>
>> Домен ленин.su
>> domain: XN--E1AGHJB.SU
>> nserver: expirepages-kiae-1.nic.ru.
>> nserver: expirepages-kiae-2.nic.ru.
>> state: REGISTERED, DELEGATED
>> person: ITolpekin
>> phone: +79099109901
>> e-mail: Apollo123@xxxxxxx
>> registrar: RUCENTER-REG-FID
>> created: 2008.04.28
>> paid-till: 2010.04.28
>> free-date: 2010.05.31
>> source: TCI
>> Last updated on 2010.05.01 14:43:42 MSK/MSD
>>
>> Домен нтв.рф
>> domain: XN--B1AVP.XN--P1AI
>> state: REGISTERED, NOT DELEGATED, VERIFIED
>> org: JSC NTV Television Company
>> phone: +7 495 6151314
>> phone: +7 495 6178839
>> fax-no: +7 495 6177785
>> fax-no: +7 495 6178839
>> e-mail: adm-group@xxxxxx
>> e-mail: aklass@xxxxxx
>> registrar: RUCENTER-REG-RF
>> created: 2009.11.26
>> paid-till: 2010.11.26
>> source: TCI
>> Last updated on 2010.05.01 14:53:42 MSK/MSD
>>
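
The `domain:` lines in the second and third samples show the ASCII-compatible (Punycode) form of the Cyrillic names. As a hedged illustration, the conversion can be reproduced with Python's built-in `idna` codec, which implements the older IDNA 2003 rules (RFC 3490); the helper names `to_ascii` and `to_unicode` are assumptions made for this sketch.

```python
def to_ascii(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible (xn--) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Convert an ASCII-compatible domain name back to Unicode."""
    # Lowercase first so an upper-case "XN--" prefix, as printed by WHOIS,
    # is handled the same way regardless of codec version.
    return domain.lower().encode("ascii").decode("idna")


print(to_ascii("ленин.su"))
print(to_ascii("нтв.рф"))
print(to_unicode("XN--B1AVP.XN--P1AI"))
```

This is an encoding of the domain label only. It is neither translation nor transliteration, so it does not help a reader pronounce or recognize the name.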
>> --andrei
>>
>>
>>>> -----Original Message-----
>>>> From: Metalitz, Steven [mailto:met@xxxxxxx]
>>>> Sent: Friday, April 30, 2010 5:05 PM
>>>> To: Dave Piscitello; Andrei Kolesnikov; Avri Doria; Ird
>>>> Subject: RE: [ssac-gnso-irdwg] Follow up from yesterday's call (what
>>>> would registration record look like
>>>>
>>>> Dave,
>>>>
>>>> When I refer to "transliteration" I am speaking of how the sounds
>>>> written in one script would be rendered in another script, rather than
>>>> a semantic translation of the meaning of those sounds.
>>>>
>>>> Thus with respect to "registrant organization," your translation as
>>>> "Chocolate Factory" may be accurate, but the transliteration would be
>>>> something like "Fabrika Shokolada."  (Apologies to all those who
>>>> actually speak Russian and could transliterate this more accurately
>>>> than I can!)
>>>>
>>>> Particularly with respect to data elements such as registrant name and
>>>> address, transliteration is far more effective than translation in
>>>> enabling contact with the registrant.
>>>>
>>>> Steve Metalitz
>>>>
>>>> -----Original Message-----
>>>> From: owner-ssac-gnso-irdwg@xxxxxxxxx [mailto:owner-ssac-gnso-
>>>> irdwg@xxxxxxxxx] On Behalf Of Dave Piscitello
>>>> Sent: Friday, April 30, 2010 8:07 AM
>>>> To: Andrei Kolesnikov; Avri Doria; Ird
>>>> Subject: Re: [ssac-gnso-irdwg] Follow up from yesterday's call (what
>>>> would registration record look like
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 4/30/10 4:37 AM, "Andrei Kolesnikov" <andrei@xxxxxxxx>
>>>> wrote:
>>>>
>>>>>> Dave, you've ruined the translation/transliteration 100%.
>>>>>> The address/domain name - all killed in your
>>>>
>>>> Well, if I'm going to fail, I may as well fail spectacularly.
>>>>
>>>> While I failed in one respect, I think I demonstrated what may be a
>>>> commonly encountered misunderstanding. I don't speak or read Russian.
>>>> If I were to go to Russia, I might bring a pocket translator. If I see
>>>> a sign with what appears to be a word майна, I'll type in the Cyrillic
>>>> characters to "translate" the word. The device is called a translator,
>>>> so why should I think it does anything but translate? More importantly,
>>>> what is the difference between what the device "translates" and what
>>>> you would call "transliterate"?
>>>>
>>>> Similarly, today when I search and find a web page in Japanese,
>>>> Chinese, Arabic, Cyrillic..., my search results offer me the
>>>> opportunity to "translate" the page. If I were to search and get a
>>>> result that has a WHOIS record and allow my browser to "translate" the
>>>> page, will the result be different from the "transliterated" response
>>>> that a WHOIS/43 application would display?
>>>>
>>>> Do these questions have implications on our work? Perhaps they simply
>>>> prove how utterly ignorant of the world of character sets and languages
>>>> I am? :-O
>>>>
>>>>>> Let's stay with the position that the address should be transliterated
>>>>>> or left untouched in local script.
>>>>
>>>> Agree, but please give those of us who struggle to understand the
>>>> differences between transliteration and translation an example of what
>>>> any of the data I mangled would look like transliterated.
>>>>
>>>>>> But - we've got your idea about no need to duplicate the same data
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> --andrei
>>>>>>
>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: owner-ssac-gnso-irdwg@xxxxxxxxx [mailto:owner-ssac-gnso-
>>>>>>>> irdwg@xxxxxxxxx] On Behalf Of Avri Doria
>>>>>>>> Sent: Tuesday, April 27, 2010 7:34 PM
>>>>>>>> To: Ird
>>>>>>>> Subject: Re: [ssac-gnso-irdwg] Follow up from yesterday's call (what
>>>>>>>> would registration record look like
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Apologies for missing yesterday's call (and not even sending in prior
>>>>>>>> apologies). I had a work call at the time. But I should have
>>>>>>>> remembered to send apologies.
>>>>>>>>
>>>>>>>> This may have been discussed, but I think it might make sense to
>>>>>>>> translate the Registrar-provided information. For the most part these
>>>>>>>> will be canned translations of setting data, so it should not involve
>>>>>>>> significant marginal cost/effort to do it.
>>>>>>>>
>>>>>>>> Of course, translating the names of the nameserver and email only works
>>>>>>>> if those are indeed existing domain names and email addresses.
>>>>>>>>
>>>>>>>>
>>>>>>>> a.
>>>>>>>>
>>>>>>>>
>>>>>>>> On 27 Apr 2010, at 08:02, Dave Piscitello wrote:
>>>>>>>>
>>>>>>>>>> I thought it might be helpful to provide an example of what a WHOIS
>>>>>>>>>> reply might look like if a consensus were reached by the committee to
>>>>>>>>>> recommend that
>>>>>>>>>>
>>>>>>>>>>       both an ASCII-7 and UTF-8 (IRD) version of a registration
>>>>>>>>>>       record be returned
>>>>>>>>>>
>>>>>>>>>> *and* that, by convention,
>>>>>>>>>>
>>>>>>>>>>       all data elements would be of a format
>>>>>>>>>>       <identifier><separator><value><CRLF>
>>>>>>>>>>
>>>>>>>>>> Set aside the issue of "who" submits the ASCII7 and UTF-8 versions
>>>>>>>>>> for the moment. I think that is a separate issue with other impacts.
>>>>>>>>>>
>>>>>>>>>> Note that the following examples are presented to allow everyone on
>>>>>>>>>> the call to see what I described verbally, and to allow folks not on
>>>>>>>>>> the call to see as well. There are other formats the committee might
>>>>>>>>>> consider superior to this and members are encouraged to comment.
>>>>>>>>>>
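
A record that follows the `<identifier><separator><value><CRLF>` convention is straightforward to parse. The sketch below assumes the separator is a colon, as in the example records in this message; the function name `parse_record` and the sample data are illustrative only. It splits on the first colon so that timestamps keep their own colons, and it collects repeated identifiers such as `Status` into lists.

```python
from collections import defaultdict
from typing import Dict, List


def parse_record(raw: str) -> Dict[str, List[str]]:
    """Parse CRLF-delimited 'identifier:value' lines into a dict of lists."""
    fields = defaultdict(list)
    for line in raw.split("\r\n"):
        if not line.strip():
            continue                      # skip blank lines
        identifier, sep, value = line.partition(":")  # first colon only
        if not sep:
            continue                      # not a data element, e.g. a divider
        fields[identifier.strip()].append(value.strip())
    return dict(fields)


# Sample lines in the style of the example record (assumed colon separator).
SAMPLE = (
    "Domain Name:ROCK.STAR\r\n"
    "Created On:14-Sep-1998 04:00:00 UTC\r\n"
    "Status:CLIENT DELETE PROHIBITED\r\n"
    "Status:CLIENT RENEW PROHIBITED\r\n"
    "Registrant Street2:\r\n"
    "Registrant City: Москва\r\n"
)

record = parse_record(SAMPLE)
print(record["Status"])
print(record["Registrant City"])
```

Because the parser treats values as opaque UTF-8 text, the same code handles both the ASCII7 part and the local-script part of a record.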
>>>>>>>>>> The version I create here duplicates all registration information.
>>>>>>>>>> As we have discussed on the list, there may be no need to duplicate the
>>>>>>>>>> fields that the registrar provides (i.e., the committee could
>>>>>>>>>> recommend that these would be represented in ASCII7 only).
>>>>>>>>>>
>>>>>>>>>> I used Russian because I know many of the characters (I wrote a
>>>>>>>>>> keyboard translator for a minicomputer in the late 1970s :-). I'm using
>>>>>>>>>> a crude translation from babelfish.yahoo.com and a fictitious IDN TLD,
>>>>>>>>>> so there is a high probability that I've botched the
>>>>>>>>>> translation/transliteration, but I hope this conveys the idea.
>>>>>>>>>>
>>>>>>>>>> ------ Part 1 of the registration record is in the mandatory/ASCII7 -----
>>>>>>>>>> Domain ID:D2347548-LROR
>>>>>>>>>> Domain Name:ROCK.STAR
>>>>>>>>>> Created On:14-Sep-1998 04:00:00 UTC
>>>>>>>>>> Last Updated On:26-Mar-2010 15:12:28 UTC
>>>>>>>>>> Expiration Date:07-Dec-2012 17:04:26 UTC
>>>>>>>>>> Sponsoring Registrar:GoDaddy.com, Inc. (R91-LROR)
>>>>>>>>>> Status:CLIENT DELETE PROHIBITED
>>>>>>>>>> Status:CLIENT RENEW PROHIBITED
>>>>>>>>>> Status:CLIENT TRANSFER PROHIBITED
>>>>>>>>>> Status:CLIENT UPDATE PROHIBITED
>>>>>>>>>> Status:DELETE PROHIBITED
>>>>>>>>>> Status:RENEW PROHIBITED
>>>>>>>>>> Status:TRANSFER PROHIBITED
>>>>>>>>>> Status:UPDATE PROHIBITED
>>>>>>>>>> Registrant ID:CR12376439
>>>>>>>>>> Registrant Name:Administrator
>>>>>>>>>> Registrant Organization:Chocolate Factory
>>>>>>>>>> Registrant Street1:Cherry Lane
>>>>>>>>>> Registrant Street2:
>>>>>>>>>> Registrant Street3:
>>>>>>>>>> Registrant City:Moscow
>>>>>>>>>> Registrant State/Province:
>>>>>>>>>> Registrant Postal Code:
>>>>>>>>>> Registrant Country:Russia
>>>>>>>>>> Registrant Phone:+7.922.555.1234
>>>>>>>>>> Registrant Phone Ext.:
>>>>>>>>>> Registrant FAX:+7.922.555.1235
>>>>>>>>>> Registrant FAX Ext.:
>>>>>>>>>> Registrant Email:administrator@xxxxxxxxx
>>>>>>>>>> Admin ID:CR12376441
>>>>>>>>>> Admin Name:Administrator
>>>>>>>>>> Admin Organization:Chocolate Factory
>>>>>>>>>> Admin Street1:Cherry Lane
>>>>>>>>>> Admin Street2:
>>>>>>>>>> Admin Street3:
>>>>>>>>>> Admin City:Moscow
>>>>>>>>>> Admin State/Province:
>>>>>>>>>> Admin Postal Code:
>>>>>>>>>> Admin Country:Russia
>>>>>>>>>> Admin Phone:+7.922.555.1234
>>>>>>>>>> Admin Phone Ext.:
>>>>>>>>>> Admin FAX:+7.922.555.1235
>>>>>>>>>> Admin FAX Ext.:
>>>>>>>>>> Admin Email:administrator@xxxxxxxxx
>>>>>>>>>> Tech ID:CR12376440
>>>>>>>>>> Tech Name:Administrator
>>>>>>>>>> Tech Organization:Chocolate Factory
>>>>>>>>>> Tech Street1:Cherry Lane
>>>>>>>>>> Tech Street2:
>>>>>>>>>> Tech Street3:
>>>>>>>>>> Tech City:Moscow
>>>>>>>>>> Tech State/Province:
>>>>>>>>>> Tech Postal Code:
>>>>>>>>>> Tech Country: Russia
>>>>>>>>>> Tech Phone:+7.922.555.1234
>>>>>>>>>> Tech Phone Ext.:
>>>>>>>>>> Tech FAX:+7.922.555.1235
>>>>>>>>>> Tech FAX Ext.:
>>>>>>>>>> Tech Email:domain-admin@xxxxxxxxx
>>>>>>>>>> Name Server:NS1.rock.star
>>>>>>>>>> Name Server:NS2.rock.star
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> DNSSEC:Signed
>>>>>>>>>> DS Created:26-Mar-2010 15:12:06 UTC
>>>>>>>>>> DS Maximum Signature Life 1:3456000 seconds
>>>>>>>>>> DS Created:26-Mar-2010 15:12:28 UTC
>>>>>>>>>> DS Maximum Signature Life 2:3456000 seconds
>>>>>>>>>> ------- Part 2 of the record is in a character set the registrant
>>>>>>>>>> Domain ID:D2347548-LROR
>>>>>>>>>> Domain Name: утес.звезда
>>>>>>>>>> Created On:14-Sep-1998 04:00:00 UTC
>>>>>>>>>> Last Updated On:26-Mar-2010 15:12:28 UTC
>>>>>>>>>> Expiration Date:07-Dec-2012 17:04:26 UTC
>>>>>>>>>> Sponsoring Registrar:GoDaddy.com, Inc. (R91-LROR)
>>>>>>>>>> Status:CLIENT DELETE PROHIBITED
>>>>>>>>>> Status:CLIENT RENEW PROHIBITED
>>>>>>>>>> Status:CLIENT TRANSFER PROHIBITED
>>>>>>>>>> Status:CLIENT UPDATE PROHIBITED
>>>>>>>>>> Status:DELETE PROHIBITED
>>>>>>>>>> Status:RENEW PROHIBITED
>>>>>>>>>> Status:TRANSFER PROHIBITED
>>>>>>>>>> Status:UPDATE PROHIBITED
>>>>>>>>>> Registrant ID:CR12376439
>>>>>>>>>> Registrant Name:администратор
>>>>>>>>>> Registrant Organization: Фабрика шоколада
>>>>>>>>>> Registrant Street1: майна вишни
>>>>>>>>>> Registrant Street2:
>>>>>>>>>> Registrant Street3:
>>>>>>>>>> Registrant City: Москва
>>>>>>>>>> Registrant State/Province:
>>>>>>>>>> Registrant Postal Code:
>>>>>>>>>> Registrant Country: Россия
>>>>>>>>>> Registrant Phone:+7.922.555.1234
>>>>>>>>>> Registrant Phone Ext.:
>>>>>>>>>> Registrant FAX:+7.922.555.1235
>>>>>>>>>> Registrant FAX Ext.:
>>>>>>>>>> Registrant Email: администратор@утес.звезда
>>>>>>>>>> Admin ID:CR12376441
>>>>>>>>>> Admin Name: администратор
>>>>>>>>>> Admin Organization: Фабрика шоколада
>>>>>>>>>> Admin Street1: майна вишни
>>>>>>>>>> Admin Street2:
>>>>>>>>>> Admin Street3:
>>>>>>>>>> Admin City: Москва
>>>>>>>>>> Admin State/Province:
>>>>>>>>>> Admin Postal Code:
>>>>>>>>>> Admin Country: Россия
>>>>>>>>>> Admin Phone:+7.922.555.1234
>>>>>>>>>> Admin Phone Ext.:
>>>>>>>>>> Admin FAX:+7.922.555.1235
>>>>>>>>>> Admin FAX Ext.:
>>>>>>>>>> Admin Email: администратор@утес.звезда
>>>>>>>>>> Tech ID:CR12376440
>>>>>>>>>> Tech Name: администратор
>>>>>>>>>> Tech Organization: Фабрика шоколада
>>>>>>>>>> Tech Street1: майна вишни
>>>>>>>>>> Tech Street2:
>>>>>>>>>> Tech Street3:
>>>>>>>>>> Tech City: Москва
>>>>>>>>>> Tech State/Province:
>>>>>>>>>> Tech Postal Code:
>>>>>>>>>> Tech Country: Россия
>>>>>>>>>> Tech Phone:+7.922.555.1234
>>>>>>>>>> Tech Phone Ext.:
>>>>>>>>>> Tech FAX:+7.922.555.1235
>>>>>>>>>> Tech FAX Ext.:
>>>>>>>>>> Tech Email: администратор@утес.звезда
>>>>>>>>>> Name Server: ns1.утес.звезда
>>>>>>>>>> Name Server:ns2.утес.звезда
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> Name Server:
>>>>>>>>>> DNSSEC:Signed
>>>>>>>>>> DS Created:26-Mar-2010 15:12:06 UTC
>>>>>>>>>> DS Maximum Signature Life 1:3456000 seconds
>>>>>>>>>> DS Created:26-Mar-2010 15:12:28 UTC
>>>>>>>>>> DS Maximum Signature Life 2:3456000 seconds
>>>>>>
>>>>>>
>>>>>>
>>>>
>>
>>
>>
>>




