ICANN Email List Archives

[ssac-gnso-irdwg]



Re: [ssac-gnso-irdwg] Technical issues of internationalization

  • To: Edmon Chung <edmon@xxxxxxxxxxxxx>
  • Subject: Re: [ssac-gnso-irdwg] Technical issues of internationalization
  • From: Jay Daley <jay@xxxxxxxxxxx>
  • Date: Thu, 4 Mar 2010 14:08:53 +1300

On 1/03/2010, at 4:13 PM, Edmon Chung wrote:

> 1. if we would suggest allowing submission of UTF-8 or UTF-16 to the WHOIS 
> server, we would probably have to change the protocol, and would then need 
> to specify some mechanism for the client to identify itself as sending UTF-8

A number of WHOIS servers provided by ccTLDs already have parameters to control 
output that could presumably also control interpretation of input.  Some 
examples:

1. .dk

$ whois -h whois.nic.dk HELP
[...]
# Query syntax:
#   [<options>] <query_string>
# Available options:
#   --charset=<charset>
#   --accesscode=<accesscode>[:<accesscode>[:<accesscode>]]
#   --show-handles
# Available charsets:
#   latin-1 also known as iso-8859-1 (default)
#   utf-8
# Example:
#   --charset=latin-1 dk-hostmaster.dk
#   --accesscode=C8850DF92ECB6CF581EF6C8FD31C1CDF dk-hostmaster.dk
# Hint:
#   Most Unix whois clients have problems with these options and tries
#   to parse them themselves. To get around this, do lookups like this:
#     whois " --charset=latin-1 dk-hostmaster.dk"
#   Note the additional space after the first quote.
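
The workaround in the hint amounts to sending the option string as part of
the query line itself. A minimal Python sketch of that, assuming the plain
RFC 3912 exchange (one line terminated by CRLF on TCP port 43, response read
until the server closes the connection). build_dk_query and whois_raw are
hypothetical helper names, and encoding the query in the requested charset is
my assumption, since the help text only describes output:

```python
import socket

def build_dk_query(domain: str, charset: str = "utf-8") -> bytes:
    """Build a query line carrying the --charset option from the HELP text."""
    # Assumption: the query bytes use the same charset that is requested.
    line = f"--charset={charset} {domain}\r\n"
    return line.encode(charset)

def whois_raw(server: str, query: bytes, timeout: float = 10.0) -> bytes:
    """Send one query line and return the raw, undecoded response bytes."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Because the option travels inside the query bytes, no client-side flag parsing
is involved; whois_raw("whois.nic.dk", build_dk_query("dk-hostmaster.dk"))
would return bytes that the caller then decodes as UTF-8.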

2. .no

For .no (and the SLDs managed by the registry) you can specify '-c utf-8' in 
front of the query to have the result returned encoded in UTF-8.  The default 
encoding is latin1/iso-8859-1 (or us-ascii if the result will fit into that 
encoding).
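
The consequence for a client is that it has to decode the response in
whatever encoding it asked for. A small Python sketch of that, where
build_no_query and decode_response are hypothetical helpers and the sample
text is made up rather than taken from a real response:

```python
def build_no_query(query: str, utf8: bool = True) -> str:
    """Put '-c utf-8' in front of the query, as described for .no."""
    return f"-c utf-8 {query}" if utf8 else query

def decode_response(raw: bytes, requested_utf8: bool) -> str:
    """Decode with the charset that was requested; latin-1 is the default."""
    return raw.decode("utf-8" if requested_utf8 else "latin-1")

# Made-up sample standing in for a field of a UTF-8 encoded response.
sample = "Blåbær AS"
raw = sample.encode("utf-8")

good = decode_response(raw, requested_utf8=True)   # matches the sample
bad = decode_response(raw, requested_utf8=False)   # mojibake, no error raised
```

The second call does not fail, because latin-1 accepts any byte sequence; it
just produces the wrong characters, so the client has to remember which
charset it requested.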

3. .jp

For JP domain Whois, Japanese (encoded with ISO-2022-JP) is included in its 
output by default.  To suppress Japanese output, append /e to the end of the 
query string.

If /e is used with IDN, the output is A-Label (Punycode) only.
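
A client for the .jp behaviour might look roughly like the following Python
sketch. build_jp_query and decode_jp_response are hypothetical names, and I am
assuming the query is already in ASCII (A-label) form, since the description
above only covers output:

```python
def build_jp_query(domain: str, english_only: bool = False) -> bytes:
    """Append '/e' to the query string to suppress Japanese output."""
    suffix = "/e" if english_only else ""
    # Assumption: the query is ASCII; an IDN would be given as its A-label.
    return f"{domain}{suffix}\r\n".encode("ascii")

def decode_jp_response(raw: bytes) -> str:
    """Decode default output, where Japanese text is in ISO-2022-JP."""
    return raw.decode("iso2022_jp", errors="replace")

# ISO-2022-JP is a 7-bit encoding that switches character sets with escape
# sequences, so Japanese text still travels as bytes below 0x80.
encoded = "ドメイン名".encode("iso2022_jp")
```

Since the default output stays 7-bit, it passes through clients that are not
8-bit clean, but it is unreadable unless the client knows to apply the
ISO-2022-JP decoding.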

4. .de

Standard data output in UTF-8. Other character sets can be specified via the 
flag "-C (Charset)".
Queries can be made either for the IDN domain (e.g.: đäňîċ-ţåŝŧďómãĵņ.de / 
denic.de) or, using an additional flag, for the ACE form (e.g.: xn--dnic-loa.de 
/ denic.de). (only in case of .de domain queries)
For non-ASCII domains, the output always includes the IDN domain ("Domain:") as 
well as the ACE form ("Domain-ace:"). (only in case of .de domain queries)
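
The relationship between the IDN and ACE forms can be reproduced with
Python's built-in "idna" codec (which implements IDNA 2003). This is only a
sketch: to_ace and to_idn are hypothetical helper names, and I am taking
dänic.de to be the name behind the xn--dnic-loa.de example above:

```python
def to_ace(domain: str) -> str:
    """Convert a domain to its ACE (Punycode) form, label by label."""
    return domain.encode("idna").decode("ascii")

def to_idn(ace_domain: str) -> str:
    """Convert an ACE form back to the Unicode form."""
    return ace_domain.encode("ascii").decode("idna")

print(to_ace("dänic.de"))  # xn--dnic-loa.de
```

A client that always converts to ACE before querying avoids the input
encoding question altogether, because the query is then pure ASCII.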


Jay

>  
> 2. it would be somewhat impossible to distinguish definitively whether an 
> incoming query is in a particular encoding (especially given such a short 
> string), so it is probably not reasonable for the server to "interpret" it
>  
> Edmon
>  
>  
>  
> From: owner-ssac-gnso-irdwg@xxxxxxxxx 
> [mailto:owner-ssac-gnso-irdwg@xxxxxxxxx] On Behalf Of Steve Sheng
> Sent: Saturday, February 27, 2010 4:17 AM
> To: Steve Sheng; Ird
> Subject: [ssac-gnso-irdwg] Technical issues of internationalization
>  
> Hi all, I thought of a question and want to raise it here.
> 
> Currently Whois terminal clients may use specific encodings (e.g. GB2312 for 
> simplified Chinese, Big5 for traditional Chinese, etc) instead of UTF-8 or 
> UTF-16. So what happens when a user submits a U-label domain name query in 
> Big5 or GB2312? Should we expect the corresponding server to be able to 
> interpret it? How would the Whois server know what encoding the client’s 
> submission is in? 
> 
>  
> Warmly,  
> Steve
> 
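
On the quoted point that a server cannot definitively tell which encoding a
short query is in: the problem is easy to demonstrate, because the same bytes
often decode without error under several encodings. A Python sketch, with
plausible_encodings as a hypothetical helper and the candidate list chosen
only for illustration:

```python
from typing import List, Sequence

def plausible_encodings(raw: bytes, candidates: Sequence[str]) -> List[str]:
    """Return every candidate under which raw decodes without an error."""
    ok = []
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        ok.append(name)
    return ok

# A short U-label query as a GB2312 terminal client might send it.
query = "中文.cn".encode("gb2312")
matches = plausible_encodings(query, ["utf-8", "gb2312", "big5", "latin-1"])
print(matches)
```

UTF-8 can be ruled out here because the bytes are not valid UTF-8, but more
than one legacy encoding remains, and decoding without error says nothing
about whether the resulting characters are the ones the user typed. That is
the argument for an explicit charset parameter rather than guessing.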


-- 
Jay Daley
Chief Executive
.nz Registry Services (New Zealand Domain Name Registry Limited)
desk: +64 4 931 6977
mobile: +64 21 678840


