Re: [ssac-gnso-irdwg] Technical issues of internationalization
- To: Edmon Chung <edmon@xxxxxxxxxxxxx>
- Subject: Re: [ssac-gnso-irdwg] Technical issues of internationalization
- From: Jay Daley <jay@xxxxxxxxxxx>
- Date: Thu, 4 Mar 2010 14:08:53 +1300
On 1/03/2010, at 4:13 PM, Edmon Chung wrote:
> 1. if we suggest allowing submission of UTF-8 or UTF-16 to the WHOIS server,
> we would probably have to change the protocol, and would then need to
> specify some mechanism for the client to identify itself as sending UTF-8
A number of WHOIS servers provided by ccTLDs already have parameters to control
output that could presumably also control interpretation of input. Some
examples:
1. .dk
$ whois -h whois.nic.dk HELP
[...]
# Query syntax:
# [<options>] <query_string>
# Available options:
# --charset=<charset>
# --accesscode=<accesscode>[:<accesscode>[:<accesscode>]]
# --show-handles
# Available charsets:
# latin-1 also known as iso-8859-1 (default)
# utf-8
# Example:
# --charset=latin-1 dk-hostmaster.dk
# --accesscode=C8850DF92ECB6CF581EF6C8FD31C1CDF dk-hostmaster.dk
# Hint:
# Most Unix whois clients have problems with these options and tries
# to parse them themselves. To get around this, do lookups like this:
# whois " --charset=latin-1 dk-hostmaster.dk"
# Note the additional space after the first quote.
2. .no
For .no (and the SLDs managed by the registry) you can specify '-c utf-8' in
front of the query to have the result returned encoded in UTF-8. The default
encoding is latin1/iso-8859-1 (or us-ascii if the result will fit into that
encoding).
3. .jp
For JP domain Whois, Japanese (encoded with ISO-2022-JP) is included in its
output by default. To suppress Japanese output, append /e to the end of the
query string.
If /e is used with IDN, the output is A-Label (Punycode) only.
4. .de
Data is output in UTF-8 by default. Other character sets can be specified via
the flag "-C (Charset)".
Queries can be made either for the IDN domain (e.g.: đäňîċ-ţåŝŧďómãĵņ.de /
denic.de) or, using an additional flag, for the ACE form (e.g.: xn--dnic-loa.de
/ denic.de). (only in case of .de domain queries)
For non-ASCII domains, the output always includes the IDN domain ("Domain:") as
well as the ACE form ("Domain-ace:"). (only in case of .de domain queries)
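The option syntaxes listed above can be collected into a small client-side helper. This is a hedged sketch: `build_query` and `send_query` are hypothetical names, the `.de` form `-C <charset>` is inferred from the "-C (Charset)" description, and encoding the query line itself in the declared charset assumes (as above, "presumably") that the output option also governs how the server reads input. Transport follows RFC 3912: one CRLF-terminated query line over TCP port 43, with the server closing the connection after its reply.

```python
import socket

WHOIS_PORT = 43  # RFC 3912: a single query line over TCP port 43


def build_query(tld: str, name: str, charset: str = "utf-8",
                english_only: bool = False) -> bytes:
    """Build the raw query line for `name` using the per-registry option
    syntax quoted above. Hypothetical helper, for illustration only."""
    if tld == "dk":
        # "[<options>] <query_string>". The leading-space trick is only needed
        # when a local whois client would parse the option; not on a raw socket.
        line = f"--charset={charset} {name}"
    elif tld == "no":
        line = f"-c {charset} {name}"
    elif tld == "de":
        # Assumed concrete form of the "-C (Charset)" flag.
        line = f"-C {charset} {name}"
    elif tld == "jp":
        # No input-charset option is described for .jp; "/e" only
        # suppresses the Japanese (ISO-2022-JP) part of the output.
        line = f"{name}/e" if english_only else name
    else:
        line = name
    # Assumption: the server reads the query in the charset it was asked to
    # use for output. Characters outside that charset raise UnicodeEncodeError.
    return (line + "\r\n").encode(charset)


def send_query(host: str, payload: bytes, timeout: float = 10.0) -> bytes:
    """Send one query and read until the server closes the connection."""
    chunks = []
    with socket.create_connection((host, WHOIS_PORT), timeout=timeout) as sock:
        sock.sendall(payload)
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

For example, `send_query("whois.nic.dk", build_query("dk", "dk-hostmaster.dk", "utf-8"))` would send `--charset=utf-8 dk-hostmaster.dk` and return the raw reply bytes, which the caller then decodes with the same charset.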
Jay
>
> 2. it would be practically impossible to determine definitively whether an
> incoming query is in a particular encoding (especially for such short
> strings), so it is probably not reasonable for the server to "interpret" it
>
> Edmon
>
>
>
> From: owner-ssac-gnso-irdwg@xxxxxxxxx
> [mailto:owner-ssac-gnso-irdwg@xxxxxxxxx] On Behalf Of Steve Sheng
> Sent: Saturday, February 27, 2010 4:17 AM
> To: Steve Sheng; Ird
> Subject: [ssac-gnso-irdwg] Technical issues of internationalization
>
> Hi all, I thought of a question and want to raise it here.
>
> Currently Whois terminal clients may use specific encodings (e.g. GB2312 for
> simplified Chinese, Big5 for traditional Chinese, etc.) instead of UTF-8 or
> UTF-16. So what happens when a user submits a U-label domain name query in
> Big5 or GB2312? Should we expect the corresponding server to be able to
> interpret it? How would the Whois server know what encoding the client’s
> submission is in?
>
>
> Warmly,
> Steve
>
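The second point in the quoted message, and the original question about Big5 and GB2312 input, can be illustrated with a short sketch. It assumes a server that simply tries a fixed list of candidate encodings (a hypothetical strategy, not one any registry above is known to use): the GB2312 bytes for "中文" fail as UTF-8 but decode without error as both GB2312 and Latin-1, so a successful decode says nothing about which encoding the user meant. The second helper shows one alternative, in which the client, which does know its terminal encoding, converts the U-label to an ASCII A-label before sending. `candidate_encodings` and `to_a_label` are hypothetical names, and Python's built-in `idna` codec implements IDNA2003 rather than IDNA2008, so it serves only as a stand-in.

```python
# Hypothetical candidate list a guessing server might try, in order.
CANDIDATES = ("utf-8", "gb2312", "big5", "latin-1")


def candidate_encodings(raw: bytes) -> list[str]:
    """Return every candidate encoding under which `raw` decodes without error.
    More than one hit means the server cannot tell which one the user meant."""
    ok = []
    for enc in CANDIDATES:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        ok.append(enc)
    return ok


def to_a_label(raw: bytes, local_encoding: str) -> bytes:
    """Client side: decode with the terminal's known encoding, then convert
    the U-label to an ASCII A-label (IDNA2003 via Python's 'idna' codec)."""
    u_label = raw.decode(local_encoding)
    return u_label.encode("idna")
```

Because the Big5 and GB2312 byte sequences for the same label differ but both map to the same A-label, a server that only receives A-labels never has to guess the client's encoding.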
--
Jay Daley
Chief Executive
.nz Registry Services (New Zealand Domain Name Registry Limited)
desk: +64 4 931 6977
mobile: +64 21 678840