ICANN Email List Archives

[idn-guidelines]



Comments regarding the draft Guidelines for the Implementation of Internationalized Domain Names, v2.0

  • To: idn-guidelines@xxxxxxxxx
  • Subject: Comments regarding the draft Guidelines for the Implementation of Internationalized Domain Names, v2.0
  • From: Neil Harris <neil.harris@xxxxxxxxxxxxxxxxxxx>
  • Date: Tue, 11 Oct 2005 16:44:34 +0100

Dear ICANN members,

I am encouraged by the work being carried out as part of the process
of preparing the next generation of the Guidelines for the
Implementation of Internationalized Domain Names, in particular with
regard to the IDN homograph-spoofing and invalid-character issues. I
feel that the multi-pronged approach being taken in the current draft
has the potential to be effective, if pursued with sufficient vigor.

I would like to comment on the proposition that some character ranges
should be blacklisted entirely. I agree wholeheartedly with this
principle, as any reduction of the character repertoire will greatly
ease the effort of looking for homographs.

I believe that a precautionary principle should be used, where many
ranges of characters that are not useful for label creation should be
blacklisted by default. I propose the following characters for blacklisting:

1: All ASCII characters other than letters, digits and HYPHEN-MINUS

Rationale: These characters are not allowed in RFC 1035, and adding
them would serve no internationalization purpose. In addition, many of
them have special meanings to computer programs, for example, as part
of E-mail addresses, URLs, or as quote characters in various systems
such as SQL queries or command-line interpreters. Disallowing these
characters will reduce the possibility of attacks on other protocols
through their inclusion in domain names.
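
As a rough illustration of rule 1, the following Python sketch flags
any ASCII character in a label that is not a letter, digit or
HYPHEN-MINUS. The function name `non_ldh_ascii` is my own choice for
this example and does not come from any specification.

```python
import string

# Letters, digits and HYPHEN-MINUS: the only ASCII characters rule 1 keeps.
LDH = set(string.ascii_letters + string.digits + "-")


def non_ldh_ascii(label: str) -> list[str]:
    """Return the ASCII characters in `label` that rule 1 would blacklist."""
    return [ch for ch in label if ord(ch) < 0x80 and ch not in LDH]


print(non_ldh_ascii("exam_ple'"))      # ['_', "'"]
print(non_ldh_ascii("xn--bcher-kva"))  # []
```

Non-ASCII characters are deliberately ignored here; they are the
subject of the later rules.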

2: Any character which, after NAMEPREP processing, generates an ASCII
character other than letters, digits or HYPHEN-MINUS

Rationale: these characters can potentially be expanded by software to
ASCII punctuation characters during IDN processing and be passed through
to lower-level DNS calls, allowing spoofing risks as outlined above.
There are too many of these characters to list here.

In particular, these should include:

2a: Visual spoofs of SOLIDUS:

U+0337 COMBINING SHORT SOLIDUS OVERLAY
U+0338 COMBINING LONG SOLIDUS OVERLAY
U+2044 FRACTION SLASH
U+2215 DIVISION SLASH
U+23AE INTEGRAL EXTENSION
U+29F6 SOLIDUS WITH OVERBAR
U+29F8 BIG SOLIDUS
U+2AFB TRIPLE SOLIDUS BINARY RELATION
U+2AFD DOUBLE SOLIDUS OPERATOR
U+FF0F FULLWIDTH SOLIDUS
U+3033 VERTICAL KANA REPEAT MARK UPPER HALF

2b: Visual spoofs of FULL STOP and other label separators, which should
never appear in a label:

U+2024 ONE DOT LEADER
U+2027 HYPHENATION POINT
U+06D4 ARABIC FULL STOP
U+0702 SYRIAC SUBLINEAR FULL STOP
U+3002 IDEOGRAPHIC FULL STOP
U+FF0E FULLWIDTH FULL STOP
U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP
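
A minimal sketch of how candidates for rule 2 might be found
mechanically is given below. It assumes that NFKC normalization (one
step of NAMEPREP, not the whole profile) is a good enough stand-in for
this purpose, so its results are only an approximation. The helper
name `maps_to_forbidden_ascii` is hypothetical.

```python
import string
import unicodedata

LDH = set(string.ascii_letters + string.digits + "-")


def maps_to_forbidden_ascii(ch: str) -> bool:
    """True if the NFKC form of `ch` contains ASCII outside letters/digits/hyphen.

    Assumption: NFKC alone approximates NAMEPREP for this check.
    """
    folded = unicodedata.normalize("NFKC", ch)
    return any(ord(c) < 0x80 and c not in LDH for c in folded)


for cp in (0xFF0F, 0xFF0E, 0x2024, 0x2044):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {maps_to_forbidden_ascii(ch)}")
```

Characters that normalization leaves unchanged, such as U+2044
FRACTION SLASH, are not caught by a test of this kind and have to be
identified by their visual appearance instead.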

3: Any character that is a visual spoof of any ASCII character other
than letters or digits, or appears to contain a visual spoof of one of
these characters, which is not detected by their presence in NAMEPREP
output.

Rationale: Whilst these characters are not a danger to software, they
can be used to create confusion in users, for example, by creating URLs
that mislead users into thinking that they are visiting a different
website to the one given.

A partial list of these characters is given at the end of this E-mail.
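
To make the idea concrete, here is a small Python sketch of a
lookalike check. The table holds only a handful of entries copied from
the class 3 list at the end of this E-mail, and the names `LOOKALIKES`
and `visual_skeleton` are invented for the example; a real
implementation would need a far larger, reviewed table.

```python
# Partial lookalike table: character -> ASCII punctuation it resembles.
LOOKALIKES = {
    "\u01C3": "!",  # LATIN LETTER RETROFLEX CLICK
    "\u05C3": ":",  # HEBREW PUNCTUATION SOF PASUQ
    "\u05F4": '"',  # HEBREW PUNCTUATION GERSHAYIM
    "\u2039": "<",  # SINGLE LEFT-POINTING ANGLE QUOTATION MARK
    "\u203A": ">",  # SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
    "\u2044": "/",  # FRACTION SLASH
    "\u2215": "/",  # DIVISION SLASH
    "\u3015": "]",  # RIGHT TORTOISE SHELL BRACKET
}


def visual_skeleton(label: str) -> str:
    """Replace each known lookalike with the ASCII character it imitates."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in label)


def contains_punctuation_spoof(label: str) -> bool:
    """True if the label contains any character from the lookalike table."""
    return visual_skeleton(label) != label


spoof = "example.com\u2044login"
print(visual_skeleton(spoof))             # example.com/login
print(contains_punctuation_spoof(spoof))  # True
```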

4: All characters labeled as Non-XID, Pattern_Syntax,  IDN-Illegal or
IDN-Deleted in http://www.unicode.org/reports/tr36/idn-chars.html

Rationale: these are either not useful for creating names or
identifiers, or explicitly violate IDN guidelines.
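
The authoritative data for rule 4 is the table at the URL above. As a
loose approximation of the Non-XID test only, Python's
`str.isidentifier()` is based on the XID_Start and XID_Continue
properties, so it can be used for a quick check. The helper name
`is_xid_continue` is my own, and the result depends on the Unicode
version built into the interpreter.

```python
def is_xid_continue(ch: str) -> bool:
    """Approximate XID_Continue: can `ch` follow a letter in an identifier?"""
    return ("a" + ch).isidentifier()


for ch in ("\u00e9", "7", "$", "\u2028"):
    print(f"U+{ord(ch):04X}: {is_xid_continue(ch)}")
```

Note that HYPHEN-MINUS fails this test even though it is permitted in
labels, so it would need to be handled as an explicit exception.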

5: Spacing and filler characters, namely

U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+115F HANGUL CHOSEONG FILLER
U+1160 HANGUL JUNGSEONG FILLER
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+200B ZERO WIDTH SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE
U+3164 HANGUL FILLER
U+FEFF ZERO WIDTH NO-BREAK SPACE
U+FFA0 HALFWIDTH HANGUL FILLER

Rationale: Existing domain names cannot contain spaces. There is no
practical reason for IDNs to contain spaces. In addition, some filler
characters break some rendering engines, allowing typographic spoofing
attacks.
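
The list in rule 5 is short enough to be checked against directly. The
following Python sketch encodes exactly the code points listed above;
the names `SPACING_AND_FILLERS` and `find_spacing` are illustrative
only.

```python
# Code points from rule 5. range(0x2000, 0x200C) covers U+2000..U+200B.
SPACING_AND_FILLERS = frozenset(
    [0x0020, 0x00A0, 0x115F, 0x1160, *range(0x2000, 0x200C),
     0x202F, 0x205F, 0x3000, 0x3164, 0xFEFF, 0xFFA0]
)


def find_spacing(label: str) -> list[str]:
    """Return the rule 5 characters found in `label`, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) in SPACING_AND_FILLERS]


print(find_spacing("pay\u200Bpal"))  # ['U+200B']
print(find_spacing("paypal"))        # []
```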

6: Line separators

U+2028 LINE SEPARATOR
U+2029 PARAGRAPH SEPARATOR

Rationale: Neither of these can serve any practical purpose in a domain
name.

7: Ideographic description characters, namely
U+2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT
U+2FF1 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW
U+2FF2 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT
U+2FF3 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW
U+2FF4 IDEOGRAPHIC DESCRIPTION CHARACTER FULL SURROUND
U+2FF5 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE
U+2FF6 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW
U+2FF7 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT
U+2FF8 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT
U+2FF9 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT
U+2FFA IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT
U+2FFB IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID

Rationale: Unless CJK experts disagree, I cannot see these as being
useful for IDNs, as
* most rendering engines cannot use them to render glyphs,
* if used as rendering hints, they allow multiple descriptions of the
same ideograph, allowing multiple Unicode strings to map to the same
visual appearance, which is a spoofing risk
* their use as printing characters, if not supported as rendering hints,
is confusing, and again allows multiple representations of the same ideogram

8: All characters from all scripts described as "Ancient Scripts" in the
Unicode code charts, such as Old Italic, Cuneiform, Old Persian,
Ugaritic, Linear B Syllabary, Linear B Ideograms, Aegean Numbers, the
Cypriot Syllabary, Gothic, and Runic.

Rationale: there appears to be no utility in creating names in dead
languages.
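
Python's standard library does not expose the Unicode Script property,
so the sketch below falls back on a crude heuristic: matching
character names against prefixes for the scripts named in rule 8. Both
the prefix list and the function name `looks_ancient` are assumptions
made for this example; a real implementation should use the Script
property data instead.

```python
import unicodedata

# Name prefixes for the scripts listed in rule 8 (heuristic, not exhaustive).
ANCIENT_NAME_PREFIXES = (
    "OLD ITALIC", "CUNEIFORM", "OLD PERSIAN", "UGARITIC", "LINEAR B",
    "AEGEAN", "CYPRIOT SYLLABLE", "GOTHIC LETTER", "RUNIC",
)


def looks_ancient(ch: str) -> bool:
    """True if the character's Unicode name starts with a listed prefix."""
    return unicodedata.name(ch, "").startswith(ANCIENT_NAME_PREFIXES)


print(looks_ancient("\u16A0"))      # True  (a Runic letter)
print(looks_ancient("\U00010330"))  # True  (a Gothic letter)
print(looks_ancient("a"))           # False
```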

9: Presentation forms from all scripts.

Rationale:  as I understand it, presentation forms are only required for
compatibility reasons, and provide alternative ways of representing the
same characters that can be represented in other ways using standard
Unicode characters. Even though some of these may be correctly dealt
with by NAMEPREP, having more than one way of representing the same
thing is generally a bad idea.
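
One way to see the "more than one representation" problem is to look
at the compatibility decomposition that Unicode records for each
presentation form. The Python sketch below treats a character as a
presentation form if its decomposition carries one of the positional
or layout tags; the tag set and the name `is_presentation_form` are my
own choices, and ligatures tagged `<compat>` are not covered.

```python
import unicodedata

# Decomposition tags that mark positional, vertical and small variants.
PRESENTATION_TAGS = {
    "<isolated>", "<initial>", "<medial>", "<final>", "<vertical>", "<small>",
}


def is_presentation_form(ch: str) -> bool:
    """True if `ch` decomposes with one of the presentation-style tags."""
    decomposition = unicodedata.decomposition(ch)  # "" if there is none
    return decomposition.split(" ", 1)[0] in PRESENTATION_TAGS


for cp in (0xFE8D, 0xFE3F, 0xFE5E, 0x0627):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {is_presentation_form(ch)}")
```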

10: Musical symbols, currency symbols, specials, tags, layout controls,
variation selectors.

Rationale: these characters seem unnecessary for use in names.

11: All private use characters, surrogates, and noncharacters.

Rationale: these characters are unusable in names.
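
These three classes can be recognized without any tables. In the
Python sketch below, private use and surrogate code points are
detected by their general categories (Co and Cs), and noncharacters by
their fixed positions: U+FDD0 to U+FDEF, plus the last two code points
of every plane. The function name `is_unusable` is hypothetical.

```python
import unicodedata


def is_unusable(ch: str) -> bool:
    """True for private use characters, surrogates and noncharacters."""
    cp = ord(ch)
    if unicodedata.category(ch) in ("Co", "Cs"):
        return True
    # Noncharacters: U+FDD0..U+FDEF and U+xxFFFE / U+xxFFFF in each plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


for cp in (0xE000, 0xD800, 0xFDD0, 0x1FFFF, 0x0061):
    print(f"U+{cp:04X}: {is_unusable(chr(cp))}")
```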

I am aware that the ranges given here will probably overlap with one
another, and may duplicate other lists of deprecated characters, such as
those in UTR #36. Nevertheless, I believe that the precautionary
principle suggests that these characters should be universally
blacklisted in domain name labels, with specific exceptions if justified
on a case-by-case basis for linguistic reasons.
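
The "blacklist by default, with justified exceptions" policy could be
sketched as follows. This is only an outline under my own assumptions:
each rule is modelled as a function from a character to a boolean, and
the exception set is supplied by whoever administers the policy. None
of these names come from the draft Guidelines.

```python
from typing import Callable, Iterable

Rule = Callable[[str], bool]


def rejected_characters(
    label: str,
    rules: Iterable[Rule],
    exceptions: frozenset = frozenset(),
) -> list[str]:
    """Return characters hit by any rule, unless explicitly excepted."""
    rules = list(rules)
    return [
        ch for ch in label
        if ch not in exceptions and any(rule(ch) for rule in rules)
    ]


rules = [str.isspace, lambda ch: ch in "\u2028\u2029"]
print(rejected_characters("a\u00A0b", rules))                           # ['\xa0']
print(rejected_characters("a\u00A0b", rules, frozenset({"\u00A0"})))    # []
```

Because overlapping rules all feed the same `any(...)` test, it does
no harm for a character to be covered by more than one of them.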

I hope this input is useful as part of the consultation process.

Sincerely,

Neil Harris
Media Channel Limited

------------------------------------------------------------------------------

List of characters in class 3 above:

U+01C3 ; LATIN LETTER RETROFLEX CLICK ; -> EXCLAMATION MARK; [nameprep: LATIN LETTER RETROFLEX CLICK]
U+05C3 ; HEBREW PUNCTUATION SOF PASUQ ; -> COLON; [nameprep: HEBREW PUNCTUATION SOF PASUQ]
U+05F4 ; HEBREW PUNCTUATION GERSHAYIM ; -> QUOTATION MARK; [nameprep: HEBREW PUNCTUATION GERSHAYIM]
U+321D ; PARENTHESIZED KOREAN CHARACTER OJEON ; -> LEFT PARENTHESIS, HANGUL SYLLABLE O, HANGUL SYLLABLE JEON, RIGHT PARENTHESIS; [nameprep: ch321D]
U+321E ; PARENTHESIZED KOREAN CHARACTER O HU ; -> LEFT PARENTHESIS, HANGUL SYLLABLE O, HANGUL SYLLABLE HU, RIGHT PARENTHESIS; [nameprep: ch321E]
U+00BD ; VULGAR FRACTION ONE HALF ; -> LATIN SMALL LETTER L, SOLIDUS, DIGIT TWO; [nameprep: DIGIT ONE, FRACTION SLASH, DIGIT TWO]
U+2039 ; SINGLE LEFT-POINTING ANGLE QUOTATION MARK ; -> LESS-THAN SIGN; [nameprep: SINGLE LEFT-POINTING ANGLE QUOTATION MARK]
U+203A ; SINGLE RIGHT-POINTING ANGLE QUOTATION MARK ; -> GREATER-THAN SIGN; [nameprep: SINGLE RIGHT-POINTING ANGLE QUOTATION MARK]
U+2044 ; FRACTION SLASH ; -> SOLIDUS; [nameprep: FRACTION SLASH]
U+2154 ; VULGAR FRACTION TWO THIRDS ; -> DIGIT TWO, SOLIDUS, DIGIT THREE; [nameprep: DIGIT TWO, FRACTION SLASH, DIGIT THREE]
U+2155 ; VULGAR FRACTION ONE FIFTH ; -> LATIN SMALL LETTER L, SOLIDUS, DIGIT FIVE; [nameprep: DIGIT ONE, FRACTION SLASH, DIGIT FIVE]
U+2156 ; VULGAR FRACTION TWO FIFTHS ; -> DIGIT TWO, SOLIDUS, DIGIT FIVE; [nameprep: DIGIT TWO, FRACTION SLASH, DIGIT FIVE]
U+2159 ; VULGAR FRACTION ONE SIXTH ; -> LATIN SMALL LETTER L, SOLIDUS, CYRILLIC SMALL LETTER BE; [nameprep: DIGIT ONE, FRACTION SLASH, DIGIT SIX]
U+215A ; VULGAR FRACTION FIVE SIXTHS ; -> DIGIT FIVE, SOLIDUS, CYRILLIC SMALL LETTER BE; [nameprep: DIGIT FIVE, FRACTION SLASH, DIGIT SIX]
U+215B ; VULGAR FRACTION ONE EIGHTH ; -> LATIN SMALL LETTER L, SOLIDUS, GURMUKHI DIGIT FOUR; [nameprep: DIGIT ONE, FRACTION SLASH, DIGIT EIGHT]
U+2215 ; DIVISION SLASH ; -> SOLIDUS; [nameprep: DIVISION SLASH]
U+3015 ; RIGHT TORTOISE SHELL BRACKET ; -> RIGHT SQUARE BRACKET; [nameprep: RIGHT TORTOISE SHELL BRACKET]
U+33AE ; SQUARE RAD OVER S ; -> LATIN SMALL LETTER R, LATIN SMALL LETTER A, LATIN SMALL LETTER D, SOLIDUS, LATIN SMALL LETTER TONE FIVE; [nameprep: LATIN SMALL LETTER R, LATIN SMALL LETTER A, LATIN SMALL LETTER D, DIVISION SLASH, LATIN SMALL LETTER S]
U+33AF ; SQUARE RAD OVER S SQUARED ; -> LATIN SMALL LETTER R, LATIN SMALL LETTER A, LATIN SMALL LETTER D, SOLIDUS, LATIN SMALL LETTER TONE FIVE, DIGIT TWO; [nameprep: LATIN SMALL LETTER R, LATIN SMALL LETTER A, LATIN SMALL LETTER D, DIVISION SLASH, LATIN SMALL LETTER S, DIGIT TWO]
U+FE14 ; PRESENTATION FORM FOR VERTICAL SEMICOLON ; -> SEMICOLON; [nameprep: chFE14]
U+FE15 ; PRESENTATION FORM FOR VERTICAL EXCLAMATION MARK ; -> EXCLAMATION MARK; [nameprep: chFE15]
U+FE3F ; PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET ; -> CIRCUMFLEX ACCENT; [nameprep: LEFT ANGLE BRACKET]
U+FE5E ; SMALL RIGHT TORTOISE SHELL BRACKET ; -> RIGHT SQUARE BRACKET; [nameprep: RIGHT TORTOISE SHELL BRACKET]
U+00BC ; VULGAR FRACTION ONE QUARTER ; -> DIGIT ONE, SOLIDUS, CHEROKEE LETTER SE; [nameprep: DIGIT ONE, FRACTION SLASH, DIGIT FOUR]
U+00BD ; VULGAR FRACTION ONE HALF ; -> DIGIT ONE, SOLIDUS, DIGIT TWO; [nameprep: DIGIT ONE, FRACTION SLASH, DIGIT TWO]
U+2155 ; VULGAR FRACTION ONE FIFTH ; -> DIGIT ONE, SOLIDUS, DIGIT FIVE; [nameprep: DIGIT ONE, FRACTION SLASH, DIGIT FIVE]
U+215F ; FRACTION NUMERATOR ONE ; -> DIGIT ONE, SOLIDUS; [nameprep: DIGIT ONE, FRACTION SLASH]
U+33C6 ; SQUARE C OVER KG ; -> CHEROKEE LETTER TLI, SOLIDUS, LATIN SMALL LETTER K, LATIN SMALL LETTER G; [nameprep: LATIN SMALL LETTER C, DIVISION SLASH, LATIN SMALL LETTER K, LATIN SMALL LETTER G]
U+33DF ; SQUARE A OVER M ; -> CANADIAN SYLLABICS CARRIER GHO, SOLIDUS, CANADIAN SYLLABICS CARRIER GO; [nameprep: ch33DF]


and, for good measure, the symmetric partners of the right brackets
above, which the list does not otherwise include:
U+3014 LEFT TORTOISE SHELL BRACKET
U+FE5D SMALL LEFT TORTOISE SHELL BRACKET





<http://www.unicode.org/reports/tr31/tr31-5.html>


