Comments on the Draft Revised IDN Guidelines
I am writing to comment on the Draft Revised IDN Guidelines: http://icann.org/general/idn-guidelines-20sep05.htm
Firstly, I would like to thank ICANN for taking a lead in tackling the potential spoofing problems that IDN entails, and in updating the guidelines. Overall, I am very pleased with the new draft, which represents a significant improvement over version 1.0.
From the point of view of the Mozilla Foundation, we would like the final draft document to be of sufficient clarity, extent and watertightness that we can simply say to registries "Do you follow all of the ICANN guidelines? If so, show us your documentation to prove it and we'll enable IDN for your TLD." I would hope that ICANN also has this aim - that is, that the guidelines should encapsulate the whole of current best practice in avoiding spoofing issues.
Let me focus my comments by saying that I am in complete agreement with every guideline apart from guidelines 3) and 4). I think I am also in agreement with section 3), but there are some parts where I would seek clarification, and other parts where I think the guidelines do not go quite far enough. I will now elaborate on what I mean.
Guideline 3)
------------
Reading Guideline 3 raises the following questions:
a) Who defines a "set of languages"? If the registries define them, what is to prevent a registry defining a set which contains every existing language, and therefore bypassing the intent of many of the guidelines? Is the number of necessary sets small enough to enumerate them in the document, as UTR #36 does?
b) The following sentence is confusing to me:
"Visually confusable characters from different scripts may not appear in a single *label* unless there are overriding legitimate linguistic reasons for doing so."
Let's say that "b" and "6" are confusable, just for the sake of example. Taking this at face value, it says that I can't have a label like business60.com, because it has both a b and a 6 in it. This seems an odd thing to explicitly prohibit; surely the risk is not in having "business60.com", but in having both "business60.com" and "6usinessb0.com" registered to different people?
Perhaps where *label* was written (highlighted above), you meant "table"? That would turn it into a very sensible restriction that makes sense in the context.
c) As you will know, browsers do not have access to character tables or script labels or any of this ancillary information about labels, and no-one (so far as I know) is proposing any mechanisms for them to have access to it. Therefore, any registry policy designed to prevent spoofing needs to be blind to the existence of character tables, even if they are used as a way of limiting registrations.
What I am saying here is that the current policy does not address whole-script spoofables (e.g. caxap.tld in Latin and caxap.tld in Cyrillic). If the .tld registry has a table for both Russian and English, nothing I can see in the guidelines tells them they need to make sure these two domains are not registered to different entities. In terms of the "safety" of these guidelines, this is my main concern.
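To make the whole-script concern concrete, here is a minimal sketch in Python of the kind of check a registry could run. The lookalike map is a tiny hand-picked assumption for illustration (a real registry would need a full confusables table), and the function names and registrant names are hypothetical.

```python
# Hypothetical, hand-picked subset of Cyrillic letters that resemble Latin ones.
# This is an illustration only, not a complete confusables table.
CONFUSABLE_TO_LATIN = {
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
}


def skeleton(label):
    """Replace each character with its Latin lookalike, if the map has one."""
    return "".join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in label.lower())


def conflicting_registrations(registrations):
    """Return skeletons shared by labels held by more than one registrant.

    `registrations` is a list of (label, registrant) pairs.
    """
    owners = {}
    for label, registrant in registrations:
        owners.setdefault(skeleton(label), set()).add(registrant)
    return {skel: who for skel, who in owners.items() if len(who) > 1}


latin = "caxap"
cyrillic = "\u0441\u0430\u0445\u0430\u0440"  # Cyrillic letters that render like "caxap"
conflicts = conflicting_registrations([(latin, "Alice"), (cyrillic, "Bob")])
```

Note that the check never consults which language table a label was registered under; it works on the characters alone, which is all a browser would have to go on.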
d) Following on from the above, I believe it's important for transparency reasons for people (including browser manufacturers) to be able easily to see what characters a registry is permitting. Therefore, up to this point, we have been requesting, from every registry we enable IDN for, a single ordered list of all characters they permit, full stop. This enables us to see if there are any homographs; if there are, we can then further analyse their tables to see if they will be a problem in practice. I would suggest that ICANN make the production and publication of such a list a guideline in the same way that production of individual tables is. One is merely a reformatting and agglomeration of the other, so the additional work should not be great.
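As a rough illustration of how little extra work the combined list involves, the following Python sketch merges per-language tables into a single ordered listing. The table contents, language keys, and function names are assumptions made up for the example; they do not describe any registry's actual tables.

```python
import unicodedata


def combined_character_list(tables):
    """Merge per-language tables into one list of unique characters, sorted by code point."""
    return sorted(set().union(*tables.values()))


def format_listing(chars):
    """Render one line per character: code point followed by its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in chars]


# Hypothetical tables, for illustration only.
tables = {
    "en": set("abc"),
    "ru": {"\u0430", "\u0431", "\u0441"},
}
listing = format_listing(combined_character_list(tables))
```

A listing in this form puts Latin "a" and Cyrillic "а" in the same document, which is what makes potential homographs visible to an outside reviewer.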
Guideline 4)
------------
ETHIOPIC WORDSPACE (U+1361) is classified here as "necessary punctuation". This character is a homograph of : (colon), which is a "character with a well-established function as a protocol element" - a set later prohibited in the same guideline, for very good reasons.
Without knowing the exact use of the character in Ethiopic, I don't want to be dogmatic about this, but I'm concerned about it being classified explicitly by the guidelines as "necessary punctuation", and even about the existence of such a class as "necessary punctuation". ASCII domain names have coped for many years using only hyphen (and CamelCase, presentationally) as a separator. I think the real necessity of any punctuation, particularly that which spoofs protocol elements, needs extremely careful examination.
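One way to carry out that examination mechanically is sketched below in Python: screen a permitted character set against a list of characters that resemble protocol elements. The lookalike map is an assumption containing only the single pairing discussed above, and the function name is hypothetical.

```python
import unicodedata

# Assumption: a hand-written map from a lookalike character to the protocol
# element it resembles. Only the pairing discussed above is listed here.
PROTOCOL_LOOKALIKES = {
    "\u1361": ":",  # ETHIOPIC WORDSPACE resembles a colon
}


def protocol_spoof_risks(permitted):
    """List (Unicode name, resembled protocol element) for risky permitted characters."""
    return [
        (unicodedata.name(ch), PROTOCOL_LOOKALIKES[ch])
        for ch in sorted(permitted)
        if ch in PROTOCOL_LOOKALIKES
    ]


risks = protocol_spoof_risks({"a", "-", "\u1361"})
```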
Additionally, I am also hoping that IETF processes will eventually lead to a revision of the IDN guidelines that uses an inclusive approach, and that focuses almost exclusively on letters and numbers in the various scripts. There is a danger that their view of "necessary punctuation" might not agree with the one in the guidelines.
Secondly, this guideline says that "such-and-such is not allowed", but then says any registry may make exceptions merely by documenting them. This rather removes any force the guideline may have had. While I know we can never see the future clearly, what sort of exceptions are envisaged, and why should we be allowing them, given the wisdom of prohibiting all the character classes explicitly listed in Guideline 4?
Thank you for the opportunity to comment on this draft; I hope you find my comments helpful, and I look forward to hearing any feedback you may have on them, and to seeing a further draft of the guidelines.