Comments on 20 September draft revised IDN guidelines
Despite considerable interest in issues associated with IDNs, I have refrained from commenting on this draft until now for three reasons. First, I have wanted to understand how other comments were running. Second, and more important, I do not believe that the current draft for version 2.0 represents the right direction for the Internet community and have been trying to figure out how to address that issue. Third, I have gradually lost most of the confidence I once had in the quality of ICANN's review and approval processes, so I am not sure that writing these notes is worth the trouble.

So, while I applaud the efforts of the members of the Guidelines committee to put this draft together and of ICANN in finally initiating some real work on IDNs after years of promises, I am deeply concerned about this draft and its contents.

-------------------------

Comments on "Guidelines for the Implementation of Internationalized Domain Names, Draft Version 2.0", dated 20 September 2005.

(1) Parts of the draft have the wrong tone and may represent the wrong approach.
It is clearly in the best interest of the Internet community that all domains -- gTLDs, ccTLDs, and domains at the second level and beyond -- have consistent policies with regard to IDNs. If we do not have consistent policies, then users do not know what to expect. If policies are not consistent, reasonable protection for users encourages --some would say "forces"-- applications implementers to develop per-domain profiles or lists of domains with acceptable or unacceptable behavior. We have, of course, already seen that response. Per-domain profiles, different IDNA tables used in different applications, and similar behavior, no matter how well-intentioned (or even necessary), further damage users' ability to predict how a particular program will behave and reduce compliance with the relevant standards.
Full compliance with IDNA requires that applications display IDNs in fonts and glyphs appropriate to the relevant characters if it is possible to do so. If a user obtains an application or operating system that is fully Unicode-capable and that has a full set of fonts, that user should, given the standard, be able to expect that he or she will never see a "punycode"-encoded name. Certainly such names should not be seen on an arbitrary basis or on a basis that is linked to particular domain names at the top levels of the tree. But that is exactly where we have found ourselves with web browsers as a result of the recent IDN-"phishing" problems and ICANN's unwillingness to address IDN policy and permitted character issues when they were first identified almost five years ago, while there was still time to address them in the context of the basic design of IDNA. This draft does not significantly improve on the present situation.

In this area, and others, ICANN must decide what business it is in and how it expects to generate and, if appropriate, enforce policies. Where consistency across domains is important, as it is in this case, ICANN can, in principle, make policies applicable to the entire Internet community, insisting that conformance is a requirement of the long-standing provisions of RFC 1591 that all domain managers act as trustees for the global Internet community and act according to their responsibilities to that community. ICANN would also need to insist that domain managers carry out their obligations under the "recursive application" rule of RFC 1591, i.e., that they enforce requirements placed on them on all subdomain registrations they permit. The existence of that authority in principle is, however, meaningless without the will and ability to apply it, and apply it consistently, in practice.
Based on the history of the last few years, including ICANN's interactions with other organizations and governmental entities, it seems unlikely that ICANN, in practice, has any authority in this area that can be exercised in a sufficiently global way to provide users with a consistent and safe experience.

ICANN has an alternative which, in practice, would be more likely to serve the overall community well. That alternative is to carefully explain the issues, provide "best practices" guidelines for dealing with them and persuasive explanations of why those practices are appropriate and necessary, and clearly and logically identify the institutions that should have responsibility for various actions and controls. If that is done, then ICANN should be able to assert only the level of authority that it actually has in practice. It would then step back and, in the presence of clear explanations of issues and alternatives, rely for enforcement on the good sense of domain administrators and managers, the workings of the marketplace, and the various governmental and judicial systems around the world. To accomplish that end,
(a) Any document such as this one must clearly differentiate between requirements of the IDNA standard and recommendations or requirements imposed by ICANN or based on other community consensus.
(b) Any document such as this one should avoid stating requirements in terms of "ICANN commands and everyone else will comply". That type of construction was one of the reasons why some of the provisions of the initial version of the Guidelines were widely ignored even when they were sensible. Instead, ICANN should state a recommendation, explain why that recommendation is important, and, ideally, explain the adverse consequences --in terms of Internet behavior as seen by registrants or users-- of not following it.
(c) Documents such as this one should drop the pretense that gTLDs and ccTLDs are different from the standpoint of the Guidelines (the order in which IDNs should be deployed in different domains is a different issue). If ICANN has no practical enforcement capability in one case and no will to attempt to enforce policies in the other, there are no practical differences. Worse, making distinctions between "registries that have agreements with ICANN" and those which do not, and then imposing additional requirements on the former, represents poor strategic policy as it discourages those registries for whom reaching agreements with ICANN is voluntary from ever reaching such agreements.

------------------------------------------------

(2) Any set of rules or guidelines should make the locus of responsibility for specific implementations of the rules clear; this document does not do so.
As discussed above, this draft is laden with language about what registries "will" or "must" do. Independent of where the authority to make or enforce such statements comes from, it is important to identify the reasons for those choices, as opposed to possible alternatives, better than these Guidelines do.

Even more important, there is an industry practice of passing all responsibility for problematic registrations from registry to registrar to registrant. This is reasonable for, e.g., trademark conflicts, where it may be plausible to expect the would-be registrant to take responsibility for determining rights in the chosen name. However, there are IDN issues involving name conflicts and name similarities in which only the registry, by inspecting its own databases, can determine whether it is appropriate to register a name. For traditional, LDH, domain name labels, registry-level appropriateness typically involves only a determination of whether the label is already present.
For IDNs, the necessary determination may involve understanding whether a visually-confusable label exists, or whether a label is not permitted due to an existing label group (variant set). If registries fail to establish and enforce effective conventions in those areas, and harm results, the responsibility must rest largely with the registry.

------------------------------------------------

(3) The draft does not go far enough to be significantly useful.
Despite being stated as strong requirements, paragraphs 3 and 4 of the Guidelines do not provide any real guidance for marginal cases. Any rules of this type should start from a clear statement of the principle that the use of the DNS as a source of precise and unique identifiers for Internet objects is paramount. Without that as a primary principle, the network as users know it simply ceases to function. ICANN, and all domain managers, need to accept and understand this principle and understand that it may force banning the use of some strings as IDNs even if those strings would be culturally and linguistically reasonable in some language considered by itself.

As a trivial example for English-language strings, the use of space characters, commas, and periods is usually necessary to form sentences or phrases. Yet those characters have always been prohibited as part of domain name labels to be used in applications because they would cause too much confusion and too many risks to the integrity of DNS references. Similarly strong rules should be applied to the use of such characters, or any character that maps onto them, in IDNs, ideally with no exceptions. If, counter to whatever guidelines or "best practice" statements exist, registries make exceptions to such rules, they must bear complete responsibility for any negative consequences.

Beyond that principle, Sections 3 and 4 indicate what code points may or may not be permissible by broad examples. That approach does not provide much guidance except to experts.
While there may be many experts on a single language, there are few experts across all of the languages and scripts of the world. The approach of making specific prohibitions mentioned by Neil Harris in his posting about restrictions on 11 October appears to be a much more satisfactory method of dealing with these issues, and a better way to provide useful guidance, than citing a few examples.

------------------------------------------------

(4) The draft focuses on the registration process, rather than on impact on actual implementations and users and user experiences.
As I have indicated in other contexts, users do not generally use domain names. They use URLs or other URIs or IRIs, they use email addresses, and they may use other identifiers that incorporate domain names. The draft Guidelines identify one aspect of this issue in Section 5 but indicate only that the registry should "include in its documentation a description of the factors that determine the way that sequence appears at the user interface". I have no idea what that means; I would predict that the typical registry would not have a much better idea. That is not "guidance".

But, more generally, DNS registries typically deal with the registration of single labels at a single level of the system. Several confusing situations can be introduced by sequences of labels, especially sequences in different scripts. However, suggestions to restrict the language or script of labels at one level of the DNS tree based on the language or script used at another level have generally proven infeasible due to other DNS constraints. The Guidelines, especially if they are to be titled as "Guidelines for Implementation", should either clearly address these issues or should avoid them entirely, pointing if appropriate to other documents and efforts.

------------------------------------------------

(5) The draft is internally confusing and creates new ambiguities.
The draft represents an odd mix of standards from different bodies, partially-ratified but still-changing technical reports, and other documents as reference sources. If the reader is left to interpret the intersections among, and applicability of, these documents, the only certain results are inconsistency and confusion.

In particular, the language and script registration rules of Section 3 rely on a mixture of an IETF Standard for language identification that was designed primarily for another purpose and may not be completely suitable for this one (RFC 3066), an ISO Standard for script identification (IS 15924) that has been controversial in some quarters and that does not contain guidelines for use in this type of context, and a Unicode character properties list (UTR 23) that evolves as characters are added to Unicode and as needs change. No guidance is given about how those various standards can and should be used together. An IETF effort (the products of the LTRU working group) that might provide some assistance in the area is not referenced (although it is mentioned under "Additional remarks", see below). Whether it should be depended upon now is questionable, since it is not clear at what granularity it should be applied to this work, but, since it is intended to supersede RFC 3066, ignoring it entirely seems inappropriate.

The text indirectly indicates an understanding of the issues by indicating that the various standards "illustrate" the relevant designation, but, again, that approach provides little real guidance. Worse, if the statements made in that paragraph are taken as rules or guidance, they would essentially prohibit the use of a few Indian languages, and a large number of African languages, in IDNs despite the fact that all of the required characters appear in the Unicode code tables.
If there is a language or script for which Unicode encodings for some characters do not exist, it is probably appropriate and necessary, although painful, for ICANN to take the position that the encodings be registered first and then that IETF extend the mappings permitted by IDNA before IDN registrations are permitted: there does not appear to be any stable alternative. However, if Unicode codings are available for all of the characters relevant to a particular language and script, but no standardized names exist for those characters as a single script or unique to that language, it seems unreasonable to ban, or even significantly postpone, IDN registrations for that language.

------------------------------------------------

(6) The status of the "Additional remarks" is unclear.
Are these remarks part of the Guidelines? Suggestions about areas for future revisions? An indication that the Guidelines are complete enough for community comment but not for any instantiation into policy? Neither the text nor the "Additional remarks" title provides any answers to those questions.

The questions are important because several of the remarks that appear to be statements of fact are actually statements of highly controversial opinions. For example, UTR 36 remains controversial both within and outside the Unicode development community. Whether the status of ISO 639-3 (which is an improper way to refer to it; it is ISO/DIS 639-3) is "advanced stage" is in the mind of the beholder: the relevant technical committee identifies a "Publication target date" of "2006-12-03" and a "Status" of "Under Development". It is worth noting that the initial DIS version of ISO 10646, the ISO counterpart to the Unicode standard, was completely replaced by a new version and model at a later stage than 639-3 has now achieved. There are other examples, but these comments are already too long.
If the "Additional remarks" section is intended to suggest that this draft of new Guidelines is unsuitable for incorporation into any policy process as it is now written, I would completely agree. If that section somehow is not expected to count, then it is even more clear to me that the draft Guidelines need an extensive reworking, starting from different principles about policies, relationships, and actions and then continuing on to address the technical issues in a way that provides more actual guidance.