Comments on Guideline 4 in the v2.0 draft
Point 4 in the draft guidelines was the focus of considerable discussion by the members of the guideline task force, who also expected it to generate significant public commentary. Useful remarks have already appeared on the present forum and in communications via other channels. Additional commentary is likely to be posted here as the deadline approaches, and people who are preparing to do so may find it worth noting a few background details about some of the wording in the draft. (This is a personal contribution to the discussion, expressing opinions that may be solely my own, and is not being made on behalf of the task force.)
The passage on which I am commenting is:
"Permissible code points will not include ... punctuation characters that lack grammatical significance in the language with which the IDN registration is associated (with necessary punctuation including characters such as the ETHIOPIC WORDSPACE in Amharic and the MIDDLE DOT in Catalan)..."
This wording was arrived at in real-time dialog with participants in the meeting on 'Unicode and IDN in Africa' (http://www.next.sn/unicode-idn-africa.html), which took place in Dakar at the same time the guidelines task force was meeting. I was on one end of an IM link between the two venues (with Michael Everson on the other) but cannot describe the discussion on the African side. Suffice it to say that my immediate correspondent made it clear that the credibility of the guidelines in African contexts could prove highly dependent on explicit reference to the ETHIOPIC WORDSPACE in precisely the form that subsequently appeared in the draft guidelines. The further reference to the MIDDLE DOT was considered in the same dialog, prompted by the IDN requirements of the new .cat TLD, and similarly put forward as an absolute necessity.
As anticipated, concern was expressed about this on the public forum. Quoting Gervase Markham:
"ETHIOPIC WORDSPACE (U+1361) is classified here as 'necessary punctuation'. This character is a homograph of : (colon), which is a 'character with a well-established function as a protocol element' - a set later prohibited in the same guideline, for very good reasons. Without knowing the exact use of the character in Ethiopic, I don't want to be dogmatic about this, I'm concerned about it being classified explicitly by the guidelines as 'necessary punctuation', and even about the existence of such a class as 'necessary punctuation'. ASCII domain names have coped for many years using only hyphen (and CamelCase, presentationally) as a separator. I think the real necessity of any punctuation, particularly that which spoofs protocol elements, needs extremely careful examination."
The terminology used in the guidelines clearly needs further clarification and some concepts may be relabeled entirely. The term 'grammatical significance' will likely not be retained, but notions of necessary punctuation -- however they may ultimately be worded -- figure prominently in IDN policy.
The extent to which one language can comfortably be represented using a single punctuation mark says absolutely nothing about the number of such characters needed for the comparably adequate representation of another. One of the most important requirements placed on the guidelines is keeping them reasonably immune to accusations of reflecting cultural bias (a goal that other comments suggest they have yet to attain). Statements to the effect of, 'what is sufficient for anglophone requirements is sufficient for all other languages', are precisely what they must not make.
To be sure, "the real necessity of any punctuation needs extremely careful examination". Equally certain is that some ASCII punctuation marks are not available for inclusion in domain names for absolutely compelling technical reasons. Such restricted protocol elements are represented using what also happen to be English punctuation marks. This has the obvious further effect of limiting the kinds of punctuation that can appear in domain names. Although the selection of these elements was initially devoid of cultural intent, as with many other aspects of the present matter, conditions have changed considerably in the interim.
Does the unavailability of an apostrophe to indicate English contraction mean that contraction should be prohibited in other languages where it is indicated using different characters that are unique to that purpose? Does the restriction of the range of available punctuation to a single 'separator' in the Latin-based LDH repertoire mean that a similar constraint should be applied to every other script? Should languages using scripts that include letters that resemble ASCII punctuation marks be restricted to subsets of their own alphabets?
In fact, the reference to the Ethiopic wordspace was preceded by careful consideration of factors such as these. Although it does resemble a colon, the two belong to separate scripts that are about as graphically distinct from each other as can be (http://www.unicode.org/charts/PDF/U1200.pdf). Although the hyphen in English and the wordspace in Ethiopic can serve the same function in their respective languages, the hyphen is meaningless in Ethiopic. Making it available would also require breaking the 'one label - one script' restriction. By virtue of the same restriction, the wordspace cannot appear in a Latin string, or anywhere else other than in a sequence of Ethiopic characters. (Nor can a colon appear in an Ethiopic string.)
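For readers who find it easier to follow this in code, the effect of the 'one label - one script' restriction can be sketched roughly as follows. This is only an illustration and not anything taken from the guidelines or from a registry implementation. The function names are hypothetical, and because the Python standard library does not expose the Unicode Script property, the first word of each character's Unicode name is used as a crude stand-in for its script. The LDH repertoire is simply treated as 'LATIN' by assumption.

```python
import unicodedata

ETHIOPIC_WORDSPACE = "\u1361"  # resembles a colon, but belongs to the Ethiopic script
COLON = ":"                    # ASCII protocol element, outside the LDH repertoire


def rough_script(ch):
    """Approximate a character's script (hypothetical helper).

    Assumption: the first word of the Unicode character name stands in for
    the Script property, and ASCII letters, digits, and hyphen (LDH) are
    all lumped together as 'LATIN'.
    """
    if ch.isascii() and (ch.isalnum() or ch == "-"):
        return "LATIN"
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def label_is_single_script(label):
    """Return True if every character in the label maps to the same rough script."""
    return len({rough_script(ch) for ch in label}) == 1


# Three Ethiopic syllables on each side of the separator (escapes used for safety).
left = "\u12a0\u12f2\u1235"
right = "\u12a0\u1260\u1263"

ethiopic_label = left + ETHIOPIC_WORDSPACE + right  # wordspace inside Ethiopic text
colon_label = left + COLON + right                  # colon inside Ethiopic text
mixed_label = "abc" + ETHIOPIC_WORDSPACE + "def"    # wordspace inside Latin text
```

Under these assumptions, `label_is_single_script(ethiopic_label)` is true, while both `colon_label` and `mixed_label` are rejected: the wordspace passes only when surrounded by Ethiopic characters, and the colon never passes at all. A real implementation would need the actual Script property and the registry's own code point tables.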
Telling Ethiopian name holders that they cannot use their wordspace because it might cause confusion in a context in which it can never appear is a poor way to demonstrate concern for linguistic equality. Seen from the opposite perspective, if our interest is in preventing user confusion, we need to consider the need for restricting the range of punctuation marks available within the Ethiopic script. All we've been asked for so far is the one -- corresponding in every way to the H in LDH, but in accordance with the needs of one of the many communities to which Latin script is foreign.
Cary Karp dotMuseum