Re: Comments on Guideline 4 in the v2.0 draft
Cary Karp wrote:
The extent to which one language can comfortably be represented using a single punctuation mark says absolutely nothing about the number of such characters needed for the comparably adequate representation of another. One of the most important requirements placed on the guidelines is keeping them reasonably immune to accusations of reflecting cultural bias (a goal that other comments suggest they have yet to attain).
"No-one gets any punctuation" seems to me to be fairly free of cultural bias, given that the Unicode Consortium has already kindly decided for us what it punctuation and what isn't.
To be sure, "the real necessity of any punctuation needs extremely careful examination". Equally certain is that some ASCII punctuation marks are not available for inclusion in domain names for absolutely compelling technical reasons. Such restricted protocol elements are represented using what also happen to be English punctuation marks.
Indeed. One could say that English and related languages are those _most_ disadvantaged by rules restricting characters which match protocol elements. Accusations of cultural bias in this case seem to me to be particularly unfair.
Should languages using scripts that include letters that resemble ASCII punctuation marks be restricted to subsets of their own alphabets?
No. But (and here I begin to sound like a broken record) labels should not be allowed to contain both, as ICANN has recognised. And TLDs which permit registrations using such character sets need homograph avoidance policies to make sure two domains which are homographic do not get registered to two different entities.
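As a rough illustration of the "one label, one script" check, here is a sketch in Python. The standard library does not expose the Unicode Script property, so this uses the first word of each letter's character name as a stand-in; that proxy is my assumption, not how a registry would actually do it, and a real policy would need the proper script data.

```python
import unicodedata

def scripts_in_label(label):
    """Return the set of (approximate) scripts used by the letters in `label`.

    Assumption: the first word of the Unicode character name ("LATIN",
    "CYRILLIC", "ETHIOPIC", ...) is a good-enough proxy for the script.
    Only letters (general category L*) are considered.
    """
    scripts = set()
    for ch in label:
        if unicodedata.category(ch).startswith("L"):
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def is_single_script(label):
    """True if all letters in the label appear to come from one script."""
    return len(scripts_in_label(label)) <= 1

# The second label swaps in CYRILLIC SMALL LETTER A (U+0430) for the Latin "a".
print(scripts_in_label("example"))
print(scripts_in_label("ex\u0430mple"))
```

A check like this only catches mixing within a label. It does nothing about two whole labels in different scripts that happen to look alike, which is why the registration-level homograph policy is still needed.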
In fact, the reference to the Ethiopic wordspace was preceded by careful consideration of factors such as these. Although it does resemble a colon, the two belong to separate scripts that are about as graphically distinct from each other as can be (http://www.unicode.org/charts/PDF/U1200.pdf). Although the hyphen in English and the wordspace in Ethiopian can serve the same function in their respective languages, the hyphen is meaningless in Ethiopian. Making it available would also require breaking the 'one label - one script' restriction.
I had rather assumed that hyphen would not be included in that; clearly I assumed incorrectly.
Telling Ethiopian name holders that they cannot use their wordspace because it might cause confusion in a context in which it can never appear is a poor way to demonstrate concern for linguistic equality.
1) http://www.example.com:/some-long-string-of-characters.tld/more/url
2) http://www.example.com:some-long-string-of-characters.tld/more/url
Say that the second "colon" in example 2 is an Ethiopic wordspace. Yes, there's also the difference of a slash, but I do not want to start down the path of "which protocol elements is it more safe to allow homographs of than others". That's clearly a nasty, slippery slope.
Hang on, you may say, that's mixed script. Yes it is. So let's have an example from the future, where X is a character in Ethiopic. This assumes that IDN TLDs now exist, and XXX is an IDN TLD in Ethiopic.
1) http://www.XXXXXXX.XXX:/XXXX-XXXX-XXXXX-XX-XXXXXXXXXXX.XXX/more/url
2) http://www.XXXXXXX.XXX:XXXX-XXXX-XXXXX-XX-XXXXXXXXXXX.XXX/more/url
Would we then try and retroactively withdraw the use of ETHIOPIC WORDSPACE because of the greater possibility of confusion now presented?
Seen from the opposite perspective, if our interest is in preventing user confusion, we need to consider the need for restricting the range of punctuation marks available within the Ethiopic script. All we've been asked for so far is the one -- corresponding in every way to the H in LDH, but in accordance with the needs of one of the many communities to which Latin script is foreign.
It seems there are two ways we can approach this fairly and without cultural bias.
1) No-one gets any punctuation.
2) Everyone gets all the punctuation they like which isn't homographic with protocol characters.
In both cases, we let the chips fall where they may.
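For what it's worth, option 2 could be sketched like this in Python. The set of protocol characters and the look-alike table are both hypothetical and hand-written for illustration; the single table entry simply encodes the resemblance between the Ethiopic wordspace and the colon discussed above, and a real policy would need an agreed, reviewed list.

```python
import unicodedata

# Assumption: an illustrative set of ASCII characters with protocol meaning in URLs.
PROTOCOL_CHARS = set(":/.?#@")

# Hypothetical look-alike table: punctuation judged homographic with a
# protocol character. The one entry reflects the case argued in this thread.
LOOKS_LIKE_PROTOCOL_CHAR = {
    "\u1361": ":",  # ETHIOPIC WORDSPACE resembles COLON
}

def punctuation_permitted(ch):
    """Apply option 2 to a single character.

    Non-punctuation is not restricted by this rule. Punctuation is permitted
    unless it is a protocol character or is listed as resembling one.
    """
    if not unicodedata.category(ch).startswith("P"):
        return True
    return ch not in PROTOCOL_CHARS and ch not in LOOKS_LIKE_PROTOCOL_CHAR

print(punctuation_permitted("-"))       # hyphen: not a protocol character here
print(punctuation_permitted("\u1361"))  # wordspace: listed as a colon look-alike
```

Everything then turns on what goes into the look-alike table, which is exactly the argument we are having.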