Re: [gnso-idn-wg] Re: Banning CCHH anywhere in a label
I believe the motivations behind banning strings with hyphens in the third and fourth positions are:
1. to protect registries that do not offer IDN registrations from unknowingly registering IDNs; and
2. to reserve room for future revisions of the IDNA standard, in which a different prefix might be assigned.
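As a rough illustration of the rule described above, here is a minimal Python sketch of a check a registry might run on a requested label. The function names are hypothetical and are not taken from any registry software or policy document; the "zq--" label in the usage note is an invented stand-in for a prefix that some future revision might assign.

```python
def has_hyphens_in_positions_3_and_4(label: str) -> bool:
    """Return True if the 3rd and 4th characters of the label are both hyphens."""
    # Slicing is safe on short labels: it simply yields fewer characters.
    return label[2:4] == "--"


def has_current_ace_prefix(label: str) -> bool:
    """Return True if the label starts with the current IDNA ACE prefix "xn--"."""
    return label.lower().startswith("xn--")
```

A label such as "xn--caf-dma" trips both checks, while a hypothetical "zq--example" trips only the first, which is the case that motivation 2 is meant to cover.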
Ram Mohan wrote:
I'm not sure I follow this. CITIBANKchina.TLD would translate to xn--citibank-encodedchunk.TLD, so xn-- would not occur midway in the ACE string.
In general, the rationale for banning “CCHH” at a position other than the beginning of a string/label is unclear. I have not seen any documents that suggest banning CCHH at anything but the beginning of a string. Am I missing something?
All registrations should
Many registries today use the ACE string at the registration protocol level, so your statement would essentially be advising against that practice. Personally, I don't think it is a problem unless the registry does NOT offer IDN and is accepting xn-- labels (in which case it probably just treats the registration as ASCII and does not check for IDNA validity). We may be in agreement here, but I wanted to further qualify your statement.
In table 4.4 of "Recommendation Tables for RN-WG Reports.doc":
For each IDN gTLD proposed, applicant must provide both the "ASCII compatible (ACE) form of an IDNA valid string" (“A-label”) and in local script form (Unicode) of the top level domain (“U-label”).
I would also add that the applicant should provide any additional strings that, after applying the IDNA ToASCII operation, result in the same A-label.
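As an illustration of how several input strings can map to one A-label, here is a short Python sketch. It relies on Python's built-in "idna" codec, which implements the IDNA 2003 ToASCII operation including Nameprep; the label "café" is only an assumed example and is not from the recommendation tables.

```python
def to_ascii(label: str) -> bytes:
    """Apply ToASCII to a single label via Python's built-in "idna" codec."""
    return label.encode("idna")


# Three different Unicode strings a user might reasonably supply:
variants = [
    "caf\u00e9",   # precomposed e-acute
    "CAF\u00c9",   # upper case, precomposed E-acute
    "cafe\u0301",  # plain "e" followed by a combining acute accent
]

a_labels = {to_ascii(v) for v in variants}
print(a_labels)
```

All three variants collapse to one A-label because Nameprep case-folds and normalizes the input before the Punycode step, so each of them would be an "additional string" in the sense above.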
Additionally, there may be complications where the U-label is entered into an application using an input method editor ("keyboard") that produces a sequence of Unicode characters that does not convert to the A-label (it either becomes a different A-label or fails conversion). This can happen when a user believes they are typing a particular character, but the local input software produces a different one because of locale differences. I will try to dig up some examples. This is not a technical or policy issue but a usability issue that affects the stability of IDNs.
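Until real-world examples are available, here is a hypothetical Python sketch of the kind of mismatch described above. It assumes a keyboard layout that emits Cyrillic "а" (U+0430) where the user expects Latin "a" (U+0061); the label "bank" is invented for illustration, and Python's built-in "idna" codec (IDNA 2003 ToASCII) stands in for whatever conversion the application performs.

```python
def to_ascii(label: str) -> bytes:
    """Apply ToASCII to a single label via Python's built-in "idna" codec."""
    return label.encode("idna")


intended = "bank"      # all Latin letters, as the user believes was typed
entered = "b\u0430nk"  # second letter is Cyrillic U+0430, visually similar

intended_ace = to_ascii(intended)
entered_ace = to_ascii(entered)

print(intended_ace)
print(entered_ace)
print(intended_ace == entered_ace)
```

The two strings look alike on screen, yet only the second is converted to an xn-- label, so a lookup or registration based on the entered text would not match the intended label.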