ICANN Email Archives: [whois-rt]

WHOIS Review: "Cart before the horse?" and other ramblings (Part Deux)

To: whois-rt@xxxxxxxxx
Subject: WHOIS Review: "Cart before the horse?" and other ramblings (Part Deux)
From: "Ronald F. Guilmette" <rfg@xxxxxxxxxxxxxxxxx>
Date: Sun, 17 Apr 2011 23:48:48 -0700

My apologies for all of the glaring typos in my prior message, and especially
for the inadequacy and incompleteness of that final part.  I was worried that
the cutoff for submission of comments might be midnight April 17th, UTC,
but as this now appears not to be the case I would like to amend and extend
my earlier remarks.

Despite all of the typos, I hope that it was clear from my earlier message
that I strongly advocate the pervasive use of automated mechanisms in order
to pre-validate, at the very least, the e-mail address and phone number given
in each and every WHOIS record for each and every second-level gTLD domain
name registered, and also, wherever possible, likewise for domain name
registrations within any and all of the various ccTLDs that are willing to
go along with this approach.

With respect to the registrant of any given second-level gTLD domain
name, the following four bits of identifying information are generally
available within the applicable WHOIS records:

    1)  Registrant name

    2)  Registrant snail-mail address

    3)  Registrant phone number

    4)  Registrant e-mail address

Short of requiring certified, embossed birth certificates and a small army
of inspectors to examine them all, it would be absurd to assume that anything
can ever or will ever be done to validate or verify registrant names as they
are given in WHOIS records.  It simply isn't practical even to try.
There is no way to automate any attempt to validate this information.

Although I see that at least one contributor to this discussion raised the
idea of validating registrant snail-mail addresses, that too is, from my
perspective, simply and obviously impractical.  Stamps cost money.  Millions
or tens of millions of stamps would cost LOTS of money.  Then there is also
the issue of the physical/mechanical handling of all of those letters.  As
any direct mail company would, I am sure, be only too happy to affirm, the
handling costs would also be distinctly non-zero.

So that leaves us with the registrant phone numbers and the registrant
e-mail addresses.  While some may yet disagree, I am of the opinion that
properly and affirmatively validating either or both of those bits of
WHOIS information would most certainly advance the interests of all five
of the groups of WHOIS data consumers that I identified in my prior
message.

As I mentioned toward the (hurried) end of my prior message, Sedo, at least,
has already developed what is clearly a perfectly workable and perfectly
serviceable system for validating the phone numbers contained within domain
name WHOIS records.  This is not merely a theoretical proof-of-concept
prototype on their part... Sedo has apparently been using this system *in
production*, to support their business goals for some time now.

As I described, Sedo's process for validating the accuracy of a phone number
contained within a WHOIS record (for a domain that is being registered for
Sedo's domain name "marketplace") is simply to inform the party attempting
this registration (via Sedo's interactive web-based registration process)
that he/she will shortly be receiving a phone call, and that the (automated)
voice that will be heard when the registering party picks up the call will
read off a five-digit "magic cookie" type number, and that it will then
be incumbent upon the registering user to enter that magic number into
a specific web form at a specific pre-defined URL.
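The Sedo-style flow described above amounts to a short-lived challenge and response. The following is a minimal Python sketch under stated assumptions, not Sedo's actual implementation: the ten-minute lifetime, the five-attempt cap, and all function names are hypothetical, and the dialer and text-to-speech step that actually places the call is left out.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional

CODE_LIFETIME_SECONDS = 600  # assumed validity window, not Sedo's actual value
MAX_ATTEMPTS = 5             # assumed cap on guesses per challenge


@dataclass
class PhoneChallenge:
    phone_number: str
    code: str
    issued_at: float
    attempts: int = 0


def issue_challenge(phone_number: str, now: Optional[float] = None) -> PhoneChallenge:
    """Generate a random five-digit code for the automated call to read out."""
    code = f"{secrets.randbelow(100_000):05d}"
    issued = time.time() if now is None else now
    return PhoneChallenge(phone_number, code, issued)


def spoken_prompt(ch: PhoneChallenge) -> str:
    """Text for a (hypothetical) text-to-speech dialer, one digit at a time."""
    return "Your code is " + " ".join(ch.code)


def check_response(ch: PhoneChallenge, typed: str, now: Optional[float] = None) -> bool:
    """True if the code typed into the web form matches the code that was spoken."""
    now = time.time() if now is None else now
    if now - ch.issued_at > CODE_LIFETIME_SECONDS or ch.attempts >= MAX_ATTEMPTS:
        return False
    ch.attempts += 1
    # Compare as bytes in constant time so response timing reveals nothing.
    return hmac.compare_digest(ch.code.encode(), typed.strip().encode())
```

The web form would call `check_response` with whatever the registrant typed; only a match completes the registration.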

I, like perhaps millions of others, have now participated in this (Sedo
Marketplace) registration process, and as a result, I can now attest to
the fact that the whole thing is altogether simple, altogether obvious,
and last but not least, it works.  The moral of the story is equally
obvious:  Validating phone numbers, e.g. those contained within (or
intended for use within) domain name WHOIS records is most definitely
*not* a pie-in-the-sky fantasy.  Sedo has proven by example that validation
of WHOIS phone numbers is neither impossible nor even impractical (e.g.
on any technical grounds) and further, that it is not even impractical
_financially_, e.g. from the point of view of a business that routinely
deals with a very high volume of domain names.

All this gives rise to the obvious question:  If Sedo and others can
manage to perform automated foolproof validation of WHOIS telephone numbers,
e.g. when it is in their financial interest to do so, then what is stopping
ICANN and all of its financially-supporting constituent registrars from
doing likewise?  (I mean other than just sheer inertia and a penny-pinching
unwillingness to spend _any_ time, money, or effort at all unless and until
someone holds the proverbial gun to their collective heads.)

The bottom line is that ICANN, all by itself, or via delegation to its
supporting registrars, can and should implement _some_ cost-efficient
and entirely automated mechanism for validating WHOIS phone numbers.  The
cost of any such system would clearly be non-zero, but ICANN and its
registrars are collectively in the enviable position of a de facto
monopoly, and as such can easily pass on any and all such costs to
domain registrants in the form of an additional penny or two per registered
domain.  (Even if, as a result of utterly inept engineering, the actual
costs per domain of such a system were higher than this, I do not believe
that that fact alone would eliminate the overall value of, and return on,
such a system.  Actual costs cannot and should not be evaluated or judged
in a vacuum, but rather in relation to the clear benefits to law enforcement,
to intellectual property owners, to the other constituencies, to most
consumers of WHOIS information, and to general public confidence in the
management of the Internet.  These benefits would, I believe, be manifold,
even if not always quantifiable.)
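To make the "penny or two" argument concrete: the per-domain overhead is simply the system's fixed cost plus its per-validation cost, spread over the domains registered. A hypothetical Python helper follows; every figure is left as an input, because the letter supplies none beyond that target.

```python
def added_cost_per_domain(fixed_annual_cost: float,
                          cost_per_validation: float,
                          validations_per_year: int,
                          domains_per_year: int) -> float:
    """Amortized extra cost, per registered domain, of a validation system.

    All inputs are hypothetical estimates supplied by the caller.
    """
    if domains_per_year <= 0:
        raise ValueError("domains_per_year must be positive")
    total = fixed_annual_cost + cost_per_validation * validations_per_year
    return total / domains_per_year
```

Validations per year need not equal domains per year: revisions of a phone number add validations, while a registrant whose single number covers many domains needs fewer.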

Having said all this, I do need to clarify a few additional things regarding
this general idea of validating WHOIS phone numbers.

Firstly, I want to be clear that what I am proposing is absolutely *not*
something that I think should be done after-the-fact (i.e. post-registration)
or merely for those domains for which formal or informal WHOIS data problem
reports have been filed.  Rather, to be clear, I am suggesting that the
automated validation of WHOIS phone numbers should be performed routinely,
and as an integral part of the domain name registration process itself,
and also, subsequently, each time any attempt is made to _revise_ the
WHOIS phone number associated with a previously registered domain name.
A new domain name registration should simply not be completed until such
time as the phone number that will be placed into the corresponding WHOIS
record has been successfully pre-validated.
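The requirement that validation gate both new registrations and later revisions of the phone number can be sketched as a tiny state machine. This is hypothetical registrar-side Python; the class, the states, and the method names are illustrative assumptions, not any registrar's actual system.

```python
from enum import Enum, auto
from typing import Optional


class RegState(Enum):
    PENDING_VALIDATION = auto()
    ACTIVE = auto()


class Registration:
    """Hypothetical registrar-side record: nothing reaches WHOIS unvalidated."""

    def __init__(self, domain: str, phone: str) -> None:
        self.domain = domain
        self.pending_phone: Optional[str] = phone  # awaiting validation
        self.whois_phone: Optional[str] = None     # what public WHOIS shows
        self.state = RegState.PENDING_VALIDATION   # registration not yet complete

    def request_phone_change(self, new_phone: str) -> None:
        # A revision is held back, like a new registration, until the new
        # number passes validation; the old number stays published meanwhile.
        self.pending_phone = new_phone

    def mark_validated(self, phone: str) -> None:
        # Called only after a challenge (e.g. an automated call) has succeeded.
        if phone != self.pending_phone:
            raise ValueError("validated number does not match the pending number")
        self.whois_phone = phone
        self.pending_phone = None
        self.state = RegState.ACTIVE
```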

Secondly, although I personally happen to be a citizen of the United States,
I am most certainly aware of the fact that there are many other parts of
the world where languages other than English are both pervasive and dominant,
and that additionally, in some of these places (e.g. Quebec) there exist
serious language-based cultural sensibilities that any truly international
organization such as ICANN cannot and should not ignore.  Thus, while I
would tend to agree with the other commenter that English should be accepted
as both the de facto and the de jure lingua franca of the _content_ of WHOIS
records, any phone number validation system of the kind I have suggested
herein could, and probably should, be implemented in a multilingual way, so
that, at the very least, people of various native tongues could navigate the
automated validation process without undue difficulty.
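A multilingual call could be as simple as choosing the spoken prompt by the registrant's language preference and falling back to English. The language table, the tag handling, and the wording below are illustrative assumptions only.

```python
PROMPTS = {
    "en": "Your validation code is {digits}.",
    "fr": "Votre code de validation est {digits}.",
    "es": "Su código de validación es {digits}.",
}


def call_script(code: str, language_tag: str) -> str:
    """Pick a prompt by primary language subtag (e.g. 'fr' from 'fr-CA')."""
    primary = language_tag.split("-")[0].lower()
    template = PROMPTS.get(primary, PROMPTS["en"])  # fall back to English
    return template.format(digits=" ".join(code))  # digits read one at a time
```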

Thirdly, I also want to make clear that although Sedo's present phone number
validation system has the advantage of having been proven viable in actual
production use, I am far from persuaded that it cannot be improved, and I
can think of many variations on Sedo's general theme.  For example, rather
than the validator coming up with the (cryptographically random?) magic
cookie and then asking the validatee to regurgitate it, it seems that it
would work equally well if the validatee were to come up with some sequence
of (random?) digits, enter those into a web site, and then be expected to
regurgitate them in response to an automated phone call and a vocal prompt
thereon.  I think that there are many plausible variations on all
this that would achieve the desired goal with reasonable security.  My point
here is not to detail all of the possible variations, but only to note that
many such exist from which to choose.  (These are minor engineering/
implementation choices that can be made later on, _after_ it has been
generally accepted that validating phone numbers is a Good Idea.)
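The reversed variant just mentioned fits in a few lines. Here the registrant types digits on the web form and keys them on the phone's keypad when the automated call arrives; the check passes only if whoever answered that phone knew the digits. The five-digit minimum and the treatment of the '#' key are assumptions.

```python
import hmac

DIGITS = "0123456789"


def keypad_digits(s: str) -> str:
    """Keep ASCII digits only, dropping separators and a terminating '#'."""
    return "".join(ch for ch in s if ch in DIGITS)


def reverse_challenge_ok(entered_on_web: str, keyed_on_phone: str,
                         min_len: int = 5) -> bool:
    """Registrant-chosen digits typed on the web must be keyed back on the call."""
    web = keypad_digits(entered_on_web)
    phone = keypad_digits(keyed_on_phone)
    if len(web) < min_len:
        return False  # too short to serve as a meaningful shared secret
    return hmac.compare_digest(web.encode(), phone.encode())
```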

Lastly, with 100% clairvoyance I can predict at least one of the obvious
objections to any validation scheme along these general lines... and I'd
like to preemptively counter it.

I am quite sure that there are _many_ parties who will undergo much gnashing
of teeth and tearing of hair, perhaps privately but most assuredly publicly,
over even the suggestion that _any_ form of automated system along the lines
detailed above should be applied to validate phone numbers used in domain
name WHOIS records.  They will do so for the simple and obvious reason that
the parties in question are in the habit of... and in some cases make a
living from... the _bulk_ registration of domain names, in some cases on
behalf of other parties.

None of these parties is at all likely to want to have to pay a dedicated
human being to sit at a telephone, 24/7, just in order to complete a highly
repetitive and tedious telephone-based validation process.  Furthermore,
it is not at all clear that even a solution such as that would work,
technically, for such parties.  Obviously, if you are going to press a button
and thus initiate a registration process for, say, 1,000 new domain names,
all essentially simultaneously, it is both technically and financially
impractical to have 1,000 incoming phone lines and 1,000 human ears available
to respond to the subsequent flood of 1,000 simultaneous or nearly
simultaneous validation phone calls.  That is exactly what would happen if
the kind of telephone number validation process I have extolled above were
implemented without the special refinements that would quite clearly be
needed to handle this rather special case, i.e. the special case of
registrants who deal in bulk.

Without going too deeply into the matter, let me just be clear that even
as I espouse a system which would make it difficult or impossible for
``average joe'' domain name registrants to place an invalid contact phone
number into their domain WHOIS records... either deliberately or
accidentally... I would not even seriously propose any such system if it did
not include some specialized refinements to accommodate the needs of bulk
domain registrants, provided that the goal of having valid contact phone
numbers present within all WHOIS records was preserved.  I do believe that
creative people could invent and implement a sort of hyper-registration
process specifically for parties intending to perform bulk domain
registrations, under which each such party's contact phone number would be
validated only once, and specifically _not_ once per new domain registered.
(I will be more than happy to discuss the detailed specifics of such a
refinement with anybody and everybody.  I have multiple ideas for refinements
that would cater specifically to bulk registrants, all of which would work
quite well, I believe, and would reduce the validation time and effort for
such parties to an absolute minimum.)  In short, the issue of validating
phone numbers for bulk domain registrations is not, and need not be, a
showstopper for the entire idea of validating WHOIS contact phone numbers.
Anyone who asserts otherwise is either technologically naive or is being
deliberately disingenuous.
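One possible shape for such a refinement, offered only as a hypothetical sketch since the letter deliberately leaves the details open: the registrant's number is validated once per account, and each batch registration merely checks that its contact number is already on the account's validated list. The class and method names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BulkAccount:
    """Hypothetical 'hyper-registration' account for a bulk registrant."""
    account_id: str
    validated_phones: Set[str] = field(default_factory=set)
    domains: List[str] = field(default_factory=list)

    def add_validated_phone(self, phone: str) -> None:
        # Called once, after the ordinary phone challenge for this number succeeds.
        self.validated_phones.add(phone)

    def register_batch(self, names: List[str], contact_phone: str) -> int:
        # A single earlier validation covers the whole batch: no per-domain calls.
        if contact_phone not in self.validated_phones:
            raise PermissionError("contact phone not yet validated for this account")
        self.domains.extend(names)
        return len(names)
```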

As in the case of phone numbers, a fully automated system for the validation
of contact e-mail addresses used within WHOIS records is well within the
realm of practicality.  Indeed it could be and should be argued that the
implementation of a system for validating WHOIS contact e-mail addresses
is even more within the realm of practicality than the implementation of
a similar system for validating phone numbers, and could quite clearly be
maintained and operated, over time, at an even lower cost.  As we all know,
the incremental cost of e-mail messages, once one has set up a server to
send them, is for all practical purposes zero.  Furthermore, anyone who might
attempt to claim that the automated validation of e-mail addresses presents
some unique or novel technical challenge has obviously not yet contributed
to the present discussion.  I can readily infer that from the fact that even
my earlier e-mail message to the working group's comment box was met with a
reply e-mail message requiring me to validate myself.  Obviously, automated
validation of e-mail addresses is not only alive and well, it is
widespread.
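E-mail validation is the familiar confirmation-link pattern. A minimal sketch, assuming a stateless signed token (HMAC-SHA-256 over the address and issue time) with a one-day expiry; the key, the token layout, and the function names are hypothetical, and sending the message and serving the confirmation URL are omitted.

```python
import hashlib
import hmac

# Hypothetical server-side secret; never sent to the registrant.
SECRET_KEY = b"replace-with-a-real-secret"


def make_token(email: str, issued_at: int) -> str:
    """Token to embed in a confirmation link mailed to the WHOIS contact."""
    msg = f"{email}|{issued_at}".encode("utf-8")
    sig = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"{issued_at}.{sig}"


def verify_token(email: str, token: str, now: int, max_age: int = 86400) -> bool:
    """True if the token was issued for this address and has not expired."""
    try:
        issued_str, sig = token.split(".", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False
    if now - issued_at > max_age:
        return False
    expected = make_token(email, issued_at).split(".", 1)[1]
    return hmac.compare_digest(expected.encode(), sig.encode())
```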

I've presented above what I believe are some of the arguments in favor of
the implementation of a pervasive, fully-automated system for validating
the correctness and accuracy of all contact phone numbers and all contact
e-mail addresses listed in the WHOIS records for (at least) all second-level
gTLD domain names.  I believe that it is imperative that ICANN adopt some
such measures immediately, if not sooner, for the simple reason that I
believe ICANN is in fact already in material breach, and will continue to be
in future, of its commitments under section 9.3.1 of its Affirmation of
Commitments to the U.S. Department of Commerce, specifically its commitment
to "... implement measures to maintain ...*accurate* and complete WHOIS
information..." (emphasis added).  If, as I postulate, ICANN is already in
breach of its agreement with DoC, then it seems to me self-evident that
_something_ should be done, and done at the earliest possible moment, in
order to rectify this contractual breach.

It makes me sad to have to say this, but in all earnestness, as far as I
have been able to work it out, the _only_ measure of any real significance
that ICANN and its various subservient domain name registrars have
implemented, to date, which might even remotely or occasionally have the
effect of creating, maintaining, or fostering ``accurate and complete WHOIS
information'' consists of ICANN and the registrars simply making sure that
the check used to pay for a domain name registration successfully clears
the bank.

Please note that I have used the qualifier ``... of any real significance...''

As others have previously noted, the current WHOIS Data Problem Report
process, well-intentioned though its original design may have been, is, by
its very design, almost entirely ineffective and ineffectual when it comes to
ensuring, or even just fostering, the accuracy of WHOIS information.  Even if
I were to choose one day and spend every waking minute... without let-up,
and without eating, sleeping, or taking any restroom breaks... using the
existing WHOIS Data Problem Report process to file reports on the tens of
thousands of domain names which I already know have utterly and intentionally
bogus WHOIS information, I would be lucky to be able to file five hundred
such reports before passing out from the sheer crushing tedium of it all.
Meanwhile, any random net-miscreant with a black heart, malevolent intent,
and a wallet full of cash could easily lap even such a dedicated full-day
effort many times over, and could do so with just a few keystrokes.
I have personally tracked multiple individual spamming operations that have
registered several _thousand_ domains on various days.

To say that the current playing field is un-level, and that it is tilted
heavily in favor of the Bad Guys would be an extreme understatement.

Being myself acutely aware of the self-evident bias of ``the system'' in
favor of all manner of net-miscreants (not exclusively spammers), I cannot
help but ask myself, in quiet moments, if this is in fact the way that ICANN
and its financially supporting registrars _want_ things to be.  I do not as
yet have any definitive answer to that question, but I hope and anticipate
that the final work product(s) of the WHOIS Review Team may provide one.
Certainly, and as I have intimated and/or alluded to previously, there would
appear to be financial incentives for both ICANN and its community of
registrars NOT to look too awfully closely at whatever the paying customers
want to put into their WHOIS records.  I continue to hope, however, that
these incentives will, in the end, be outweighed by other considerations,
not least of which are ICANN's written commitments to the U.S. Department
of Commerce.

Before I close I need to also mention a few other things.

As I laid out in my earlier message, I perceive at least five separate
``WHOIS consumer'' constituencies.  I myself fall most accurately into the
fifth and final constituency, composed of anti-network-abuse researchers.

Although nobody has nominated me to speak on behalf of this constituency
(and I'm quite sure that many would be loath to do so), I do tend to hang
with a number of other anti-network-abuse researchers, and thus may, with
humility, lay claim to knowing at least a little something about our
collective needs and concerns.

Speaking on behalf of anti-network-abuse researchers... even though nobody
has authorized me to do so... I would just like to say that a majority of
us long ago gave up on WHOIS data as a first or primary basis for
determining where truth lies in any given situation.  There is just far too
much baloney.  Although I myself still do consult WHOIS data from time to
time, its self-evident inaccuracies and well-known utter lack of anything
resembling validation have in recent years demoted WHOIS data in my own
work to, at best, a third or fourth line of supporting evidence.  That is to
say that in the search for the identity of the true registrant of any given
domain name, these days WHOIS data can only provide a third or fourth bit of
corroborating evidence, after other indicators of true identity have already
been collected and factored into the analysis.

Despite this sorry state of affairs, I still do look at domain name WHOIS
data, and often... not so much for what the content of any given single WHOIS
record, taken alone, tells me, but rather for what its similarities to other
WHOIS records can hint at.  For example, if I am looking at a pair of
suspect domain names, say "rubberduck737.biz" and "rubbercluck757.biz",
and if I find that these were both created on the exact same day, and if
the contact e-mail address in the WHOIS record for the first is, for example,
"sammy123@xxxxxxxxx" while the contact e-mail address in the WHOIS record
for the second domain is "sammy456@xxxxxxxxx", well, ya know, given a set
of facts like that I feel that I am on solid ground when I say that both
domains were registered by the same fellow.  I may not know with any degree
of certainty his name.  I may not know his address or phone number... because
what he put into his WHOIS records is utterly un-verified.  But I can at
least assert with some confidence that both domains were created by the same
guy.  This may not, at first blush, appear to be a very useful derived
conclusion, but you would be surprised at just how valuable such conclusions
can actually be, in practice, and thus, you would also be surprised at how
often seemingly gibberish, unvalidated WHOIS information, while not leading
directly to an identity, can nonetheless lead to useful correlations, and
occasionally even to ones with some predictive power (e.g. allowing one to
say in advance which domain names will soon be sources of net-misfeasance).
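The "sammy123"/"sammy456" reasoning above can be partly mechanized. This hypothetical sketch groups records that share a creation date and whose contact e-mail addresses become identical once trailing digits are stripped from the local part. The heuristic, the tuple layout, and the example.net addresses are assumptions; a real analysis would weigh many more signals.

```python
import re
from collections import defaultdict
from typing import Iterable, List, Tuple


def email_stem(address: str) -> str:
    """'sammy123@example.net' -> 'sammy@example.net' (trailing digits dropped)."""
    local, _, domain = address.partition("@")
    return re.sub(r"\d+$", "", local) + "@" + domain


def cluster(records: Iterable[Tuple[str, str, str]]) -> List[List[str]]:
    """Group (domain, creation_date, contact_email) records sharing day and stem."""
    groups = defaultdict(list)
    for name, created, email in records:
        groups[(created, email_stem(email))].append(name)
    # Only groups with two or more domains suggest a common registrant.
    return [sorted(names) for names in groups.values() if len(names) > 1]
```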

The reason I bring all this up is simply to try to put forward yet another
brief proposal or two that I feel could have benefit, specifically for the
group of WHOIS data consumers that I have previously labeled number 5.
For some of us, at least, patterns, even in the absence of what might be
called ``identifying information'', can be useful, and can and should be
provided, IN ADDITION TO all of the traditional sorts of WHOIS information.

To be more specific, there are two bits of information that I personally
would find terrifically useful as I attempt to ferret out the various
domains that have been registered by specific individual spammers and other
specific individual net-miscreants.  The first of these is registration
date/time.  The second is a totally new (and ``synthetic'') piece of
information which, for lack of a better term, I'll just call
"payer hash value".

Registration date/time is simple to understand, but a lot harder to actually
get hold of.  With the exception of the .COM and .NET gTLDs, registration
date/time is only available in conjunction with a "full" domain WHOIS query,
i.e. one including all of the (potentially spammable) contact information
for the domain.  Understandably (but unforgivably), registries that themselves
operate unified/singular (thick?) WHOIS servers are reluctant to allow
unfettered access to these servers, and most or all of them do rate-limiting,
which interferes seriously with legitimate and benign research, EVEN WHEN the
only bit of information the researcher really wants or needs is the date/time
of domain registration.  This is highly unfortunate, and I would like to take
this opportunity to propose that _all_ registries be required, by ICANN,
to make available at least some WHOIS server or service that (a) would be
open to all for unfettered and unlimited access and that (b) would provide
all and only the same data bits that are currently provided by the thin
top-layer WHOIS server for the .COM and .NET domains, in particular (i)
registration date/time and also (ii) current name server names.  (Obviously,
these data bits are not in any sense ``spammable'', so unlimited access
does not present any meaningful issues necessitating rate-limiting.)
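The thin .COM/.NET service held up as a model here speaks the ordinary WHOIS protocol (RFC 3912): one query line sent over TCP port 43, with the reply read until the server closes the connection. The sketch below issues such a query and keeps only the creation date and name servers. The field labels it looks for are an assumption based on Verisign-style output and vary between registries.

```python
import socket
from typing import List, Optional


def whois_query(server: str, query: str, port: int = 43, timeout: float = 10.0) -> str:
    """Plain WHOIS per RFC 3912: send one line over TCP port 43, read until close."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def thin_fields(response: str) -> dict:
    """Keep only non-'spammable' fields: creation date and name servers.

    The "Creation Date" / "Name Server" labels are assumed; registries differ.
    """
    created: Optional[str] = None
    servers: List[str] = []
    for line in response.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip().lower(), value.strip()
        if key == "creation date" and created is None:
            created = value
        elif key == "name server" and value:
            servers.append(value.lower())
    return {"created": created, "name_servers": servers}
```

For example, `thin_fields(whois_query("whois.verisign-grs.com", "example.com"))` would query Verisign's registry server; that is a live network call.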

One other small thing before I leave the topic of registration date/time.

In practice, registration date/time is often all but meaningless due to
the fact that it does not, apparently, change when a domain name is sold
to some new and different party.  I guess that _somebody_ out there must
find ``original registration date/time'' to be in some sense a useful and/or
meaningful bit of information.  Personally, I don't, and I have no use for
it other than as a not-very-reliable proxy for the data I really want, which
is to say the date/time of the most recent registration (payment).  My
hope/request would be that the WHOIS Review Team give some serious
consideration to (a) formally defining the intended semantics of
registration date/time and also (b) requiring the maintenance of a new data
field for all WHOIS records, i.e. date/time of most recent registration
(payment).
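Keeping the two notions of "registration date" apart is mostly a data-model question. A hypothetical sketch of a record carrying both the original creation time and the most recent registration (payment) time; the field and method names are assumptions, not any registry's schema.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RegistrationDates:
    """Hypothetical WHOIS record fragment keeping both timestamps apart."""
    created: datetime          # original registration; never changes
    last_registered: datetime  # most recent registration (payment)

    def record_registration_payment(self, paid_at: datetime) -> None:
        # Whether renewals, and not only sales to a new party, should update
        # this field is exactly the semantic question the letter raises.
        if paid_at < self.created:
            raise ValueError("payment cannot predate the original registration")
        self.last_registered = paid_at
```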

Finally, as noted above, researchers such as myself can often learn a lot
from correlations and patterns.  Some such are, on occasion, even derivable
from otherwise utterly bogus and intentionally deceptive and misleading
WHOIS information.  In addition to most-recent-registration date/time, which
I have covered above, it is my considered opinion that anti-spam,
anti-malware, and anti-crime research would benefit greatly from the
introduction of yet one more ``standard'' WHOIS data field, specifically an
irreversible triple-DES hash of what might be called the ``payer ID'', that
is to say some (relatively) unique identifier string which in practice
may often correlate, more or less, to the unique identity of whoever has
actually paid for the registration of any given domain name.  The idea
here is not that the identity of the actual payer would be discoverable
via anything short of formal legal processes, but rather only that it
should, in practice, often be possible to say that domain `A' and domain
`B' were both paid for by a single individual or legal entity, even in
the absence of any knowledge of who that individual or legal entity might
actually be.

In the simplest case, and for what I suspect is the vast majority of all
gTLD (and even ccTLD) domain registrations, the value of this field would
be computed trivially as the triple-DES of the payer's credit card number.
(This process is fundamentally irreversible but will yield the same ultimate
hash value for each use of the same credit card number to pay for various
domains.)
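One caveat on mechanism: triple-DES is a block cipher, reversible by anyone holding the key, rather than a hash. The sketch below therefore swaps in a keyed HMAC-SHA-256, which gives the property described here (same payer, same value; not reversible without the registry's key). An unkeyed hash would be weaker, since the space of plausible card numbers may be small enough to enumerate. The key, the `kind` prefixes, and the normalization rule are assumptions for illustration.

```python
import hashlib
import hmac

# Hypothetical secret held only by the registry.
PAYER_HASH_KEY = b"replace-with-registry-secret"


def payer_hash(kind: str, identifier: str) -> str:
    """Stable, non-reversible token for whoever paid for a registration.

    `kind` namespaces the identifier ("card", or some second-line identifier
    type) so that different kinds of identifier never collide.
    """
    normalized = "".join(ch for ch in identifier if ch.isalnum()).upper()
    message = f"{kind}:{normalized}".encode("utf-8")
    return hmac.new(PAYER_HASH_KEY, message, hashlib.sha256).hexdigest()
```

Two domains paid for with the same card would then show the same 64-character value in WHOIS, while the card number itself never appears.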

Obviously some other unique payer identifier would need to be employed
in those cases where payment for domain registrations occurs without the
benefit of credit cards.  I can think of several workable and plausible
schemes via which such second-line identifiers could be either obtained
(e.g. EINs for businesses, Social Security numbers for individuals) or
generated and issued (e.g. by ICANN itself, on an as needed basis), but
these are merely small implementation details that could be worked out
when and if the radical notion of introducing a new and unprecedented
WHOIS data field, for the benefit of law enforcement and other researchers
becomes generally accepted as a Good Thing.  I believe that it could be
proven to be so, and would be well worth the effort.


Vaya con Dios,
rfg