ICANN Email List Archives

[comments-whois-misuse-27nov13]



Comments regarding the WHOIS Misuse Study

  • To: <comments-whois-misuse-27nov13@xxxxxxxxx>
  • Subject: Comments regarding the WHOIS Misuse Study
  • From: "Greg Aaron" <greg@xxxxxxxxxxxxxx>
  • Date: Fri, 10 Jan 2014 11:43:43 -0500

I appreciate the opportunity to comment on the draft WHOIS Misuse Report
prepared by Carnegie Mellon University's Cylab (CMU).

 

As background, I am an expert with extensive experience in both domain name
industry operations and in investigating the misuse of domain names.  As a
member of the Anti-Phishing Working Group I produce authoritative research
and metrics regarding the use of domain names for phishing, and over time I
have examined the WHOIS data of literally millions of domain names that were
being advertised via spam, used to distribute malware and phishing attacks,
used for cybersquatting, and for other misuses.  I am a member of ICANN's
SSAC, and served as the chair of the GNSO's Registration Abuse Policy
Working Group (RAPWG).  The following comments are submitted purely in my
personal capacity.

 

There should be no doubt that WHOIS data is leveraged by spammers and other
malefactors; that is a well-established phenomenon that the study confirms.
Unfortunately, the study does much less to tell us about the scope and
impact of the problem, and does not contribute much to the discussion of how
to balance the benefits and drawbacks of publicly available domain contact
data. 

 

The problems I see with the report are:  

 

1. The sample sizes for the registrant, expert, and law enforcement surveys
were very small -- the registrant survey yielded only 57 participants, with
25 describing some kind of misuse (page 32).  Only 18 expert respondents
were able to provide details, in relation to 23 individual incidents
involving suspected harvesting of WHOIS information (page 23).  These are
not significant samples upon which to base policy.
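
As a rough illustration of why samples this small are hard to build on, the
sketch below computes a 95% Wilson score interval for the 25 of 57 registrant
respondents who described misuse.  The choice of the Wilson interval and the
function name wilson_interval are assumptions made here for illustration; the
report does not present this calculation, and the interval reflects sampling
error only, not any self-selection among respondents.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (low, high) Wilson score interval for a proportion.

    z=1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Figures from the report: 57 survey participants, 25 describing misuse.
low, high = wilson_interval(25, 57)
print(f"share reporting misuse: {25 / 57:.1%}")
print(f"95% Wilson interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from about 32% to about 57%, a span
of roughly 25 percentage points around a point estimate of 44%.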

 

2. The registrant survey relied upon self-reporting and the perceptions of
those polled, rather than establishing whether the abuse was actually
attributable to the publication of contact data in WHOIS.  It is not unusual
for ordinary Internet users to misapprehend the root causes of their online
troubles.  For example, we do not know if some respondents released their
contact data via other means, such as in forum postings or to opt-in lists.
And it is not unusual for individuals to receive spam even though their
e-mail addresses have never been published in any Internet location,
including in WHOIS (see #5 below).  One cannot measure the extent and impact
of the harmful Internet activity experienced by Registrants that can be
exclusively attributed to WHOIS misuse without additional investigative
effort.

 

3. Separately, CMU registered 400 domains itself, and measured the amount of
abuse that resulted.  This method offers a more controlled and objective
opportunity for measurement.  Nevertheless, a big issue jumps out: the study
states that "80% of the spam emails collected during this experiment were
addressed to [just] the 25 domains of a single Registrar (Registrar 13)"
(page 57).  We do not know enough about this large anomaly.  An obvious
possibility is that Registrar 13's contract with registrants included terms
that allowed the registrar to share e-mail addresses with marketers or
related companies.  If this was the case, then those e-mails are not spam.
CMU recognized this possibility but did not investigate it, terming it "out
of scope" (page 66).  This is unfortunate because it could greatly affect
the statistics. 

 

4. The conclusion that "the more expensive the registered domain is, the
less email address misuse the Registrant experiences" (page 70) may or may
not be supported by the data.  Instead, it seems that frequency of abuse
might correlate more closely to choice of registrar.

 

5. It is not unusual for individuals to receive spam even though their
e-mail addresses have never been published in any Internet location,
including in WHOIS.  We see evidence of this in the report: 15% of the test
domains (and 27.5% of the .COM test domains) received email at completely
unpublished addresses, a total of 1,872 emails (page 57).  This may be
evidence that spammers were guessing email addresses.  Spammers use guessing
techniques to build lists of addresses, sometimes in conjunction with the
domain lists found in zone files.  First the spammers generate a large list
of email addresses using role names like "webmaster" and "sales", common
words, and firstname/lastname combinations, and then they send spam to all
of those addresses at a given domain name, on a trial-and-error basis.  The
e-mail addresses used in WHOIS for the 400 test domains were all
"contact@[domainname.tld]" (page 54), rather than being unique or more
obscure addresses.  I wonder if the "contact@" addresses were sometimes
guessed, rather than discovered through WHOIS data. 

 

Other observations:

 

Regarding impact and severity:  In some cases the research reveals lower
abuse than I would have expected, while leaving other important questions
unanswered.

a) Cylab received no malware attachments at the email addresses it published
in WHOIS (pages 61, 64).  

b) Cylab did not analyze the content of the e-mails it received, and
therefore we do not know if it received any phishing attempts or mails that
contained links to drive-by malware (page 64).  These days, linking out to
malware is much more common than sending malware as an attachment.  The
e-mails cannot be analyzed now, because too much time has passed since the
experiment ended in January 2013.

c) Forty-two percent of the voicemail messages seem to have been from just
two callers (page 62), and affected a very small number of the
"registrants."    

d) Cylab received in total 34 pieces of generic postal spam, only four of
which were classified as "targeted WHOIS spam."

 

The emails and voicemails received in the experiment illustrate some
phenomena that other researchers (including myself) have observed when
studying online abuse: 

a) abuse is not evenly distributed among the registrant population; 

b) abuse is sometimes localized around certain registrars, resellers, or
TLDs; and

c) a few perpetrators can be responsible for a significant percentage of the
abuse. 

As always, we should ask what actions would be appropriate for dealing with
these situations, such as whether counter-measures should be applied
across-the-board or in a targeted way. 

 

The evidence that WHOIS anti-harvesting techniques are statistically
significant in reducing the possibility of WHOIS email address misuse is
interesting.   There is clearly value in rate-limiting port 43 WHOIS access
to slow down spammers.  Existing large registries already practice
rate-limiting, and a big part of the problem may be registrars who have weak
(or no) anti-abuse measures on their port 43 servers.  That situation might
improve significantly when the .COM/.NET registry is taken to "thick" status
-- at that point registrars will no longer be a source of port 43 WHOIS data
for gTLD domains.   
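
For readers unfamiliar with rate-limiting, here is a minimal sketch of one
common approach, a per-client token bucket, as it might be applied to port 43
WHOIS queries.  It is not a description of how any particular registry or
registrar implements its limits; the class names, the rate and capacity
values, and the use of the client IP address as the key are all assumptions
for illustration.

```python
import time


class TokenBucket:
    """Allow short bursts up to `capacity`, refilling at `rate` tokens/second."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic() if now is None else now

    def allow(self, now=None):
        # `now` can be passed explicitly so the behavior is easy to test.
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerClientLimiter:
    """Keep one bucket per client (hypothetically keyed by source IP)."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_ip, now=None):
        bucket = self.buckets.get(client_ip)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, now)
            self.buckets[client_ip] = bucket
        return bucket.allow(now)


# Illustrative values only: 1 query/second sustained, bursts of up to 3.
limiter = PerClientLimiter(rate=1.0, capacity=3)
for i in range(5):
    print(i, limiter.allow("192.0.2.1", now=0.0))
```

A limiter of this kind slows bulk harvesting from a single source while
leaving occasional lookups unaffected; a harvester spreading queries across
many source addresses would need additional counter-measures.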

 

Sincerely yours,

--Greg Aaron 

 

 


