ICANN Email Archives: [gnso-ff-pdp-may08]

[gnso-ff-pdp-may08] Spam Filtering Tutorial - How to get high levels of accuracy

To: "gnso-ff-pdp-May08@xxxxxxxxx" <gnso-ff-pdp-May08@xxxxxxxxx>
Subject: [gnso-ff-pdp-may08] Spam Filtering Tutorial - How to get high levels of accuracy
From: Marc Perkel <marc@xxxxxxxxxx>
Date: Fri, 08 Aug 2008 07:31:52 -0700

This is an educational message to give non-technical people some insight into how spam filtering becomes highly accurate, and to address using similar technology for automated domain take down in a way that doesn't scare registrars and free speech advocates.

Spam isn't usually detected by triggering a single rule, unless the rule is something that only spammers do (and there are some of those). So - let's say for example that we look at a rule with an error rate of 1 in 1000. Is that good enough? No - it isn't. It means that if you process a billion messages you are going to make a million mistakes.

However - suppose you have a second rule that also has a 1 in 1000 error rate. Then if a message hits both rules you are up to 1 in one million. With a third rule it's 1 in a billion. (Yes - this is a simplification. It assumes the rules make their mistakes independently of each other.)
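The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the simplification, assuming every rule errs independently of the others. The function names `combined_error_rate` and `expected_mistakes` are made up for this example and do not come from any real spam filter.

```python
from fractions import Fraction
from math import prod

def combined_error_rate(error_rates):
    """Chance that ALL rules are wrong on the same message.

    Assumption: the rules make mistakes independently, so the
    individual error rates simply multiply.
    """
    return prod(error_rates)

def expected_mistakes(error_rates, messages):
    """Expected number of wrong verdicts over a given message volume."""
    return combined_error_rate(error_rates) * messages

one_in_1000 = Fraction(1, 1000)   # exact fractions avoid float rounding
volume = 1_000_000_000            # a billion messages

for rule_count in (1, 2, 3):
    rates = [one_in_1000] * rule_count
    print(rule_count, "rule(s):", expected_mistakes(rates, volume), "expected mistakes")
```

In practice rules tend to overlap, so real combined error rates are usually worse than the plain product suggests.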

Then there are white rules: indicators that the message is not spam. There are many instances of "spammers never do this" or "spammers can't do this" that you can look at to take good email out of spam testing and pass it. So once you apply your "this is probably spam" rules you then subtract out the "this isn't spam" rules, and what you have left is highly accurate.

It's like gambling in Vegas. In the long run the casino always wins if you play long enough. There is no jackpot so large that the casino won't win it back if you don't walk away and keep gambling. Spam filtering is like that. The more information you have and the more rules you apply, the more accurate it becomes.

So - here's how it applies to fast flux (FF). FF is a strong indicator of phishing. Probably less than 1 in 100,000 fluxing domains is legitimate. But it is still very important to protect the free speech of the one in 100,000. I, for example, would not want to be suspended just because I reduced my TTLs because I was going to move servers to a new data center. And we wouldn't want to block people in Tibet from circumventing the Chinese firewall. But - FF is still a strong indicator and would be a valuable piece of information as part of a bigger picture.

If the FF is spam bot driven then that too is a strong indicator. And the combination of spam bot driven and FF is a very strong indicator. But is it strong enough? If the from address is on a list of banks that are often spoofed and the FCrDNS (forward-confirmed reverse DNS) of the sending host is an IP address in China, that makes a very strong case.

Then there are white rules: rules that prevent take down of good domains. These rules are used to help protect from false positives. If the domain in question is 10 years old then it would be blocked from automatic take down. We can come up with a lot of "white rules" for domains that would never be available for automated take down no matter what the "black rules" were. So we would have a narrow set of domains that fall outside the white rules and are available for take down if there are enough black indicators to do so.
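Here is a rough Python sketch of how the white rules and black rules described above might fit together. The 10-year age rule and the three black indicators (fast flux, spam bot driven, spoofed bank) come from this message. The `Domain` record, its field names, and the `MIN_BLACK_HITS` threshold are assumptions invented for the example, not a proposed policy.

```python
from dataclasses import dataclass

@dataclass
class Domain:
    # Hypothetical record; a real system would gather these facts elsewhere.
    name: str
    age_years: float
    fast_flux: bool
    botnet_driven: bool
    spoofs_bank: bool

# White rules: any single hit blocks automated take down.
WHITE_RULES = [
    lambda d: d.age_years >= 10,   # long-established domain
]

# Black rules: each hit is one indicator of phishing.
BLACK_RULES = [
    lambda d: d.fast_flux,
    lambda d: d.botnet_driven,
    lambda d: d.spoofs_bank,
]

MIN_BLACK_HITS = 3  # assumed threshold, chosen only for illustration

def eligible_for_takedown(domain):
    """True only if no white rule applies and enough black rules hit."""
    if any(rule(domain) for rule in WHITE_RULES):
        return False
    hits = sum(1 for rule in BLACK_RULES if rule(domain))
    return hits >= MIN_BLACK_HITS

old_site = Domain("old.example", 12, True, True, True)
new_phish = Domain("phish.example", 0.1, True, True, True)
print(eligible_for_takedown(old_site))   # white rule protects it
print(eligible_for_takedown(new_phish))  # all black rules hit
```

Checking the white rules first means no combination of black indicators can override them, which matches the "no matter what the black rules were" idea.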

So - at this point it looks very safe - but would we catch anyone? I think so, because criminals are limited in what they can do. And most phishing activity might still be within the "black rules minus the white rules" range. So this would still be very effective.

Even for domains just outside that range (suspicious but not quite there), a message can be sent to a real person alerting them to look into a possible abuse. Then those domains can be evaluated manually to determine if they need to be suspended or not.
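The manual review step can be pictured as a second, lower threshold. The Python below is a minimal sketch of that idea. The function name `triage`, the two threshold values, and the choice to send white-listed but suspicious domains to a human are all assumptions made for illustration.

```python
def triage(black_hits, white_listed, takedown_at=3, review_at=2):
    """Sort a domain into one of three outcomes.

    black_hits   -- number of black rules the domain triggered
    white_listed -- True if any white rule protects the domain
    The thresholds are hypothetical defaults, not recommended values.
    """
    if not white_listed and black_hits >= takedown_at:
        return "automatic take down"
    if black_hits >= review_at:
        # Suspicious but not quite there, or protected by a white rule:
        # alert a real person instead of acting automatically.
        return "manual review"
    return "no action"

print(triage(black_hits=3, white_listed=False))
print(triage(black_hits=2, white_listed=False))
print(triage(black_hits=3, white_listed=True))
print(triage(black_hits=0, white_listed=False))
```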

And - you have to accept that you are never going to catch all of them. So if you can cut phishing by 10%, that's progress. If you then come up with another rule that cuts 5% more - that's more progress.

Anyhow - hope this educates those of you who don't understand some of the technology and the thinking behind how this is done.

Follow-Ups:
- Re: [gnso-ff-pdp-may08] Spam Filtering Tutorial - How to get high levels of accuracy
  - From: George Kirikos
