ICANN Email List Archives

[gnso-ff-pdp-may08]



Re: [gnso-ff-pdp-may08] Spam Filtering Tutorial - How to get high levels of accuracy

  • To: "Marc Perkel" <marc@xxxxxxxxxx>
  • Subject: Re: [gnso-ff-pdp-may08] Spam Filtering Tutorial - How to get high levels of accuracy
  • From: "George Kirikos" <fastflux@xxxxxxxx>
  • Date: Fri, 8 Aug 2008 12:15:47 -0400

Hello,

Good summary, Marc.

On Fri, Aug 8, 2008 at 10:31 AM, Marc Perkel <marc@xxxxxxxxxx> wrote:
> However - suppose you have a second rule that is also 1 in 1000 error rate.
> Then if they hit both rules you are up to 1 in one million. With a third
> rule it's 1 in a billion. (yes - this is a simplification)

It's a bit more nuanced than that, as you've assumed that the tests
are perfectly statistically independent of one another. In reality,
all the "rules" can be correlated, even negatively correlated in some
dimensions (i.e. they're not all orthogonal to one another).
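To put a number on the independence point, here is a small sketch. It is not anyone's actual filter. It assumes two rules that share the same per-rule false-positive rate p (1 in 1000, as in the quoted example) and treats rho as the Pearson correlation between the rules' 0/1 "fired" indicators on legitimate mail. The function name joint_fire_rate is hypothetical. The identity used, P(both) = p^2 + rho*p*(1-p), follows directly from the definition of correlation for 0/1 variables.

```python
def joint_fire_rate(p: float, rho: float) -> float:
    """Probability that two rules both misfire on the same legitimate message.

    Assumes each rule has the same marginal false-positive rate p, and rho is
    the Pearson correlation between the two rules' 0/1 "fired" indicators.
    """
    # For 0/1 indicators X, Y with P(X=1) = P(Y=1) = p:
    #   Cov(X, Y) = P(X=1, Y=1) - p*p   and   Var(X) = Var(Y) = p*(1 - p),
    # so P(X=1, Y=1) = p*p + rho * p * (1 - p).
    joint = p * p + rho * p * (1 - p)
    eps = 1e-15  # tolerance for floating-point rounding at the extremes
    if joint < -eps or joint > p + eps:
        raise ValueError("rho is not achievable for this marginal rate")
    # Clamp tiny rounding excursions back into the valid range [0, p].
    return min(max(joint, 0.0), p)


if __name__ == "__main__":
    p = 1 / 1000  # per-rule false-positive rate from the quoted example
    for rho in (0.0, 0.01, 0.1, 0.5):
        joint = joint_fire_rate(p, rho)
        print(f"rho={rho:<4}  both rules misfire: 1 in {1 / joint:,.0f}")
```

With rho = 0 you get the "1 in a million" figure. A correlation of only 0.01 already makes the joint misfire rate about 11 times higher. At rho = 0.1 it is roughly 1 in 10,000. Negative correlation pushes the other way: the joint rate reaches zero at rho = -p/(1-p).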

Thus, at some point there is a limit to the accuracy. See the
contest Netflix is running on ratings prediction (the Netflix Prize).
There are limits to their accuracy, and it won't ever be perfect.
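One way to see why correlation caps the accuracy is a toy model with a shared hidden factor. This is a hypothetical illustration, not a description of any real filter. A small fraction of legitimate mail is "hard" (it looks spammy to every rule), and within each group the rules fire independently. All names and parameter values below (hard_fraction, p_hard, p_easy, and the helper functions) are assumptions chosen for the sketch. The joint false-positive rate for n rules is hard_fraction * p_hard^n + (1 - hard_fraction) * p_easy^n. With p_hard = 1, that rate can never drop below hard_fraction, no matter how many rules you stack.

```python
def easy_rate_for_marginal(marginal: float, hard_fraction: float, p_hard: float) -> float:
    """Per-rule firing rate on 'easy' legitimate mail that makes each rule's
    overall false-positive rate equal `marginal` (hypothetical mixture model)."""
    if not 0.0 <= hard_fraction < 1.0:
        raise ValueError("hard_fraction must be in [0, 1)")
    p_easy = (marginal - hard_fraction * p_hard) / (1 - hard_fraction)
    if not 0.0 <= p_easy <= 1.0:
        raise ValueError("parameters are inconsistent with the marginal rate")
    return p_easy


def joint_false_positive_rate(n_rules: int, hard_fraction: float,
                              p_hard: float, p_easy: float) -> float:
    """P(all n rules misfire): rules are independent *within* each group,
    but the shared hard/easy factor correlates them across the population."""
    return hard_fraction * p_hard ** n_rules + (1 - hard_fraction) * p_easy ** n_rules


if __name__ == "__main__":
    MARGINAL = 0.001  # each rule: 1 in 1000 false positives, as in the quoted example
    HARD = 0.0005     # hypothetical: 1 in 2000 legitimate messages is "hard"
    P_HARD = 1.0      # hypothetical: hard messages trip every rule
    p_easy = easy_rate_for_marginal(MARGINAL, HARD, P_HARD)
    for n in range(1, 6):
        independent = MARGINAL ** n
        correlated = joint_false_positive_rate(n, HARD, P_HARD, p_easy)
        print(f"{n} rules: independent 1 in {1 / independent:,.0f}, "
              f"correlated 1 in {1 / correlated:,.0f}")
```

Under independence, each added rule divides the error rate by 1000. In this toy model, the error rate stalls at about 1 in 2000 from the second rule onward. The hard messages are exactly the ones where more rules don't help.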

But, as you said, for a certain subset (not the population in total),
you can have close to perfect accuracy.

Sincerely,

George Kirikos
www.LEAP.com



