Propose change to .Mail to make it more useful and feasible
- To: stld-rfp-mail@xxxxxxxxx
- Subject: Propose change to .Mail to make it more useful and feasible
- From: AccuSpam <support@xxxxxxxxxxxx>
- Date: Mon, 19 Apr 2004 03:38:07 +0800
>> Herbie Robinson wrote:
>> ...I don't think that the use of the .mail domain
>> should be limited to large bulk mailers. I think ISPs
>> should be encouraged to use .mail for their SMTP servers
>> (along with a policy for the occasional slip ups that will
>> occur). To this end, I think there should be fines for sending
>> spam from a .mail domain, an all-or-nothing take it back
>> approach doesn't seem workable. People do make mistakes...
>
>
> Shelby Moore of AccuSpam wrote:
> You just proved the logic in my previous posts in this forum.
>
> You agree there will be mistakes, yet for .Mail to be a
> 100% spam-free guarantee, then there can not be mistakes.
> If the mistake is a spam gets through, then that is a false
> negative. If a mistake is that sender does not use an approved
> and verified mail server, then email is deleted and that is a false
> positive.
I think there are changes to .Mail which would eliminate the "all or
nothing" (100%) implementation and usage problems. This would enable rollout
to ISPs (on a separate base domain), who could offer it to their more trusted
customers (e.g. customers with a long history who have installed
anti-virus software to stop viruses that send spam).
Let me first summarize, in a brief list, the main problems I see with the .Mail
proposal:
1. The sender authentication part will cause false positives (deletion of
important email):
a) when the sender uses a .Mail domain but does not use one of the
approved and verified mail servers in the central database for .Mail.
b) when the sender's domain is .Mail-enabled and the sender sends without
appending ".mail" (i.e. by mistake)
2. The central database (the issuance and revocation) of .Mail causes false
positives and false negatives:
a) false negative when a .Mail sender sends spam: the spam is approved
because the .Mail database either does not think it is spam or treats it as
one of the occasional spams from that domain (i.e. a mistake, such as a rogue
employee or ISP customer), but the recipient thinks it is spam.
b) false positive when a .Mail sender sends non-spam: the email is
disapproved because the .Mail domain has been revoked by the .Mail database,
which thinks the domain is sending (mostly) spam, but the recipient thinks
it is not spam.
However, I think the real genius behind the concept of .Mail is this: if a
sender is both authenticated and invested (6 month waiting period, $2000,
willingness to submit to centralized auditing of abuse reports, etc.) in
reducing spam coming from its domain, then that sender can reasonably be
trusted more to not be sending spam than a sender who has not invested in .Mail.
I think there is a way to keep the valuable part of .Mail and get rid of most of
the false negatives and false positives. The essence of my proposed change is
to leave the decision of how much to trust .Mail to the recipient!
A) Eliminate the deletion of email at the MTA level. In addition to
eliminating silent false positives, this will also eliminate the huge cost on
the community that every MTA must be compliant.
B) Do not claim that .Mail is spam-free. Claim only that .Mail senders are
more likely to be spam-free. The decision as to how much more likely should be
left to the recipient. This will happen anyway, so we might as well codify it in
policy.
C) Let the final (human or computer) anti-spam filter decide how to weight .Mail
in the overall decision of whether to classify the email as spam or not. In
addition to eliminating silent false positives, this also eliminates one of my
minor criticisms: the need to do the .Mail verification more than once per
email (not knowing if the upstream MTA did it).
Also, by letting the anti-spam filter (computer or human mind) of the individual
decide how much to weight the .Mail input, e.g. the way SpamAssassin weights
(intersects) several spam metrics and uses a threshold decision, .Mail does not
suffer the effects of being a bipolar ("all or nothing") phenomenon and becomes
a much more useful and less harmful tool. Also, due to the probability theory of
intersection, anti-spam becomes more reliable the more independent
anti-spam metrics there are.
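To make the weighting idea concrete, here is a minimal sketch of a threshold decision in which .Mail is only one input among several. The rule names, weights and threshold are all hypothetical values chosen for illustration; they are not taken from SpamAssassin or from the .Mail proposal.

```python
# Hypothetical rule weights: positive values push toward "spam",
# negative values push toward "not spam". None of these numbers
# come from SpamAssassin or from the .Mail proposal.
RULE_WEIGHTS = {
    "bayes_spammy": 3.5,
    "blacklisted_relay": 2.5,
    "suspicious_subject": 1.5,
    "sender_in_dot_mail": -2.0,  # verified .Mail sender lowers the score
}

# Hypothetical threshold; the recipient is free to choose another.
SPAM_THRESHOLD = 5.0


def spam_score(fired_rules):
    """Sum the weights of every rule that fired for one email."""
    return sum(RULE_WEIGHTS[rule] for rule in fired_rules)


def is_spam(fired_rules, threshold=SPAM_THRESHOLD):
    """Classify as spam when the total score reaches the threshold."""
    return spam_score(fired_rules) >= threshold
```

With these made-up weights, an email that trips "bayes_spammy" and "blacklisted_relay" is classified as spam, the same email from a verified .Mail sender is not, and a .Mail sender that also trips "suspicious_subject" is still classified as spam. That is the point: .Mail shifts the decision without being "all or nothing".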
Note I think I read that SpamAssassin (or was it SpamCop?) is supporting the
.Mail proposal.
Here is the mathematical proof:
The math assumption, "P(a | b) = P(a) * P(b)", is derived below, where "|" is
the intersection of two independent events. This shows that the false
negative rate for the intersection increases but the false positive rate
decreases. If we are not sure they are independent and we assume they
are, then this is called "naive":
P(a | b) = P(b ! a) * P(a), where "!" is conditional probability, i.e. "if"
P(a | b) = P(b) * P(a), because P(b ! a) = P(b) if a and b are independent
events.
Incidentally, P(a & b) = P(a) + P(b) - P(a | b), where "&" is the union of two
events. The derivation is:
P(a & ~a) = 1 = P(a) + P(~a), where "~" is complement
P(a & b) = P(a) + P(b | ~a)
P(b) = P(a | b) + P(b | ~a)
P(b | ~a) = P(b) - P(a | b)
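As a quick numeric check of the two formulas above, here is a small sketch. It assumes the two filters behave as statistically independent events, which is the condition under which the product rule for the intersection holds; the function names are my own.

```python
def p_intersection(p_a, p_b):
    """Probability that both filters catch a spam, assuming independence."""
    return p_a * p_b


def p_union(p_a, p_b):
    """Probability that at least one filter catches a spam (inclusion-exclusion)."""
    return p_a + p_b - p_intersection(p_a, p_b)


# Two filters that each catch 95% of spam:
both = p_intersection(0.95, 0.95)    # about 0.9025
either = p_union(0.95, 0.95)         # about 0.9975
```

Requiring both filters to agree lowers the detection rate (more false negatives), while accepting either filter raises it.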
So the probabilities of spam being caught by the intersection and by the union
of two filters (events) are P(a | b) and P(a & b) respectively, as shown above.
However, if a spam is caught by the intersection of two filters, then the
probability (confidence) that the caught spam is really spam increases, so
although the false negative rate has increased (detection rate decreased), the
confidence in the detection has also increased:
P(a @ b) = P(a) * P(b) / [P(a) * P(b) + (1 - P(a)) * (1 - P(b))]
when the "a priori" probability of any email being spam is 0.5 and we assume that
P(a @ b) = P(~a @ ~b), i.e. that the probability that caught spam is really spam
is equal to the probability that not caught spam is really not spam.
This is intuitively correct because P(0.5 @ 0.5) = 0.5. That is to say, the
intersection of two filters which each catch, say, 50% of spam catches less
(25%) spam, but the probability that the caught spam is really spam does not
change, because the filters had an equal probability of catching spam and not
catching spam.
The derivation is:
http://www.mathpages.com/home/kmath267.htm
Thus the confidence for the intersection of two spam filters which have 95%
detection rate (i.e. 5% false negative rate):
P(0.95 @ 0.95) = 0.95 * 0.95 / (0.95 * 0.95 + 0.05 * 0.05) = 0.997 = 99.7%
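The "@" formula is easy to check numerically. The sketch below simply evaluates it as written above, under the same assumptions (a prior of 0.5 and the symmetry assumption stated earlier); the function name is my own.

```python
def combined_confidence(p_a, p_b):
    """Evaluate P(a @ b) = P(a)P(b) / [P(a)P(b) + (1 - P(a))(1 - P(b))].

    Undefined (division by zero) when one input is 1 and the other is 0.
    """
    both = p_a * p_b
    neither = (1 - p_a) * (1 - p_b)
    return both / (both + neither)


half = combined_confidence(0.5, 0.5)      # 0.5, no change in confidence
strong = combined_confidence(0.95, 0.95)  # about 0.997
```

Two 50% filters leave the confidence at 50%, while two 95% filters raise it to about 99.7%, matching the figures above.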
Shelby Moore
http://AccuSpam.com