
SpamAssassin: how to increase spam detection rate by up to 40% and avoid

I manage a medium-sized network in a university department. Our current incoming mail flow is about 4K messages per day, and almost half of those are unwanted spam. A few months ago, during a server/OS migration, the entire mail system was rebuilt, including the spam filter solution (SpamAssassin/MailScanner).

After the migration the amount of spam increased a lot, and people started marching to my room with torches and pitchforks, complaining about their inboxes getting filled with penis enlargement proposals.

[Image: My users]

So here is a list of what I did (and what I avoided) to fix that.

Don’t even try those famous RBLs directly on Postfix. I did, and during a few hours of testing they flagged a huge amount of spam, but surprisingly, a big part of that “spam” was not really spam. And everyone knows that worse than receiving spam is getting trusted messages marked as spam.

Instead, use the SpamAssassin RBL rules. The default scores are usually low, so you have to re-score them until SA becomes more effective, something like this:

score RCVD_IN_SORBS_DUL 3
score RCVD_IN_SORBS_WEB 3
score RCVD_IN_SORBS_SMTP 3
score RCVD_IN_SORBS_HTTP 3
score RCVD_IN_XBL 3
score RCVD_IN_BRBL_LASTEXT 3
score URIBL_WS_SURBL 3

Don’t use an SPF policy system directly on Postfix. Unfortunately, a lot of real servers still don’t publish SPF records, or publish them incorrectly, and you will end up with some real messages being blocked. Again, a better approach is to use the SpamAssassin SPF rules. On Ubuntu Server this doesn’t work by default; you will need to install a Perl module and maybe make some other minor adjustments. Once you have the SPF subsystem working you will need to re-score the rules.

This works for my system:

score SPF_FAIL 8
score SPF_SOFTFAIL 6 *
score SPF_NEUTRAL 4

*My system flags as spam any message that scores higher than 6.5, so a SOFTFAIL on its own will pass if no other positive score is assigned.
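To make the arithmetic behind that footnote concrete, here is a minimal Python sketch. It only models the idea that SA sums the scores of the rules a message hits and compares the total with the threshold; the function name and the dictionary are illustrative, while the scores and the 6.5 threshold are the ones quoted in this post. The strict "higher than" comparison follows the wording above and is an assumption of the sketch.

```python
# Scores copied from the listings in this post; the dict itself is just for illustration.
SCORES = {
    "SPF_FAIL": 8.0,
    "SPF_SOFTFAIL": 6.0,
    "SPF_NEUTRAL": 4.0,
    "RCVD_IN_XBL": 3.0,
}

# Threshold quoted in the footnote.
SPAM_THRESHOLD = 6.5


def total_score(hit_rules, scores=SCORES):
    """Sum the scores of every rule the message hit."""
    return sum(scores[name] for name in hit_rules)


def is_flagged(hit_rules, scores=SCORES, threshold=SPAM_THRESHOLD):
    """Return True when the summed score is higher than the threshold."""
    return total_score(hit_rules, scores) > threshold
```

With these numbers, `is_flagged(["SPF_SOFTFAIL"])` is `False` (6.0 stays under 6.5), while `is_flagged(["SPF_SOFTFAIL", "RCVD_IN_XBL"])` is `True` (6.0 + 3.0 = 9.0). A single SPF_FAIL at 8.0 is enough on its own.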

Use sa-learn to train SA’s Bayesian filter. In my case I talked with some of my heavy users, and now they move every undetected spam they receive to a folder named “.toFilter”; then every week a cron job runs sa-learn against these messages.
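The weekly job could look roughly like the Python sketch below. The post does not show the real script, so everything here is an assumption: the Maildir-style layout (`<mail_root>/<user>/.toFilter/cur`), the function names, and the dry-run default. The only SpamAssassin piece it relies on is `sa-learn --spam <path>`, which learns the messages found at that path as spam.

```python
import subprocess
from pathlib import Path


def find_report_folders(mail_root, folder_name=".toFilter"):
    """Return every <mail_root>/<user>/<folder_name>/cur directory that exists.

    The directory layout is an assumption; adjust the glob to your mail store.
    """
    root = Path(mail_root)
    return sorted(p for p in root.glob(f"*/{folder_name}/cur") if p.is_dir())


def build_command(message_dir):
    """Build the sa-learn call that learns the messages in message_dir as spam."""
    return ["sa-learn", "--spam", str(message_dir)]


def train(mail_root, dry_run=True):
    """Return one sa-learn command per reporting folder; run them unless dry_run."""
    commands = [build_command(d) for d in find_report_folders(mail_root)]
    if not dry_run:
        for cmd in commands:
            # check=True raises if sa-learn exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

Calling `train("/path/to/mail", dry_run=False)` from a weekly cron entry would then feed every reported message to the Bayes database. It needs to run as the user that owns the Bayes database your filter actually reads, which depends on the setup.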

And last, but not least, learn to write your own SA rules. No need to go deep, but simple rules using basic regexps let you detect specific messages that target your type of organization. In my case I created simple subject rules to flag a particular type of scam that occurs in public universities.

That is it. The default SpamAssassin configuration is not a silver bullet: before this tuning process my system was detecting ~1k spam messages per day, and afterwards it caught ~1.4k, and my users can now find other stuff to complain about.
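One cheap way to try out such a regexp before putting it into SA is to test it against a few sample subjects first. The Python sketch below does that. The pattern and the sample subjects are invented for illustration (the actual rules used here are not shown), and Python and Perl regexps are only mostly compatible, so treat it as a quick check rather than a guarantee.

```python
import re

# Hypothetical pattern: a subject mentioning "webmail" or "mailbox" followed
# later by "quota", "upgrade" or a word starting with "verif".
SUBJECT_PATTERN = re.compile(
    r"\b(webmail|mailbox)\b.*\b(quota|upgrade|verif\w+)\b",
    re.IGNORECASE,
)


def subject_matches(subject):
    """Return True when the subject line matches the candidate pattern."""
    return SUBJECT_PATTERN.search(subject) is not None


def matching_subjects(subjects):
    """Return only the subjects the candidate pattern would flag."""
    return [s for s in subjects if subject_matches(s)]
```

Once the pattern behaves as expected, it can be carried over to a SpamAssassin `header` rule that matches on `Subject =~ /.../i`, together with a `score` line for the rule name you picked.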
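For the record, the numbers work out as in the short Python sketch below. The before and after counts are the approximate ones from this post; the total of about 2,000 spam messages per day is only an estimate derived from "about 4K messages per day, almost half of them spam", so the catch rates are rough.

```python
def relative_increase(before, after):
    """Percentage increase from before to after."""
    return (after - before) * 100 / before


def catch_rate(detected, total_spam):
    """Percentage of the estimated daily spam that was detected."""
    return detected * 100 / total_spam


DETECTED_BEFORE = 1000    # ~1k per day before tuning
DETECTED_AFTER = 1400     # ~1.4k per day after tuning
ESTIMATED_SPAM = 2000     # assumption: roughly half of ~4K daily messages

increase = relative_increase(DETECTED_BEFORE, DETECTED_AFTER)
rate_before = catch_rate(DETECTED_BEFORE, ESTIMATED_SPAM)
rate_after = catch_rate(DETECTED_AFTER, ESTIMATED_SPAM)
```

Here `increase` is 40.0, which is where the 40% comes from, and under the 2,000-per-day estimate the catch rate goes from about 50% to about 70%.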


by Ricardo Pascal on Sept. 25, 2011

