Using bogofilter on comments in django • Serge Émond

The Problem

Five minutes after I placed my new django-based website online, I got my first spam. Yesterday, I got as many comments (all spam) as I used to get in a month when this website was based on ExpressionEngine!

Of course EE’s captcha was enabled, so it helped, but spam still got through – a few every other day.

So, what can I do about it?

Conventional captchas: not very effective (cracked, cheap human labor, …), and they are often as tough on humans as they are on scripts;
Akismet: I don’t want to depend on external entities unless I have exhausted all other possibilities;
The popular reCAPTCHA: apparently often cracked, and it doesn’t fit my policy of avoiding external services;
“Are you human” thingies, like simple mathematical formulas: I strongly believe “1+4” to be easily parsable and calculable by any script, and most of the other solutions I found are too annoying to implement and/or use (like “drag the pencil image to some box on the right”).

The solution?

So I thought… what about bogofilter? It works very well sorting my emails… granted, I get hundreds of emails (mostly junk) on a daily basis, while I only get a few comments (99.9% spam :). Also, blog/photo comments are way shorter than emails.

Maybe I’m just bad at searching, but I couldn’t find anything on the subject on Google, so maybe it’s a stupid idea.

But still… it might work, right?

So I took a few hours to make a django comment moderation system based on bogofilter. Essentially, it turns each comment into a short email-like message: the comment text as the body, plus “headers” carrying tags derived from things like the IP and the time difference between the form generation and the comment posting (e.g. “ALMOST_INSTANT” if less than 5 seconds, …).
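To make the idea concrete, here is a minimal sketch of how a comment could be turned into such an email-like message. It is not the code running on this site: the header names (`X-Comment-Author`, `X-Comment-IP`, `X-Comment-Timing`), the function names, and every timing bucket other than “ALMOST_INSTANT” are assumptions made up for illustration.

```python
from email.message import EmailMessage


def timing_tag(seconds_to_post):
    """Map the delay between form generation and posting to a coarse token."""
    if seconds_to_post < 5:
        return "ALMOST_INSTANT"  # threshold mentioned in the post
    if seconds_to_post < 60:
        return "UNDER_A_MINUTE"  # hypothetical bucket
    return "TOOK_A_WHILE"        # hypothetical bucket


def comment_to_message(author, ip, seconds_to_post, body):
    """Build an email-like text that a mail filter can tokenize.

    The header names are invented for this sketch; the filter only
    needs them to be consistent between training and classification.
    """
    msg = EmailMessage()
    msg["X-Comment-Author"] = author
    msg["X-Comment-IP"] = ip
    msg["X-Comment-Timing"] = timing_tag(seconds_to_post)
    msg.set_content(body)
    return msg.as_string()


if __name__ == "__main__":
    print(comment_to_message("bob", "203.0.113.7", 2, "Buy cheap pills"))
```

Putting the tags in headers rather than in the body keeps them apart from the words the commenter actually wrote, so a tag like “ALMOST_INSTANT” can become a token of its own.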

Then I generated a wordlist.db by feeding the comments I already had to bogofilter through django’s admin interface.
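The training step could look roughly like the sketch below, which pipes a message to the `bogofilter` command line tool. It relies on bogofilter’s `-s` (register as spam), `-n` (register as non-spam) and `-d` (wordlist directory) options; the directory path and the function names are placeholders, and how this gets wired into the admin interface is left out.

```python
import subprocess

# Hypothetical location of the directory holding wordlist.db.
BOGOFILTER_DIR = "/path/to/bogofilter-dir"


def train_command(is_spam, bogofilter_dir=BOGOFILTER_DIR):
    """Return the bogofilter command line that registers one message."""
    flag = "-s" if is_spam else "-n"  # -s: spam, -n: non-spam
    return ["bogofilter", "-d", bogofilter_dir, flag]


def train(message_text, is_spam, bogofilter_dir=BOGOFILTER_DIR):
    """Feed one email-like message to bogofilter on stdin.

    Raises CalledProcessError if bogofilter exits with an error.
    """
    subprocess.run(
        train_command(is_spam, bogofilter_dir),
        input=message_text.encode("utf-8"),
        check=True,
    )
```

An admin action would then only need to call `train(message, is_spam=True)` or `train(message, is_spam=False)` for each selected comment.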

And then… I placed it online, and less than 5 minutes later, my first spam was intercepted! And two more were caught while I was writing this post. :)
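Interception itself can be sketched as follows. When bogofilter classifies a message read from stdin, its exit code is 0 for spam, 1 for non-spam and 2 for unsure. The mapping from verdict to moderation action, the action names, and the function names are assumptions for illustration, not a description of what this site does.

```python
import subprocess

# bogofilter exit codes when classifying a message.
EXIT_TO_VERDICT = {0: "spam", 1: "ham", 2: "unsure"}


def verdict_from_exit_code(code):
    """Translate a bogofilter exit code; anything else is an error."""
    return EXIT_TO_VERDICT.get(code, "error")


def moderation_action(verdict):
    """Hypothetical policy: only clear verdicts skip the human."""
    if verdict == "spam":
        return "reject"
    if verdict == "ham":
        return "publish"
    return "hold"  # unsure or error: wait for manual moderation


def classify(message_text, bogofilter_dir):
    """Run bogofilter on one email-like message and return its verdict."""
    result = subprocess.run(
        ["bogofilter", "-d", bogofilter_dir],
        input=message_text.encode("utf-8"),
        stdout=subprocess.DEVNULL,
    )
    return verdict_from_exit_code(result.returncode)
```

Holding anything that is not a clear verdict keeps a badly trained wordlist from publishing spam on its own.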

Preliminary conclusion

Well… it worked for three spams out of three! Ok… that doesn’t actually mean anything.

I don’t actually expect it to approve legitimate comments (if I ever get any! :) automatically. Not until I have more of them to train it with, anyway.

But… my main concern right now is to avoid promoting spammer websites and helping them rank on search engines. So if 90% of the spam gets intercepted, and 50% of the legitimate comments are automatically placed online, I’ll be happy enough.

If it works well enough and I manage to find some time to completely uncouple it from this website, I might eventually publish the code somewhere…