comment spam resistance

Having installed the latest version of WordPress, I lost my anti-spam customisations and the spam started pouring back in: around 10 comments an hour.

There are two ways of looking at the whole comment spam problem: classical AI and evolutionary dynamics.

From an AI point of view, it boils down to a test that humans can pass but computers fail: a kind of Turing test. This is the approach taken by the CAPTCHA project, which tends to frame it as a classically difficult image-recognition problem. Apparently smarter spambots are starting to crack these, though. And humans have to really squint at some of the trickier ones.

The problem really arises because certain pieces of software are so popular that it becomes worthwhile to target them for spam. Their relative uniformity means that if you can spam one installation, you can spam thousands. Rather than making the shared test more taxing (yet still common to every site), I favour breaking up the monoculture and letting each user specify their own test. This could even be a trivial textual question (‘How many toenails do 10 people have altogether?’). If you don’t get the answer right, your comment is rejected. If everyone makes up their own question, a successful spammer would need an AI that would make search engines drool.
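A minimal sketch of such a per-blog check, in Python. The question, the set of accepted answers and the function names are all hypothetical; the only assumption is that the comment form posts the visitor's answer as a plain string.

```python
# Hypothetical per-blog challenge: the blog owner chooses both the question
# and the answers they are willing to accept.
CHALLENGE_QUESTION = "How many toenails do 10 people have altogether?"
ACCEPTED_ANSWERS = {"100", "one hundred", "a hundred"}


def normalise(answer: str) -> str:
    """Lower-case and collapse whitespace so trivial formatting differences don't matter."""
    return " ".join(answer.lower().split())


def accept_comment(answer: str) -> bool:
    """Return True if the commenter answered the owner's question correctly."""
    return normalise(answer) in ACCEPTED_ANSWERS
```

Because every blog picks its own question and answers, there is no single target for a spambot to learn.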

I’ll be starting with the simplest possible test and increasing difficulty as required. Don’t think I’ll have to raise the bar too high.

5 thoughts on “comment spam resistance”

  1. How about just renaming the form elements to something else? For example (read the square brackets as angle brackets):

    [input name="melon"/] Name (required)[br/]
    [input name="bison"/] Mail (will not be published) (required)[br/]
    [input name="happy"/] Website
    [textarea name="airvent"][/textarea]

    I don’t think spam bots will be able to interpret that, and it doesn’t require ANY authentication.

  2. I expect that would probably stop most current spam-bots. The danger is that if many blogs have the same structure then the form elements might still be reliably inferred. Even if they aren’t, spam-bots will probably just try random combinations because, hey, they don’t care! From their point of view it’s better to leave some kind of trace than none.

  3. I think I’ll try it anyway… DJBA is getting spam at the moment so I’ll update it and we’ll see what happens.

    Whoops! Almost forgot to tick the box! What kind of goody goody human am I?!

  4. Another point to note (quite obvious, this one):

    CAPTCHAs don’t affect trackback or pingback spam. In fact, in WordPress 1.2 (which I’m still using for DJBA and SquirrelWeb) I don’t think disabling the admin option for “Allow link notifications from other Weblogs (pingbacks and trackbacks.)” actually does anything.

    A quick remedy is to add a PHP die() to the beginning of the relevant files (wp-trackback.php and xmlrpc.php). However, this rather ignores the fact that trackbacks and pingbacks are cool; it’s a shame to disable them.

    Perhaps a way around this would be to add a “blog URL” field to the trackback request, which can be whitelisted or blacklisted accordingly. I wonder if WordPress 1.5 does this (I remember something about whitelisting)?
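The field-renaming idea from the first comment could be handled on the server along these lines. This is a hypothetical Python sketch, not WordPress code: the obfuscated names come from the comment, while the decoded names (`author`, `email`, `url`, `comment`) and the `decode_submission` helper are assumptions for illustration.

```python
from __future__ import annotations

# Obfuscated field names from the comment, mapped to the meanings the server
# expects. The right-hand names are hypothetical, not WordPress's own.
FIELD_MAP = {
    "melon": "author",
    "bison": "email",
    "happy": "url",
    "airvent": "comment",
}
REQUIRED = {"author", "email", "comment"}


def decode_submission(form: dict[str, str]) -> dict[str, str] | None:
    """Translate an obfuscated form post back to real field names.

    Returns None (reject) if the post contains unknown field names, such as a
    bot using the standard names, or if any required field is empty.
    """
    if not set(form) <= set(FIELD_MAP):
        return None
    decoded = {FIELD_MAP[name]: value.strip() for name, value in form.items()}
    if any(not decoded.get(field) for field in REQUIRED):
        return None
    return decoded
```

As the second comment points out, this only stops bots that post under the standard field names; one that simply fills in every field it finds would still get through.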
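The whitelist/blacklist idea from the last comment might look something like this, assuming the trackback request carried a hypothetical `blog_url` field. The domain lists and the `trackback_decision` function are illustrative only, and matching is on the exact host name, so subdomains would need their own entries.

```python
from urllib.parse import urlparse

# Hypothetical lists maintained by the blog owner.
WHITELIST = {"squirrelweb.example"}
BLACKLIST = {"casino-spam.example"}


def trackback_decision(blog_url: str) -> str:
    """Classify a trackback by the host of its (hypothetical) blog URL field."""
    # urlparse lower-cases the hostname; URLs without a scheme give None.
    host = urlparse(blog_url).hostname or ""
    if host in BLACKLIST:
        return "reject"
    if host in WHITELIST:
        return "accept"
    # Unknown senders are held for moderation rather than dropped outright.
    return "moderate"
```

Sending unknown sources to moderation means legitimate new trackbacks aren't lost. Since the URL is supplied by the sender, a blacklist can be dodged simply by lying about it; the whitelist is the stronger half.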
