Fighting spam on GFF
Here's a rundown of the different things I do to keep spammers from posting on the Global FlyFisher
Nerd alert: This is NOT about fly fishing or fly tying, but about site development and nerdy stuff.
With 5,000 and sometimes 10,000 daily visitors, a well-known site like GFF naturally becomes a target for spam. Add to that the Open Source platform the site is built on, and you are open to both human and automatic spam submitters.
Robots and humans
The fact that all URLs are known, that most signup, comment and posting mechanisms are public and well documented, and that the system has been around for years makes it very easy to write (or find) scripts that can exploit possible weaknesses. It's almost impossible to guard a site completely against spam, and even the tightest Captchas and the most convoluted signup and posting procedures seem to be breached.
Spammers simply unleash so-called robots, which are computer programs that target known systems and can register fake users and then add signatures, profiles and comments with links to all kinds of shady web sites. All this is done automatically and can be hard to stop, but not impossible.
A lot of spam is posted by humans, who sit in sweatshops in India, Pakistan or China and simply register as legit users, navigate through all spam countermeasures, confirm their accounts and start posting spam manually. This type of spam is almost impossible to stop because no system can see the difference between them and “real” users. The way to stop this is to delete posts and block the users.
A few ways
I have chosen a few ways of fighting spam:
- During signup and posting, new users are confronted with a very simple, homemade Captcha. This filters out most automated scripts, but still isn't too much of a hassle for legitimate users. It can't keep out human spammers, but very few countermeasures can do that, not even the most intricate Captchas.
- Signup requires an email confirmation, meaning that you have to click on a link in an email the system sends, adding an extra step to the process. It also means that the mail account has to exist and be able to receive mail. This can be done by robots, but requires some pretty advanced programming.
- If a link is posted or changed in a profile, the account is blocked until I have checked it and manually approved it. The approval is registered in a hidden field made for the purpose in the user profile. Only approved links are shown, even for unblocked users.
- I constantly monitor signatures, links and posts by all users, simply disabling the users who post spammy links or content and deleting spammy content.
- No links are allowed in anonymous comments. It's possible to comment without being registered, but if you do, your comment can't contain links. Since links are what spammers want to post, this helps keep them away.
- I regularly run through all comments, signatures, profile links and users to manually assess them and prune out those that are not legit.
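The no-links rule for anonymous comments is simple enough to sketch in a few lines. The following Python snippet is only an illustration of the idea, not the code running on GFF; the pattern and the function names `contains_link` and `accept_comment` are assumptions made up for the example.

```python
import re

# Hypothetical pattern: catches explicit URLs, "www." prefixes and HTML anchors.
LINK_PATTERN = re.compile(r"(https?://|www\.|<a\s)", re.IGNORECASE)


def contains_link(text: str) -> bool:
    """Return True if the text looks like it contains a link."""
    return LINK_PATTERN.search(text) is not None


def accept_comment(text: str, is_registered: bool) -> bool:
    """Registered users may post links; anonymous commenters may not."""
    if is_registered:
        return True
    return not contains_link(text)
```

A check like this is deliberately crude. It will also reject a harmless anonymous comment that happens to mention a web address, which is an acceptable trade-off when links are the one thing spammers are after.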
I use a couple of tools for the job, both of them homemade in the form of small modules that solve different issues.
Trick question is a small Captcha module that I made. It simply asks the user a question. It's easy to answer, mostly with just a single letter, and the correct reply is pretty obvious. The thing is that this simple gatekeeper is rarely used on other web sites, meaning that it doesn't appear in many of the robot scripts available. I could have used a more established Captcha system, but most are large and complex and much harder (and more deterring) for the users to answer.
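To give an idea of how little it takes, here is a minimal Python sketch of a question-based Captcha in the same spirit. It is not the Trick question module itself; the questions, the `TrickQuestion` class and the helper names are all invented for the illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrickQuestion:
    prompt: str
    answer: str  # expected reply, stored in lowercase


# Hypothetical questions; each has an obvious single-letter answer.
QUESTIONS = [
    TrickQuestion("What is the first letter of the word 'fly'?", "f"),
    TrickQuestion("What is the last letter of the word 'trout'?", "t"),
    TrickQuestion("Which letter comes after A in the alphabet?", "b"),
]


def pick_question(rng: Optional[random.Random] = None) -> TrickQuestion:
    """Pick a random question to show on the signup or posting form."""
    rng = rng if rng is not None else random.Random()
    return rng.choice(QUESTIONS)


def is_correct(question: TrickQuestion, reply: str) -> bool:
    """Compare the reply to the answer, ignoring case and stray spaces."""
    return reply.strip().lower() == question.answer
```

The strength here is not the difficulty of the question, but the fact that a generic robot script has no idea the field exists or what to put in it.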
Purge User is a module that only exists on GFF. It's a simple module that adds a Purge tab to the user's page, where I can quickly see what a user has posted and commented on, and have the user purged, meaning that all the content is deleted and the user is blocked. The reason for blocking and not deleting is that blocking disables the user account, but doesn't allow the same account to be created again. The module also offers a mass purge function, allowing me to purge many users at a time using UIDs. I typically harvest these UIDs from the database by running different SQL queries.
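The purge logic can be sketched roughly as follows. This is a simplified Python illustration using an in-memory SQLite database, not the actual module; the table and column names (`users`, `posts`, `comments`, `status`) are assumptions for the example and don't reflect the real GFF database.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database with a hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT,
                            status INTEGER DEFAULT 1);  -- 1 = active, 0 = blocked
        CREATE TABLE posts (pid INTEGER PRIMARY KEY, uid INTEGER, body TEXT);
        CREATE TABLE comments (cid INTEGER PRIMARY KEY, uid INTEGER, body TEXT);
    """)
    return db


def purge_user(db: sqlite3.Connection, uid: int) -> None:
    """Delete everything the user posted, then block (not delete) the account."""
    with db:  # run all three statements in one transaction
        db.execute("DELETE FROM posts WHERE uid = ?", (uid,))
        db.execute("DELETE FROM comments WHERE uid = ?", (uid,))
        db.execute("UPDATE users SET status = 0 WHERE uid = ?", (uid,))


def mass_purge(db: sqlite3.Connection, uids: list) -> None:
    """Purge a whole list of UIDs, for example harvested with a SQL query."""
    for uid in uids:
        purge_user(db, uid)
```

Harvesting UIDs could then be a query along the lines of `SELECT uid FROM users WHERE name LIKE 'xq9z%'`, with the result fed straight into `mass_purge`. The name pattern is of course made up; in practice the query depends on what the current batch of fake users has in common.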
The module also has a function to list all users who have a signature and all users who have a link in their profile. Next to each user name and signature/link there's a link to purge the user or approve the profile link.
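The approve action ties back to the profile link rule mentioned earlier: a new or changed link blocks the account until I approve it, and only approved links are ever shown. A minimal Python sketch of that state handling could look like this. The `Profile` class and its field names are hypothetical, and the choice not to block an account when a link is simply removed is my assumption for the example.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    name: str
    link: str = ""
    link_approved: bool = False  # stands in for the hidden approval field
    blocked: bool = False


def set_link(profile: Profile, new_link: str) -> None:
    """Posting or changing a link resets approval and blocks the account."""
    if new_link == profile.link:
        return  # nothing changed, keep the current approval
    profile.link = new_link
    profile.link_approved = False
    # Assumption: removing a link (empty string) does not block the account.
    profile.blocked = bool(new_link)


def approve_link(profile: Profile) -> None:
    """Manual approval: mark the link as checked and unblock the account."""
    profile.link_approved = True
    profile.blocked = False


def visible_link(profile: Profile) -> str:
    """Only approved links are shown, even for unblocked users."""
    if profile.link and profile.link_approved and not profile.blocked:
        return profile.link
    return ""
```

Keeping the approval as a separate flag means that unblocking a user by other means never exposes an unchecked link by accident.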
So my daily routine is to simply look at the user list, which shows the most recent users first. I can usually spot new fake users because they have odd and often identical-looking names and very convoluted email addresses, usually on Yahoo, Gmail, Russian Yandex, Chinese Baidu or a similar service. I simply use the purge function to block the really obvious spammers and clean out their posts.
I then look at recent content and comments to see if users have posted new stuff. The fake posts or spam posts will usually reek because they contain links to odd pages and often contain codes, which show that they have been aimed at general kinds of forums and systems. They also typically contain total nonsense text, revealing their nature. If I see fake posts, I use the purge tool, which finds all posts from the user in question. That matters with the really persistent spam bots, which can literally post hundreds of comments or forum posts.
Finally I fire up the Purge tool and sweep through all profiles. The fake ones can then be purged one at a time.
Once in a while I fire up a database tool and look at the relevant tables to see if I have missed anything. Depending on how many I find, I then purge them one at a time or use the mass purge function.
A fight against windmills
Fighting spam is a never-ending battle. Spammers are extremely persistent and can muster a never-ending arsenal of tools and methods, and new ones keep on coming. Fighting it is an arms race, where site owners also have to keep rolling out new tools and developing new methods.
GFF isn't exactly the most plagued site. We're pretty low profile and not that well known, so few spammers target us directly. We just get hit when a bot finds us or when we're on the canvass list in some sweatshop somewhere in India, Pakistan, China or some other third world country.
Most weeks we just trot along under the radar and see no spam posts at all, while some days call for an hour or so of cleaning up junk.