Friday, February 3, 2012 A constant battle with spam
Since starting my blog last year I've faced a constant battle with evil spambots adding content to the site in the form of blog comments. I've used a number of approaches to prevent spam, with varying degrees of success. I've outlined my approaches below, but it would be interesting to know how others deal with this problem.
Hidden Form Field
With this approach you add a field to your comment form with a name like 'message' and use CSS to hide it. When validating the form you detect whether the field has been populated and, if it has, prevent the comment from being posted. Genuine users (with CSS enabled) won't populate the field because they won't see it, but most spambots will. However, it's important to keep in mind that, although unlikely, some genuine users may have CSS disabled in their web browser, so if validation fails you should display a polite message instructing them to leave the field blank. I've used this approach in the past with some success, but it didn't seem to work on my site.
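As a rough illustration, the server-side check could look something like the sketch below. It is written in Python purely for the example; the `validate_comment` function and the dictionary-style form are assumptions, and only the 'message' field name comes from the description above.

```python
from typing import Optional

# Name of the hidden honeypot field (hidden from real users with CSS).
HONEYPOT_FIELD = "message"


def validate_comment(form: dict) -> Optional[str]:
    """Return an error message if the honeypot was filled in, else None."""
    # Real users never see the field, so any non-blank value suggests a bot
    # (or, rarely, a genuine user browsing with CSS disabled).
    if form.get(HONEYPOT_FIELD, "").strip():
        return "Please leave the '%s' field blank and submit again." % HONEYPOT_FIELD
    return None
```

Returning a polite message rather than silently discarding the comment covers the rare genuine user who can see the field.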
Trusted Email Addresses
With this approach only users with a trusted email address can post comments. When someone posts a comment for the first time I receive an email asking me to review and approve the comment. If the comment is genuine I approve it and the commenter's email address gets added to a list of trusted email addresses. No further approval is required for future comments from that commenter. If the comment is spam it gets deleted from the database. This approach prevents spam from appearing on my site, but I tend to receive a lot of notification emails generated by spambots which is frustrating.
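The workflow above can be sketched roughly as follows. This is an in-memory Python illustration only; the `CommentModerator` class and its method names are hypothetical, and a real site would use a database and send the notification email where the comment indicates.

```python
class CommentModerator:
    """Holds first-time comments for review and trusts approved emails."""

    def __init__(self):
        self.trusted = set()   # approved commenter email addresses
        self.pending = {}      # comment id -> (email, text) awaiting review
        self.published = []    # comments visible on the site
        self._next_id = 1

    def submit(self, email, text):
        """Publish at once for trusted emails; otherwise return a pending id."""
        email = email.strip().lower()
        if email in self.trusted:
            self.published.append((email, text))
            return None
        comment_id = self._next_id
        self._next_id += 1
        self.pending[comment_id] = (email, text)
        # A real site would email the blog owner here to ask for a review.
        return comment_id

    def approve(self, comment_id):
        """Publish the comment and trust its email for future comments."""
        email, text = self.pending.pop(comment_id)
        self.trusted.add(email)
        self.published.append((email, text))

    def reject(self, comment_id):
        """Delete a spam comment without trusting its email."""
        self.pending.pop(comment_id)
```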
Block IP Address
With this approach you detect the IP address used by the spambot and block it. This successfully prevents spam from a specific IP address and it means I don't receive lots of notification emails triggered by spambots. However, there is always a danger that a genuine commenter with the same IP address as a spambot gets blocked. Also, spambots use multiple IP addresses so you could end up with a long list of blocked addresses. I would avoid this approach if I were you.
Detect Number of Links in Comments
Spambots like to include lots of links in their comments to websites selling Viagra, etc. An approach suggested to me by John Whish is to count the links in the comment and, if the number of links exceeds a predefined threshold, ask the commenter to review their comment. I haven't tried this yet, but I think it's a nice approach.
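A minimal sketch of the idea, in Python for illustration. The regular expression and the threshold of 2 are assumptions; the pattern treats anything starting with `http://`, `https://` or `www.` as one link, which is deliberately crude.

```python
import re

# Crude link detector: a URL scheme or a leading "www." followed by
# non-space characters counts as one link.
LINK_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

# Assumed threshold; comments with more links than this get sent back.
MAX_LINKS = 2


def count_links(comment):
    """Return the number of link-like strings in the comment text."""
    return len(LINK_PATTERN.findall(comment))


def needs_review(comment):
    """True when the commenter should be asked to review their comment."""
    return count_links(comment) > MAX_LINKS
```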
CAPTCHA Image
With this approach you display an image containing some text and ask the commenter to type what they see into a form field which is then validated. If the text entered matches the text in the image the comment is approved. This is a nice way of preventing spam, but captchas are often difficult to read which leads to a bad user experience.
Turing Test
A similar approach to a CAPTCHA, but you ask the commenter to provide the answer to a simple sum (e.g. 1 + 1). From a usability point of view I think this approach is better than a CAPTCHA, but it's not ideal.
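One way this could be sketched in Python is shown below. The function names are hypothetical, and the sketch assumes the expected answer is kept on the server (for example in the user's session) rather than sent to the browser.

```python
import random


def make_challenge(rng=random):
    """Return a (question, expected_answer) pair for a simple sum."""
    a, b = rng.randint(1, 9), rng.randint(1, 9)
    return "What is %d + %d?" % (a, b), a + b


def check_answer(expected, submitted):
    """Compare the stored answer with what the commenter typed."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        # Non-numeric input is simply treated as a wrong answer.
        return False
```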
I think it's a matter of trial and error, but for me I think the best approach is likely to be a combination of the 'Trusted Email Addresses' and 'Detect Number of Links in Comments' approaches.
Comments
Friday, February 3, 2012 Jules Gravinese
If you're using WordPress, the built-in Akismet is terrific. Otherwise, I've had great success with this Turing test: 'What color is also a fruit?' You can validate that with JavaScript easily. First you set the form's action to a 404 page or another domain like Google. If/when the form is submitted and the answer is correct, you change the form's action to the correct page and return (submit).
Friday, February 3, 2012 Steve Bryant
I created a CFC for spam prevention that uses content filtering to check for spam. It calculates points on a number of factors (that I update nearly every month - no need to update code to get updates), including things like number of links. It also looks for specific email addresses and URLs that are known to be heavy spammers.
I have been using this with good success for a few years now. My documentation needs work, but I can always improve that pretty easily.
One nice thing about this approach is that it works equally well against manually entered spam as it does against automated spam.
http://www.bryantwebconsulting.com/blog/index.cfm/SpamFilter
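To illustrate the general idea of points-based content filtering, here is a small Python sketch. It is not the SpamFilter CFC's actual code or rule set; every rule, point value, the example email address and the threshold are assumptions made up for the example.

```python
import re

# Hypothetical rules: (pattern, points awarded per match).
RULES = [
    (re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE), 2),  # each link
    (re.compile(r"\bviagra\b", re.IGNORECASE), 5),              # spam word
]

# Hypothetical list of addresses known to be heavy spammers.
KNOWN_SPAM_EMAILS = {"spammer@example.com"}

# Assumed cut-off: a comment scoring this many points or more is spam.
SPAM_THRESHOLD = 6


def spam_score(email, text):
    """Add up points for every rule match and for known spam emails."""
    score = 0
    for pattern, points in RULES:
        score += points * len(pattern.findall(text))
    if email.strip().lower() in KNOWN_SPAM_EMAILS:
        score += SPAM_THRESHOLD
    return score


def is_spam(email, text):
    return spam_score(email, text) >= SPAM_THRESHOLD
```

Because the score depends only on the content, the same check applies whether the comment was typed by a person or posted by a bot.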
Monday, February 6, 2012 James Moberg
I also use Project HoneyPot's API to identify known bad visitors by IP. (Many times they are proxies, college networks or compromised desktops.) I allow these visitors to perform GET requests (search, navigation, etc.), but no POST requests that actually submit form data that gets saved or emailed.
http://www.projecthoneypot.org/
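The GET-but-not-POST rule could be sketched like this in Python. The `is_flagged` callable is a hypothetical stand-in for whatever IP reputation lookup is used; the sketch does not call Project HoneyPot's API, and the IP addresses in the usage lines are documentation examples.

```python
# Read-only methods that flagged visitors may still use.
SAFE_METHODS = {"GET", "HEAD"}


def is_request_allowed(method, ip, is_flagged):
    """Allow browsing for everyone, but block writes from flagged IPs.

    is_flagged is an assumed callable taking an IP string and returning
    True when the address is a known bad visitor.
    """
    if method.upper() in SAFE_METHODS:
        return True
    return not is_flagged(ip)


# Usage with a stubbed lookup:
flagged_ips = {"203.0.113.7"}
can_search = is_request_allowed("GET", "203.0.113.7", flagged_ips.__contains__)
can_post = is_request_allowed("POST", "203.0.113.7", flagged_ips.__contains__)
```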
I'm testing out SpamFilter now (after reading this post & reviewing the project's source). Regarding "no need to update code to get updates", you still need to manually toggle "getNewDefs" to occasionally retrieve new updates to the list. (During the first run, only a single default word was loaded for some odd reason.) I added a datestamp to the spamWords database in case a client reports a new problem with messages so that I can quickly review new words that may be affecting it.
Tuesday, February 7, 2012 Simon Bingham
Thank you all for your comments and suggestions. Will give some of them a try... :)
Tuesday, February 7, 2012 John Sieber
I have used cfFormProtect with Akismet with good results. Some spam makes it through, but for the most part it is very effective. It also provides a much better end user experience when compared to using a captcha.
Wednesday, February 8, 2012 Andy Jarrett
a +1 for cfformprotect.
Thursday, February 9, 2012 David Boyer
I've been fighting spam on a few sites lately. One successful approach I used that doesn't seem to be covered above is changing the name and ID attributes to complete nonsense. For example, instead of having a field named "email", call it "04h12kjshgb3". You could have a variable containing references to each field via sensible names to make reading the code easier.
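The "variable containing references to each field" could be as simple as a lookup table. The Python sketch below is only an illustration: the `FIELD_NAMES` mapping and `read_form` helper are hypothetical, and apart from "04h12kjshgb3" the nonsense names are made up.

```python
# Sensible name used in the code -> nonsense name used in the HTML form.
FIELD_NAMES = {
    "email": "04h12kjshgb3",
    "name": "p3vz81kqa",      # made-up example value
    "comment": "x9f2lq7ty",   # made-up example value
}


def read_form(posted):
    """Translate the posted nonsense field names back to sensible ones."""
    return {
        sensible: posted.get(nonsense, "")
        for sensible, nonsense in FIELD_NAMES.items()
    }
```

The rest of the code can then refer to `read_form(posted)["email"]` without ever mentioning the obscure name.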
Thursday, February 9, 2012 Andy Jarrett
@david does that work? cfformprotect is doing a good job, but I am fighting a lot of "Essay paper" spam at the moment, and when I update the dictionary new terms start being used, such as "academy paper". It's like there is more human involvement nowadays.
Thursday, February 9, 2012 David Boyer
Well it's worked for me on a forum I host. If the spam bot isn't clever enough, it won't be able to fill in a valid email address if the email field has a name/id of "oqjsd80h4". They'd have to go to extra lengths of working out which one is right from text content and label attributes / structure. :)
Friday, February 17, 2012 Julian Halliwell
I use Disqus for my blog comments. In the past year I've only had a couple of spam posts get through, both human, and one false positive that I'm aware of.
On other sites I've used a combination of some of the techniques suggested here, but I try to avoid Captcha, Turing or other such tests. Web forms are difficult enough to deal with for a lot of people as it is. I think we should try to solve this problem without making it even harder if we possibly can.