Comment spam

Back in May I mused idly about hair in a very brief blog post.  For months now I’ve been plagued with a torrent of comment spam on that particular post, and I’m now disabling comments on it altogether.

This is the most unsubtle form of spam, full of utterly blatant keywords and phrases like “nude teens”, “pre-teen sex”, “lolitas”, “hairy pussy”, “nymphet incest” linking to the spammer’s sites.  So surely it should be trivial for a spam filter like Akismet to deal with them?

Akismet tends to be over-zealous with legitimate comments, and regularly errs on the side of caution when comments contain links.  For example, Andrew’s recent comment on my Mac troubles includes helpful links which caused Akismet to send it to me for moderation.  Most regular spam just gets automatically binned without my ever knowing about it unless I actively take the trouble to check.  So how the heck does this particular crap get past it?  If Akismet were human, I’d have to suppose (s)he was either being blackmailed or taking backhanders!

It’s not even as if links from here have obvious spam value: WordPress automatically inserts rel=nofollow to tell the ‘bots to ignore them.  And my blog is actively managed: I welcome comments but remove spam, including the traditional innocent-looking stuff that just says something bland like “nice blog”, or even spam compliments like a “thank you for saying that” where they wrap a link.  My criterion is not what someone links to, but whether the ‘comment’ contributes to discussion or is a ‘bot that’s just posting at random or at best has latched onto some key word or phrase in a post.
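For readers wondering what inserting rel=nofollow amounts to in practice, here is a minimal sketch in Python. It is not WordPress’s own code, and the function name add_nofollow is purely hypothetical; it uses a simple regular expression, which is enough for illustration but not a substitute for a real HTML parser (it can be fooled by, say, “rel=” appearing inside a URL).

```python
import re

# Matches the opening <a ...> tag of a link. Illustrative pattern only.
_A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)


def add_nofollow(comment_html: str) -> str:
    """Add rel="nofollow" to every <a> tag that has no rel attribute yet.

    Hypothetical helper for illustration; assumes reasonably tidy HTML.
    """
    def fix(match: re.Match) -> str:
        attrs = match.group(1)
        if re.search(r"\brel\s*=", attrs, re.IGNORECASE):
            # Leave tags that already carry a rel attribute untouched.
            return match.group(0)
        return f'<a{attrs} rel="nofollow">'

    return _A_TAG.sub(fix, comment_html)


if __name__ == "__main__":
    print(add_nofollow('<a href="http://example.com/">nice blog</a>'))
```

A search engine that honours the attribute gives the linked site no ranking credit, which is why such links should be of little value to a spammer.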

Talking of which, I wonder why that particular post attracted so much crap?  Is it perhaps the phrase “Long luxuriant hair” appearing in a legitimate comment?  Or maybe the title of the blog entry means something different in the spambot’s world?

Let’s see if this entry attracts similar crap.  If it does, I might (reluctantly) have to close comments here too.

Posted on September 24, 2011, in akismet, porn, spam, wordpress. 13 Comments.

  1. I think we need to publicise the ‘nofollow’ info more – although after reading so many poorly conceived articles on SEO I doubt these spammers really understand the concept of rel=nofollow or any other SEO principle.

    You could change the Akismet settings – set the links to 1 or 0. I took a more extreme measure, and disabled guest commenting. Now very little spam at all. Perhaps we should encourage WordPress.com to include Captcha on the comment form. Anyone who has a worthwhile remark will take the time to use their Facebook or Twitter account if they are not a WordPress.com registered user.

    To change the subject slightly, I would really like to see a new category added to ‘Comments’ for Pingbacks – I would like to keep a record of these, and at present they are cluttering up the Trash folder.

    Regards – here’s to less spam

    Mike

  2. Mike, I see you recently blogged about spam comments, and gave an explanation of how they work (and fail). If any of my readers want to learn more (like for example the significance of rel=nofollow I mention above), Mike’s article at http://graphiclineweb.wordpress.com/2011/09/14/comment_spam/ is a good intro.

    Not that I’d agree with some of your other articles. Especially the one on validation: no one forces your web docs to conform to any particular spec, but a document that fails validation is one that CLAIMS to conform to some standard but fails to do so. That’s just sloppy. If the electrician who wires my house provides power points that don’t conform to the standards my appliances require, I’ll be pissed off and he’ll be out of a job.

  3. Hi Niq

    Thanks for the referral to the spam article and more thanks for raising the validation topic – your remark inspired further elaboration and a referral to your article:

    My post ‘http://graphiclineweb.wordpress.com/2011/09/17/w3c_ralidation_relevant/’ is intended to highlight the fact that a page using these common features will not achieve validation against existing standards (doctype html1 transitional etc). My intention is to generate interest, and perhaps raise a call for a new standard that takes these into account, to be created sooner rather than later. (How about doctype ‘html1 og transitional’ or something to that effect?)

    Open Graph and HTML5 (and I see there is a ‘beta’ HTML 5 doctype validation protocol being developed) are going to become increasingly common – Facebook has recently introduced an HTML5 ‘like button’. Site builders and especially developers need a tool like those already available from a reputable organisation (W3C) to test and verify their work.

    Agreed, nothing forces us to ensure our work conforms to any standard. However, I think it is fair to say having a recognised standard to work to is of benefit to everyone. It is great to be able to present a reliable report from a reputable external source to a customer, showing that the work is up to scratch. However, there are very few clients these days requiring no more than an extremely basic site. As you say, a page either achieves validation fully, or not at all. The reference to the ‘electrician’ is a fair example; the description ‘sloppy’ applies to the Facebook open graph code they provide (the unclosed tags – />).

    A modification of existing doctype standards to include the newer meta and other functions can assist all of us to ensure the use is within recommendations, just as the existing DTDs set a target for quality and conformity in basic html code.

    Open graph, oauth, HTML5 and CSS3 are fairly new… possibly too new and in the process of evolving to set a fixed standard for their use at present. However they are here to stay and the sooner we have a recognised standard for implementation, the better.

    (And maybe someone will also create my other pet wish – a ‘Proper English’ spell-checker for WordPress.com posts and comments!)

  4. I limit the number of links and that caused a complaint when I was asking for tips last week, so I’ve had a good look through recent comments and I think I’m not suffering from spam or false-positives too much. I use the FOSS blogspam.net instead of the opaque-core Akismet along with some other anti-spam modules, premoderation and fairly liberal blacklisting. I’ve also found that OpenID has helped reduce the number of spammers trying to imitate previous successful commenters.

    I’m disappointed that Mike suggests CAPTCHA above, because that has little to do with spam. CAPTCHAs do not test comments for whether they are spam. They are generally some sort of ability test and often physical ability tests that discriminate against disabled legitimate commenters while allowing able-bodied spammers. The Google reCaptcha that is often seen is particularly poor, with unavoidable text that suggests disabled users are not human. A CAPTCHA will filter out some script-sent spam, but leave you wide open to human spammers and if your blog is any good, it’ll have human spammers attacking it too.

    Mike’s main advice in his article seems to be to disable guest comments. I don’t like that because it has too many backdoors. I can’t comment on Mike’s site because I only have a Twitter login I could use and I’m not willing to give Mike’s blog permission to post tweets as me. So, it would be better to install better anti-spam plugins and premoderate. Please try sharpening your anti-spam rather than making your site that hosts some quite interesting discussions into yet another dreary monologue.

  5. Hi there

    I am commenting on 2nd comment – spam. I don’t have a website yet or a blog – something that will change soon. The web manager I selected has been giving me a lot of information and showing me various options, including ways to control spam.

    A captcha method seems very popular and there look to be quite a few different ones. I have seen the recaptcha you mention used on many websites; surely if it didn’t work it wouldn’t be used so often? I have also noticed some text about checking if you are a human used with recaptcha. My web man tells me I can have any text I like in that space. Is he wrong? So far he has gone out of his way to be informative, without a cent having been paid to him.

    I also noticed that very often these captcha things have an audio option to tell the user what the text is. Does that not assist vision impaired people? I believe it is even possible to have a program to read the writing on webpages. I wear glasses, as my eyesight is not the best these days, but am fortunate in still being able to read with specs.

    I gather the comment above is about the article (post – is that the correct word?) referred to in the 2nd comment? I can only agree with Mike – if he gets anywhere near as many spam remarks as I get spam e-mail, then any way to control spam is fine. I definitely can do without the time wasted every morning to sort out the rubbish mail from those few important to my business – even with several spam filters on my mail, there are still enough that get through to be a real waste of time. And I can’t just empty the Outlook junk mail folder, as often a genuine mail ends up in there. Maybe one day in the distant future when I retire I will have time to sort out the spam from everything else. Right now spam wastes my time which costs me money.

    I fully agree anyone who has something worthwhile to say will take the time and trouble to log in to Facebook or Twitter – hang it – I am nearly always logged in anyway.

    Raymond from Mossel Baai

  6. OK, since this post is attracting discussion, perhaps I should expand on my previous comment and in particular my reference to Mike’s post.

    One reason I linked to it is that he does explain somewhat more than I do for non-techies. It’s a post worth reading if you find mine interesting and aren’t already knowledgeable on the subject.

    But perhaps the deeper reason is that I was mildly surprised Mike hadn’t linked it himself, given just how firmly on-topic it is to my post and his comment. Had I come across as a totalitarian who wouldn’t tolerate any link from my blog?

    On the subjects of controversy:
    1. I detest captchas (they sometimes beat me, and the audio alternatives where provided seem to be worse) and wouldn’t use one.
    2. I don’t want to ban guest comments, and with Akismet taking most of the strain I see no need to, except on the one particular post I referenced. But unlike a captcha, that’s a measure I would (with some regret) consider if I had a more serious spam problem. Though if I did, I’d find generic-OpenID a more palatable option than WordPress currently offers.
    3. On the subject of blogspam.net vs Akismet, that’s not an issue to consider so long as I take the line of least resistance and host my blog at wordpress.com.

  7. Just replying back to Raymond on the points niq didn’t cover: recaptcha has Google behind it, they’re apparently big enough to ignore equality laws and attract a heck of a lot of tech fans. An audio ability test isn’t much of an alternative because poor sight doesn’t mean you’ll have perfect hearing and the noisy hearing tests have a much higher failure rate (in the region of 40% I think?) than the visual ones.

    You shouldn’t really say that spam is so bad that anything goes: that’s disproportionate, harming innocent bystanders, like using a flamethrower to stop a pickpocket who is escaping through a crowd with your wallet.

    Also, you can’t customise away all the offensive text because some of it is served from the recaptcha.net servers like in this link. I’d go so far as to say if your web manager told you otherwise, it’s time to look for a new web manager before he messes you up!

    Also, my objection isn’t to logging in with Twitter (although that seems a bit like giving too much control to a few private providers so I wouldn’t choose to do it), but why should I give Mike’s blog permission to post tweets as me? That shouldn’t be necessary.

  8. Teehee. Too many “Also”s. I need to leave shorter comments 🙂 Sorry

  9. Hi again Niq…

    Yep, we do seem to have a good discussion going here. OK – not too sure which article you were referring to that I didn’t link to… The tongue in cheek one regarding ‘validation’, and the one re spam comments are linked… Actually, I don’t like to ‘spam’ other blogs with my links – rather err on the side of caution…

    Just to mention something to Ray. Ray, Niq, MJ Ray and myself are all correct regarding captcha text – the part about ‘human’ that is so commonly seen on re-captchas is the default message text provided. Your web chap is perfectly correct in telling you this can be changed. But it can depend on the system you use to run the website.

    For example, I’m using Drupal CMS for my business site, and have several options in providing a captcha image – using either the Drupal captcha module, or an add-on module which uses ReCaptcha… I enabled the re-captcha option to give an example – on the site contact form; http://www.graphicline.co.za/content/contact-us with a short statement “This question is to prevent automated spam submissions”.

    Self hosted WordPress.org sites have an extensive range of modules to choose from… The remark can also be set through php code… WordPress.com hosted blogs are much more restricted in what can be used.

    On the other hand, MJ, I think you are referring to the actual re-captcha text that is served by the Google re-captcha service… the options are very limited (apart from the message).

    To go back to Drupal – the standard Drupal captcha module allows a large amount of flexibility – the number of letters can be set, also if they should be letters & numbers and so on. The appearance of the image text is also customisable – noise (speckles), lines and distortion can be added, with the amount of these distortions being customisable.

    And Niq, I again agree that Captchas are annoying… but, and here’s the but, as Ray says, I also don’t have the time to trawl through mails and comments (or answer spam telephone calls from canvassing call-centres for that matter). I also had open commenting up to a few months ago, then started finding in excess of 70 pure spam comments with one or 2 in between – the kind with links to totally inappropriate sites and so on – so disabled the guest comment function. Ray, also living in SA, knows we need to work long hours to make a decent living from the very low industry standard rates most of us can charge to remain competitive – but then we have many other compensations!

    And yes, the audio quality is very poor and highly prone to errors.

    The subject of accessibility seems to be another topic being widely discussed, even with proposed legislation in some countries!!!

    My business site is a different case of course: where commenting is possible, the main reason is for registered users (customers) to provide feedback, and they are exempted from Captcha… I only recently decided to allow controlled commenting with a Captcha to stop bot-spam… OpenID is a good way to provide a form of control… Pity Drupal version 7 Open ID for webforms and commenting is still in development – and it’s not important enough for me to spend the time writing php code (hard code) into the site – In a few months the stable versions should be available. Drupal 6 however has this provision, as well as facebook and twitter login. (Note: customers can however use OpenID to login to their accounts.)

    MJRay is also correct – Captcha is not a spam filter – it’s a tool to limit automated bots…

    OK guys – I need to get back to work… Just a final comment to Ray – I see you are in Mossel Bay (Baai). I think I know who your web bloke is – there aren’t too many who put in a big effort to be informative to clients out here, especially in Mossel Bay – If it’s who I am thinking of, he’s honest, and technically astute.

    Thanks Niq for putting up with this mini-forum! If you will forgive another South Africanism – salane kahle – (stay well all of you!)

  10. Sorry, but Mike and Ray’s web manager are both wrong. Users can’t change that “make sure you are a human” text yet. There is simply no option provided to do so. Here’s a screenshot of it appearing on the contact form that Mike linked to: http://i.imgur.com/drV4r.jpg

    I think it mostly appears if the website is untrusted (either directly or with add-ons like noscript.net), but with new attacks like BEAST starting to appear, more users may make their browsers less trusting.

    Recaptcha should be laughed out of town, but Googleblindness seems to descend. Use things like rate-limiters against bots – visual ability tests are rarely a good option and they’re not even captchas because they can’t Tell Computers and Humans Apart, which is the TCHA bit.

  11. I’m glad MJR mentioned the bit about browser settings… pity he didn’t mention java was probably turned off, or include the part of the message after {recommend you enable}, which would have been more informative and accurate.

    We can turn off all the add-ons and display functions, or even go back to 80’s style pure text browsers and font type courier; Remove all but the occasional minimal size graphic image so users of dial-up connections don’t have to wait forever for a page to load. Exclude the features people seem to want. Very accessible, but extremely boring.

    We might as well throw out GUI operating systems, instead run basic 5 and CPM, or Linux without a GUI.

  12. Oh now Mike’s being silly. Switching javascript off for untrusted websites is a simple step and a prudent one because the majority of browser exploits use javascript as part of their attack vector. The US National Security Agency even recommend it (see page 7 of http://www.nsa.gov/ia/_files/factsheets/Best_Practices_Datasheets.pdf )

    I believe accessibility, security and simple human decency should be important to communications businesses. It’s not difficult to do. You just have to reject defective tools like reCaptcha.

  1. Pingback: W3C Validation – Is it still Relevant « Graphiclineweb
