Comment spam evolving?
Hard to say whether this is highly evolved comment spam or someone jerking my chain, but someone just left a single comment on my old Multiping hack entry. Its almost-an-actual-comment text, “Ping the site twice, that may make a difference.”, doesn’t make any sense in context, but it does seem vaguely related. It came with an author URL that points to a $29 zipcode program and a link in the comment text to an offer to sell cover letter samples.
That’s only moderately evolved, but searching for the IP that left the comment in my access log shows (true or not) that he or she came in from a Google search for links to andersja’s blog. Then, while I was typing this entry, I got a TrackBack ping from andersja’s blog, pinging my comment spam alert entry because he had just been spammed on a couple of posts, with the same zipcode URL plus a couple of others. Charming: spam a blog, then search for new victims who link to that blog.
I, too, received two spam comments from this guy. But, like you said, they seemed vaguely relevant. It was only when he posted the same message in another thread (with the zip code URL) that I realized something was amiss and banned him.
I received enough ‘unsolicited advertising comments’ on my old SprintPCS weblog post that I pulled it.
They were using Google to get URLs based on SprintPCS, and then dumping in adverts to alternative companies.
Looking at the trackback message above makes me nervous. What better way to pick on someone / get revenge on someone / whatever than to forge your IP address to match theirs, then spam on some weblogs so they get banned from all of them.
IP blocking is definitely not the way to go. As I posted to my weblog just a few days ago, we need a method that can evolve on its own to calculate the probability that someone is a spammer, given their posting patterns and the content they post.
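As a rough illustration of the probability-based approach, here is a minimal naive Bayes sketch in Python. It is not an existing MT feature: the tokenizer, the class name `CommentClassifier`, and the training examples are hypothetical, and it only looks at comment content. A real filter would also want posting-pattern features such as timing or link counts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Crude tokenizer: lowercase runs of letters, digits and URL punctuation."""
    return re.findall(r"[a-z0-9.:/-]+", text.lower())


class CommentClassifier:
    """Hypothetical multinomial naive Bayes filter for blog comments."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record one comment the blog owner has already judged 'spam' or 'ham'."""
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            # Smoothed prior: share of training comments carrying this label.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            total_tokens = sum(self.counts[label].values())
            for tok in tokenize(text):
                # Add-one smoothing (plus one slot for unseen words) keeps a
                # single never-seen token from zeroing out the estimate.
                score += math.log((self.counts[label][tok] + 1)
                                  / (total_tokens + vocab_size + 1))
            log_score[label] = score
        # Turn the log-odds into a probability without overflowing exp().
        diff = log_score["ham"] - log_score["spam"]
        if diff > 0:
            e = math.exp(-diff)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    clf = CommentClassifier()
    clf.train("cheap zip code database download http://zipcode.example", "spam")
    clf.train("ping the site twice buy cover letter samples http://letters.example", "spam")
    clf.train("I tried the multiping hack and it worked with my MT install", "ham")
    clf.train("Thanks, the trackback explanation helped", "ham")
    print(clf.spam_probability("download the zip code database"))
```

The "evolving" part is simply retraining: every comment the owner marks as spam or not spam becomes another `train` call.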
I have to agree that blacklisting IPs doesn’t sound like a very workable solution: it’s too easy for them to get around, and too easy for someone to be falsely accused. There just isn’t anywhere near a one-to-one relation between spammers and IP addresses. Just because right now the spammers I know about have been using a single IP address doesn’t mean they have to keep being so easy to stop.
If I wanted to spam comments, I’d get a free account that supports PHP (oh wait, I’ve got several!), get changes.xml from weblogs.com once an hour to build up a list of target URLs, then GET them, use a simple regex to look for a link to movabletype.org (no sense making my life difficult by messing with too many comment systems), then use either RSS autodiscovery, a regex looking for a link to index.(xml|rdf), or just guess that index.xml will work, GET the RSS feed (using the referral URLs of popular aggregators, just to be annoying), and then I’ve got entry URLs. GET an entry, parse it for a form that looks like an MT comment form, save the action="" URL, and I’m ready to connect with a throwaway dialup account through a big ISP that can be counted on to use a proxy cache, and send a few thousand POSTs. I don’t care how slow the connection is, since I’m just sending a few hundred bytes and ignoring the response. About the best you can do to slow me down is to pass out tokens with a random expiration time, and then return a form requiring another submit if the token has expired. Getting that to work with MT would be a challenge, since so far it doesn’t require running the cronjob that would be needed to manage and clean up after the tokens, and I should be able to mostly duck it by GETting tokens on one dialup account for five minutes or so, and then POSTing spam on another (behind the same proxy cache) while I GET more tokens. Someone blacklists my proxy cache’s IP (ouch: how many AOL customers sit behind a single cache?), fine, I’ve got Earthlink CDs too.
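For the curious, here is a minimal Python sketch of the random-expiration token idea. One assumption worth flagging: instead of storing tokens and cleaning up after them with a cronjob, it signs the entry ID and expiry time with HMAC, so the server keeps no state at all. The names `SECRET`, `issue_token`, and `check_token` and the 2 to 15 minute lifetime range are hypothetical, not part of MT. The trade-off is that a signed token can be reused until it expires, so this slows a spammer down rather than stopping him.

```python
import hashlib
import hmac
import random
import time
from typing import Optional

SECRET = b"change-me"  # hypothetical per-blog secret, kept server-side


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(entry_id: int, now: Optional[float] = None,
                min_life: int = 120, max_life: int = 900) -> str:
    """Token for the comment form; expires after a random 2-15 minute lifetime."""
    now = time.time() if now is None else now
    expiry = int(now) + random.randint(min_life, max_life)
    payload = f"{entry_id}:{expiry}"
    return f"{payload}:{_sign(payload)}"


def check_token(token: str, entry_id: int, now: Optional[float] = None) -> str:
    """Return 'ok', 'expired' (re-show the form with a fresh token) or 'invalid'."""
    now = time.time() if now is None else now
    parts = token.split(":")
    if len(parts) != 3:
        return "invalid"
    tok_entry, expiry, sig = parts
    expected = _sign(f"{tok_entry}:{expiry}")
    # Constant-time comparison of bytes, so odd characters can't raise errors.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return "invalid"
    if tok_entry != str(entry_id):
        return "invalid"  # token was issued for a different entry
    if now > int(expiry):
        return "expired"
    return "ok"
```

An 'expired' result is the cue to send back the form with a fresh token and ask for another submit; only 'invalid' needs to be rejected outright.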
Sadly, much as I like your scheme, I’m going to be tough for you to stop, too: all I did that you know about is GET your main page and then your RSS feed from one IP, and then, possibly days later, maybe GET an entry and POST a comment from another IP. You’re also going to have trouble fitting it into MT’s current requirements: despite the way people say MT is tough to install and requires a lot from your hosting, that’s nothing compared to setting up a cronjob that will sit around constantly parsing an access log. It’s the most fun suggestion I’ve seen so far, but unless it’s easier to do than I think, it’ll be a geek’s Club: people who can get it working will just shift comment spammers onto those who can’t.
Unless, of course, it could be set up as a default part of an MT install or as a plug-in that anyone could easily grab. If enough of the big systems did that, it wouldn’t be worth the spammers’ time to spam blogs when there are no results.
As far as the hard-to-track thing goes, the IP address wouldn’t be significant at all, except possibly as a temporary way to track the spammer. Even then, as you pointed out, it’d be easy enough for them to fool that. However, if they’ve already spammed your site with URL x and you’ve marked it as spam, your system should be smart enough to mark every subsequent comment that points to the same URL as suspicious. Then, when you log on, you can let through any legitimate comment using that URL and mark the rest as spam so your system deletes them. Also, I’d imagine the spammers would want the routine automated, since it’d take too much time to do it one by one. An automated routine can only produce so much variance in the postings, though, so it shouldn’t be too hard for your program to notice that an attempted posting is 90% similar to yesterday’s spam AND points to the same URL, and therefore mark it as spam.
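The same-URL-plus-similarity rule can be sketched in a few lines of Python with the standard library’s `difflib.SequenceMatcher`. This is a hypothetical illustration that assumes you keep a list of comments already marked as spam; the function name `classify` is made up, the 0.9 threshold mirrors the 90% figure above, and the "suspicious" verdict covers comments that share a spam URL without looking alike.

```python
import re
from difflib import SequenceMatcher

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def _norm(url: str) -> str:
    # Treat "http://Example.com/" and "http://example.com" as the same link.
    return url.rstrip(".,)/").lower()


def _urls(text: str, author_url: str = "") -> set[str]:
    found = {_norm(u) for u in URL_RE.findall(text)}
    if author_url:
        found.add(_norm(author_url))
    return found


def classify(comment: str, author_url: str,
             known_spam: list[tuple[str, str]], threshold: float = 0.9) -> str:
    """Return 'spam', 'suspicious' (shares a spam URL) or 'ok'.

    known_spam holds (comment_text, author_url) pairs already marked as spam.
    """
    new_urls = _urls(comment, author_url)
    verdict = "ok"
    for old_text, old_author_url in known_spam:
        if not new_urls & _urls(old_text, old_author_url):
            continue  # no URL in common with this piece of spam
        similarity = SequenceMatcher(None, comment.lower(), old_text.lower()).ratio()
        if similarity >= threshold:
            return "spam"  # same URL AND nearly the same text
        verdict = "suspicious"  # same URL only: hold for the blog owner
    return verdict
```

In practice you would auto-delete only the "spam" verdicts and queue the "suspicious" ones for moderation, so a legitimate commenter who links to the same site isn’t thrown out automatically.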
Of course, I could be totally off the mark here. My method doesn’t block the spammers from making the comments, but it does make them practically worthless, since they’re caught and have their comments removed almost immediately, and the chance that Google will index that page before the comments are deleted is rather remote.
> and have their comments removed almost immediately
Perhaps it would be better to react with an appropriate delay. This way the spammer feels confident, and if he keeps on spamming he may lose a lot of work and, in particular, time. Let’s call it loss maximization :)
In the early days, when some of it looked hand-crafted, I might have agreed (though encouraging them to add more to other people’s blogs isn’t exactly neighborly), but all the comment spam I’ve seen for several months has looked completely automated. So I don’t think leaving it around would be any more effective than driving up email spammers’ bandwidth bills by viewing all your HTML email spam. The one thing they want is for Google to index their link, and since Freshbot can come by at any time, I think it’s better to get rid of it as soon as possible, so that they never see any benefit.
Amusingly enough, Mr. Zip Code is already persona non grata at Google, so the last time I wanted to see who all he had spammed, I had to search for link:losermoron.com at alltheweb. Surprising how many people will leave an obvious spam comment lying around.
> and since Freshbot can come by at any time
Yes, but that’s not the point. Freshbot is worthless for this part. Only the deepcrawler grabs the pages and links that will be processed into the PageRank and link-related part of Google’s algorithm. Since the deepcrawler doesn’t come daily, it would be possible to wait until its first occurrence in the logs, or to detect it in real time, and if so, ban the spam URLs/comments.
http://www.zipcodedownload.com is still not penalized by google. also I am not sure whether the distribution of automated spam is really so dominant.
Pardon me while I squish my enormous ego down to size. I never really looked at what Freshbot was doing, beyond the one important thing: sticking my posts at the top of the serps. Never actually noticed that the things I was linking to weren’t getting any direct, immediate benefit. Bad me.
link:zipcodedownload.com certainly does show that there are a number of people who don’t bother deleting spam comments, doesn’t it? Hard to say whether the semi-sort-of customized comments mean he does a lot of hand commenting, or just that those are the ones most likely to stick. Still, I can’t believe that the ones that quote a line from a comment two or three back aren’t automated. If they are done by hand, that’s a pretty severe failure of imagination.
> Never actually noticed that the things I was linking to weren’t getting any direct, immediate benefit.
They get an indirect benefit, as your site climbs up the SERPs and people follow your links. That’s more than nothing. And with the next Google update comes the link benefit. From a search engine’s point of view, blogs like yours are a weapon ;-)
> that quote a line from a comment two or three back aren’t automated
Perhaps the spammer got tired and unperceptive ;-)
> If they are done by hand
We should ask Mr. Zip Code, so we can see more clearly.
In the meantime I read a lot of the past stories: the automatic attacks, Sam Ruby’s solution, and your suggestions. And yes, automatic comments seem to be the bigger problem. But isn’t it stupid of the spammers to use scripts so excessively? That way ordinary blog owners are forced to react, and so they remove the spam. Still, as in the case of Mr. Zipcode, there are enough links left over.
Fresh spam (17. April 2003) at http://www.benhammersley.com/archives/004296.html. Josh writes:
“I have gotten several spam comments on my blog at http://www.ldsforums.com. I’m not sure how to stop it, I just delete the spam and start over at this point.”
Posted by Josh at April 17, 2003 07:50 PM
Nice, and he has his own blog. Perhaps you would like to leave some spam, ahem, comments there ;-)
MT 2.6 Feature suggestion: Collaborative spam-blocking
Any time a legitimate Movable Type user blocks an IP address for commenting, a ping could be sent to movabletype.com…
Voldemort Comment-Spamming?
Just received an email notifying me that someone had left comments on an entry I wrote last year (BIG NOTE: If you’re currently reading, or planning to read, the new Harry Potter book, then don’t click through, as it reveals who dies). Now,…