Comment spam alert

If it’s possible to access your comments by just incrementing a number, and possible to leave a comment by just sending an HTTP POST request (yes, Movable Type users, that’s you, among others), you should know that you are vulnerable to being flooded with comments. All it takes is a moron spammer like “Mike Johnson” of “www.las-vegas-real-estate-1.com” with enough scripting ability (very little) and gall (quite a bit) to write a script that starts with your first entry number and runs through to the last, sending POST after POST with his moron spammer website link and witless comment text.

I happened to be sitting at the computer when he started, so I managed to put “Deny from 208.57.58.205” (the moron spammer IP he was using at the time) in my root .htaccess file in time to stop him at only 120 comments. Then, thanks to Movable Type’s support for MySQL, I just needed to “DELETE FROM mt_comment WHERE comment_ip = '208.57.58.205' AND comment_entry_id != 2000” (I left the one on entry 2000 just in case I want to see it again, but I edited it to not even give the moron spammer the victory of a single link). However, had I not been around, and not been using MySQL, I’d say that deleting a comment from every single entry, one by one, through the MT interface, would have left me even more pissed off.
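For anyone who would rather script that cleanup than type it at a prompt, here is a minimal sketch of the same delete. It uses Python’s built-in sqlite3 as a stand-in for MySQL so that it runs anywhere; the table and column names are assumed from the statement quoted above, and the sample rows and the delete_flood helper are made up for illustration.

```python
import sqlite3

# In-memory stand-in for the MT database. Table and column names are assumed
# from the DELETE statement quoted above; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mt_comment (comment_id INTEGER PRIMARY KEY, "
    "comment_entry_id INTEGER, comment_ip TEXT)"
)
conn.executemany(
    "INSERT INTO mt_comment (comment_entry_id, comment_ip) VALUES (?, ?)",
    [(1999, "208.57.58.205"), (2000, "208.57.58.205"), (2000, "10.0.0.1")],
)

def delete_flood(conn, spammer_ip, keep_entry_id):
    """Delete every comment from spammer_ip except those on keep_entry_id."""
    cur = conn.execute(
        "DELETE FROM mt_comment WHERE comment_ip = ? AND comment_entry_id != ?",
        (spammer_ip, keep_entry_id),  # placeholders avoid quoting mistakes
    )
    conn.commit()
    return cur.rowcount  # number of rows removed

deleted = delete_flood(conn, "208.57.58.205", 2000)
remaining = conn.execute("SELECT COUNT(*) FROM mt_comment").fetchone()[0]
```

Running a SELECT with the same WHERE clause first is a cheap way to check what is about to disappear.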

The odds of him using the same moron spammer IP address aren’t great, but even so you might want to consider adding your own “Deny from…”, possibly blocking all of 208.57 (unless you happen to know that you have a reader getting access from mpowercom.net).
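If you would rather do the blocking inside a script than in .htaccess, “all of 208.57” corresponds to the network 208.57.0.0/16. The sketch below shows that check with Python’s standard ipaddress module; the BLOCKED list and the is_blocked name are just illustrative assumptions.

```python
import ipaddress

# "All of 208.57" expressed as a CIDR network (an assumed block list).
BLOCKED = [ipaddress.ip_network("208.57.0.0/16")]

def is_blocked(addr):
    """Return True if addr falls inside any blocked network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BLOCKED)
```

Here is_blocked("208.57.58.205") is true, while an address such as 208.58.0.1 passes.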

62 Comments

Comment by Phil Ringnalda #
2002-10-27 23:12:25

Further: the moron spammer’s technique is apparently to search Google for “blogger post comment” (though I’d assume he uses other searches too), get the pattern for the URL that way (starting from /archives/002178.php), then start GETting from 000001.php, then 000011.php, then 000111.php, then 001111.php, then 001811.php, then 002811.php, then 002511.php, then finally hitting paydirt with 002108.php (in maybe 45 seconds from the first search referral, so I’d assume that’s automated, though I don’t see the pattern from 2511 to 2108). From there it looks like GETting internal links (nice of me to provide previous and next links, eh?) for a few minutes, a twenty minute layoff, and then POSTing to entries 2000 through 2005, followed by 115 posts to mt-comments.cgi. So if you are lucky enough to be publishing .php (or some other script language) and to not have entries 1, 11, and 111, you could defeat that algorithm by just logging IPs that GET 1 and 11, and then blocking them if they access 111. Doubt that it will be that easy long-term, though: I think we need comment scripts that stop flooding based on IP/name/email/url/text similarity, though even that’s pretty easy to get around: a moron spammer can afford to fake his IP, since he doesn’t actually care to get a response to his POST. For all I know, the IP that I blocked to stop my moron spammer wasn’t actually his, just the fake he happened to be using.
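The “log 1 and 11, block on 111” idea can be sketched roughly as below. This is only an illustration in Python, not anything I am running: the ProbeTracker class and its method names are hypothetical, and it assumes entries 000001, 000011 and 000111 really do not exist on the site.

```python
from collections import defaultdict

# Entry IDs that (by assumption) do not exist on this site, in the order
# the spammer's script was seen requesting them.
TRAP_SEQUENCE = ("000001", "000011", "000111")

class ProbeTracker:
    def __init__(self):
        self.progress = defaultdict(int)  # ip -> trap URLs hit so far, in order
        self.blocked = set()

    def record_get(self, ip, entry_id):
        """Record a GET for entry_id; return True if ip is (now) blocked."""
        if ip in self.blocked:
            return True
        step = self.progress[ip]
        if entry_id == TRAP_SEQUENCE[step]:
            self.progress[ip] = step + 1
            if self.progress[ip] == len(TRAP_SEQUENCE):
                self.blocked.add(ip)  # hit 1, 11 and 111 in order
                return True
        return False
```

A visitor who stumbles onto just one of the trap URLs is not blocked; only the full sequence from a single address trips it.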

 
Comment by Simon Willison #
2002-10-27 23:39:04

I can see this becoming a pretty serious problem. There are already spamming companies out there that offer promotion on “over 120,000 web based bulletin boards” (exploiting the lack of authentication in simple web based forum software such as WWWBoard) and with Movable Type becoming so widespread I suppose it was only a matter of time before they moved on to blogs. The spam algorithms are likely to develop pretty fast (spamming POSTs isn’t rocket science) so any flood detection code will have to be pretty ruthless to be effective. I just hope this doesn’t eventually lead to bloggers demanding registration before people can post comments.

 
Comment by Mena #
2002-10-28 00:00:42

Yes, this is a nasty practice and we’re going to be thinking of ways to thwart the spammers.

I agree with Simon w/r/t registration. Although we do have plans to include registration in future versions of MT, I’m not particularly fond of that solution. True, registration may block many unwanted characters from posting to your site but it’s at a cost — namely, the limitation of open discussion.

 
Comment by Tomas #
2002-10-28 01:10:25

Damn that’s low. Thanks for the warning, Phil.

 
Comment by Michael Bernstein #
2002-10-28 02:34:38

As spammers go, this guy seems fairly inexperienced. I readily got a lot of relevant contact information from the website and from recursive WHOIS queries.

I won’t bore you with the details, but I’ve forwarded it all to the local LUG here in Las Vegas.

 
Comment by jh #
2002-10-28 03:40:44

Kind of restores your faith in humanity. Thanks indeed for the warning.

 
Comment by Burningbird #
2002-10-28 07:34:01

Yes, thanks for the warning Phil. But I agree with Mena that registration isn’t the solution. I have not commented numerous times at weblogs where I had to ’register’ to comment.

We can test REQUEST_METHOD and look for POST, and then check HTTP_REFERER and only allow POSTs from pages that share a common root URL. In my case, all pages are served from http://weblog.burningbird.net. I’m going to modify mt-comments.cgi to exclude POST requests that don’t originate from this root URL. Then I’ll test it to see what it breaks…
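As a rough sketch of that check (in Python rather than the Perl of mt-comments.cgi, and with a hypothetical function name), the logic might look like the following. Keep in mind that the Referer header is supplied by the client, so a script can send whatever value it likes; this only screens out the laziest floods, and it also rejects real readers whose browser or proxy strips the header.

```python
from urllib.parse import urlsplit

ALLOWED_HOST = "weblog.burningbird.net"  # the common root named above

def allow_comment_request(method, referer):
    """Screen a request using REQUEST_METHOD and HTTP_REFERER values."""
    if method.upper() != "POST":
        return True              # only POSTs are screened
    if not referer:
        return False             # empty referer: reject (may hit real users)
    return urlsplit(referer).hostname == ALLOWED_HOST
```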

 
Comment by michel v #
2002-10-28 08:37:01

Same problem with TrackBack, by the way :P

 
Comment by Jeremy Bowers #
2002-10-28 12:51:49

We can [tech] the [tech] until the [tech] [tech]’s…

Burningbird, I wouldn’t spend too much time on anything like that. The fact is that if people can post without registration, a spammer can automate it, and there simply isn’t anything that will allow you to tell a “real” comment from a spammed one with confidence. The email war is several years more advanced and the anti-spammers are not currently “winning”. I see no reason to believe this will turn out any differently; it’s the same trust and content problem, with the same solutions, with ultimately the same, or even easier, workarounds available to the spammers, and the same ultimate foundering on the inability of computers to understand the content of a given text string.

The only real, long-term solution is pro-active monitoring of the comments, and deleting inappropriate ones. It’s not censorship, because comments on your weblog are on your space… quite literally! You are paying for the bandwidth to serve the spam out, so it’s well within your domain to just delete them.

Either that or go with an alternative system entirely to public comments.

Unless you know of some awesome solution nobody’s ever heard of, I suggest that people either learn to live with the spam, learn to delete it, or give up on comments as an experiment that didn’t work out long-term. To convince me that there is some fourth “filter out the spam” solution, please first demonstrate the solution at work in the email domain. As of right now, there isn’t one; even SpamAssassin is only effective because the authors are involved in a constant arms race.

 
Comment by Phil Ringnalda #
2002-10-28 13:40:45

Well, yes and no. There are two separate problems: the same old one of individually spamming a single comment, which has been around for quite a while, doesn’t do the spammer much good, and is easy to delete, and scripted spamming of every single comment thread (I stopped mine after 120, but otherwise I would have had 463, and if I had more entries there would have been even more, since he got 650 “403 Forbidden” responses before he gave up). I agree that a single spam comment is impossible to detect, even more so than with email, but with email spam you rarely get more than five or six copies of the same spam at the same time from the same sender. This is more like getting six or seven hundred copies of one email spam, and if it wasn’t for MT now supporting MySQL it would have been like getting them with an email client that doesn’t allow you to delete multiple unread messages. That’s something that can and should get a tech fix, even though it will be difficult (I could do the spammer’s part of the programming in a few minutes: fake referrer, random name and fake email, rotate through several links that all redirect to the same site, random comment text, go as fast as possible without tripping anyone’s “too many comments too fast” trigger; doing our side of the battle will be a bit harder, and will probably need better programmers than me).

But I will not give in, I will not let them take away my comments. The comments are the only interesting part of this site: without comments, I’d be bored with it in a week, and shut it down in a month.

Michel: Trackback’s even easier, but we probably shouldn’t admit that. Luckily, very few people put it anywhere but behind Javascript, so it’s not very valuable to a spammer: I assume that the goal of comment spamming is PageRank, rather than direct visits from such a lamely crafted spam.

Comment by NiteOWl #
2003-08-26 07:49:36

Page rank? By this do you mean the number of pages referring back to his? I’m new to the blogging world and now I come across this. I dropped message boards off of web sites I designed 5 years ago because of people leaving spam or vulgar comments. So I would think it only natural that that would happen here, but this, this is going way too far.

This may be a possible solution though, if someone out there can write the script for it. Visit godaddy.com and search for a domain you know is in use. Use mine if you like, to make it simple. Now when it replies that it’s taken, click on the more information link and it should give you the whois info… but wait, it doesn’t: it dumps you on a page where you have to enter a pass code that it gives you on that page, and then you get the whois info. They claim this prevents a script from searching all their whois info and spamming the contacts with email, as a script cannot read the image file that presents said pass code to you.

Maybe this would prevent script spamming to your comments and be an acceptable inconvenience to honest commenters, and it doesn’t require registering, just a human to read the image and enter the code. The only thing is you couldn’t use just a few codes or the spammers would figure it out.

 
 
Comment by Burningbird #
2002-10-28 14:02:33

Jeremy, I posted at my weblog on the issue and Sam Ruby had good suggestions. I’m going to put them into a fix tonight, test it, hopefully get Phil and Sam’s vetting, and then publish it tomorrow.

At least, it will help slow the flood if nothing else.

Spammers aren’t going to go to each site and try and pick into the site’s brains. We don’t have to use extraordinary means to keep anything but a determined hacker out. And at this time, I’ll take what I can to protect the comments from the non-determined idiots.

I _would_ never give up my comments. Without the interaction, the blog would be flatter than a pancake, and just as interesting.

And if Phil gave up his comments, how would Shannon give him a bad time? How could I irritate the politically conservative? How could I beg Tim Tams from certain people?

It took a month of nagging to get Userland to put comments into Radio — that’s how important they are to weblogging.

Give them up? Never!

We’ll start with simple fixes, and go from there. There are evily twisted techie minds in our midst — we will contrive.

 
Comment by Jeremy Bowers #
2002-10-28 16:31:40

There are evily twisted techie minds in our midst — we will contrive.

Actually, you put your finger on the problem… the “evily twisted techie minds” are on the spammer’s side. It’s the script kiddie problem; you’re not fighting each individual spammer… or at least won’t be for long… you’re fighting against the professional spam software writers and sellers. That’s why you get spam… the SPAMMERS are stupid, the spam software writers are as bright as anyone else, they do this full-time, unlike the rest of us who have real jobs, and they have no soul. We’ve got twisted techie minds, perhaps, but we’re not evil.

Instead of fighting this, roll with it. Detect somebody dumping a lot of comments in a short span of time, regardless of content, and manually filter the rest by putting them in a convenient queue, no matter where on the site they were placed (either that or lock down the comments after they roll off the home page), with easy checkboxes to either approve it permanently, or delete it on the spot, all from one easy interface. (There really ought to be an interface for the site owner to see all recent comments anyhow, if your choice of tool doesn’t have it already.) (The flooding detection won’t stop any spammers, but it will guarantee that you don’t have to wade through 2000 spam comments.) It requires a bit of investment on the site owner’s part, but there’s no way to defeat this scheme, because there’s a human in it. It’s really the exact same final filter you apply to your own email box, except since it’s a website, only one person has to do it per site, instead of per email recipient.
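To make the shape of that concrete, here is a minimal sketch in Python of a site-wide flood detector feeding a moderation queue. The CommentGate class, its method names and the thresholds are all hypothetical, and a real version would persist the queue rather than hold it in memory.

```python
import time
from collections import deque

class CommentGate:
    def __init__(self, max_comments=5, window_seconds=60):
        self.max_comments = max_comments  # allowed per window, site-wide
        self.window = window_seconds
        self.recent = deque()             # arrival times of recent comments
        self.queue = []                   # held for a human to approve/delete
        self.published = []

    def submit(self, comment, now=None):
        """Publish normally, but queue everything once a flood is detected."""
        now = time.time() if now is None else now
        while self.recent and now - self.recent[0] > self.window:
            self.recent.popleft()         # forget arrivals outside the window
        self.recent.append(now)
        if len(self.recent) > self.max_comments:
            self.queue.append(comment)
            return "queued"
        self.published.append(comment)
        return "published"

    def approve(self, index):
        self.published.append(self.queue.pop(index))

    def delete(self, index):
        self.queue.pop(index)
```

The detector ignores content entirely, as suggested; the only judgment about what is spam is made by the person working through the queue.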

You might as well skip the endlessly escalating arms race; jump straight to something effective quickly, and maybe the spammers will decide it’s not worth their time to bother us.

 
Comment by Promo #
2002-10-28 20:47:41

I was hit too, same fella same method. Like you, if I hadn’t seen 140 comments suddenly appear in my tray email checker, I’d have been slammed. Thanks for posting the MySQL text to delete the comments, that will save a lot of work.

I hope there is a solution soon. I doubt this will be the last attack.

 
Comment by Burningbird #
2002-10-28 21:11:44

Unfortunately, http_referer won’t work (as Phil warned). What I forgot is that allowing empty referers through will also allow the spammer through — he would be posting the spam comments through a Perl app not through a web page.

(Yeah, I know I’m slow).

I have another idea at my weblog Phil. Take a look. Feasible?

You know, we asked for this, bragging about what a force we are. Well, now we just done become a target, too.

 
Comment by Ken MacLeod #
2002-10-29 07:45:18

Putting two and two together, not so far back somebody suggested that comments fields take a FOAF URL to provide all the remaining info. The comments system can pull the foafs from commenting users, such that any “new” user will likely already be “foaf:knows” by another commenter. Otherwise, it goes into the “needs approval” bucket.

 
Comment by chuqui #
2002-10-29 19:46:10

I hate to tell you this, folks, but if we’ve had a couple of un-related spam-comment attacks (and it seems as if we have), the most likely scenario is that someone has written a script to do this, and it’s now starting to circulate through the hax0r underground networks. Since it looks like at least one of the spammers today was a clueless idiot, I think it’s also safe to say you’ll find out he didn’t write the hack, he got it from somewhere and started playing with it.

Which means the floodgates are just starting to open, but you can’t expect them to magically go away.

security by obscurity and trust only works when you’re too small to be worth writing hacks for. The days of that for blogs seem to be officially over.

we should all assume that this stuff is now scripted and available to all of the idiots who know how to find script-kiddie stuff. it won’t be the hackers doing this for a challenge coming, they’ve already finished that job. it’ll be the JD’s doing it as a joy ride showing up next, and they run in packs.

 
Comment by Pepino #
2002-10-30 02:49:15

How about inserting a hidden field in every comment-form that has a checksum-value in it.
This checksum-value allows the poster to leave exactly one comment (in a specific time, e.g. 5 minutes). After this time or after a comment with this checksum has been made, this checksum becomes invalid.

This way the spammers have to get into real contact with your website to obtain each new checksum for each new spam…

What do you think about that?
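A rough sketch of the idea, in Python and with a hypothetical OneTimeTokens class: the form carries a value that can be redeemed once within a time limit. Note that it uses a random token remembered on the server rather than a literal checksum, which is a substitution on my part.

```python
import secrets
import time

class OneTimeTokens:
    def __init__(self, ttl_seconds=300):   # e.g. 5 minutes, as suggested
        self.ttl = ttl_seconds
        self.issued = {}                   # token -> time it was issued

    def issue(self, now=None):
        """Create a token to place in the hidden form field."""
        now = time.time() if now is None else now
        token = secrets.token_hex(16)
        self.issued[token] = now
        return token

    def redeem(self, token, now=None):
        """Accept a token once, and only while it is still fresh."""
        now = time.time() if now is None else now
        issued_at = self.issued.pop(token, None)  # pop makes it single-use
        return issued_at is not None and now - issued_at <= self.ttl
```

Each comment then costs the spammer at least one fresh GET of the entry page.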

 
Comment by Jennifer #
2002-10-30 05:12:38

I think Pepino is on the right track… if it is a script, then they probably aren’t even going to each page (?) to send the comment… it could be either a checksum or some value that can only be given on an ENTRY page… and without that value, the comment does not go through… kind of like a ”session id” or something… the server gives it to the page new each time, each page refresh, or load… then when the comment is posted, it checks to make sure that session id is there, and it’s the one that *IT* generated…

unfortunately, while I could probably figure something like that out in PHP, I’m clueless when it comes to Perl… Does this sound doable to anyone?

(I hope you don’t mind, but I want to copy and paste this same comment on Promo’s post on Scriptygoddess in case anyone who reads it there can figure it out…)

 
Comment by Phil Ringnalda #
2002-10-30 08:59:13

Ken – I’d love to give this problem a few whacks with a FOAF hammer, just because, but I don’t see offhand how it could work. If you require a known FOAF URL to comment, then the spammer just has to grab any FOAF URL from any existing comment, and post with that, putting his spam link in the comment body rather than in the URL field. There may be some solution in all the WOT stuff that I’ve read without understanding it, but it’s going to take some serious popularization before it’s ready for general public consumption.

I don’t think that a solution that just involves putting a hidden element in the form will really do any good, because it’s just not that hard to get the page and parse out the form: I did it with a ten minute knockoff PHP script that would only take a couple of minutes to suck down a fairly typical blog of 600 entries, so with a decent threaded spider and a better parsing algo, I’d expect that a spammer could get all of your hidden form fields in a few seconds.

Probably the solution is something along the lines of what Brad suggests: if you combine a hidden element that’s based on the IP address you sent that copy of the form to, which expires after some period of time, with throttling, only allowing one comment from a given IP address in so many minutes, then at least you force the spammers to switch from wholesale flooding to retail spamming, giving you a chance to block their address before they’ve done too much damage.
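Something along these lines, sketched in Python with made-up names and limits: the hidden field is an HMAC over the requester’s IP and the issue time, so it cannot be reused from another address or after it expires, and a simple per-IP timer does the throttling. SECRET_KEY, TOKEN_TTL and MIN_INTERVAL are assumptions, not anything from MT.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"   # hypothetical server-side secret
TOKEN_TTL = 600             # seconds a served form stays valid (assumed)
MIN_INTERVAL = 300          # seconds between comments per IP (assumed)

last_comment_at = {}        # ip -> time of last accepted comment

def make_token(ip, issued_at):
    """Hidden-field value tied to the IP the form was served to."""
    msg = f"{ip}|{issued_at}".encode()
    sig = hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()
    return f"{issued_at}:{sig}"

def accept_comment(ip, token, now):
    try:
        issued_str, sig = token.split(":", 1)
        issued_at = int(issued_str)
    except ValueError:
        return False                          # malformed token
    expected = make_token(ip, issued_at).split(":", 1)[1]
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False                          # wrong IP or tampered token
    if not 0 <= now - issued_at <= TOKEN_TTL:
        return False                          # form has expired
    if now - last_comment_at.get(ip, float("-inf")) < MIN_INTERVAL:
        return False                          # throttled
    last_comment_at[ip] = now
    return True
```

This does not stop a spammer who fetches each form first; it only forces one fetch and one wait per comment per address.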

 
Comment by Pepino #
2002-10-30 10:19:19

I think “Brad’s suggestion” is pretty much the same as what I suggested. The hidden checksum (or whatever you call it) should of course be generated dynamically based on the time and on some other things.

But I wouldn’t really use the IP address because the IP address could change during the visit very easily: e.g. because of proxies or dial-up connections with short time-outs or because the spammer faked the IP address…

 
Comment by Phil Ringnalda #
2002-10-30 10:41:24

Hmmm. Proxies are a problem. If you don’t use the IP address, though, it’s far too easy to get around a throttle: randomize the comment text, author, and email, and use several different URLs that all redirect to your target URL.

 
Comment by chuqui #
2002-10-30 12:51:44

I think this is a great place for a multi-level solution. Some thoughts:

Whitelist: a site’s blogroll is an obvious one. If a site gets a trackback from a site in their blogroll, then accept it. it’s possible for the referer stuff to get futzed by someone really motivated, but I believe there are ways to keep that under control (perhaps by defining an IP range we’ll accept that referer from)

shared secrets: similar to what MT does to enable pinging to the MT site. Can even be automated in some way, perhaps by sending the shared secret to a specific email address, similar to a confirm on a mail list. it’s a widget that is included in the trackback that a receiving site can use to validate the sending site enough to trust it. you could even go to a public/private key system to back validate people, but I think we want to avoid encryption because of the possible restrictions…

moderation. let a site decide what to accept and what not. Hold the rest for manual approval. Perhaps by IP number or range, which gives you an effective blacklist as well.

As I’ve said elsewhere, trusting systems simply don’t work as the population grows. But you can build trust into systems without going to a full registration setup. and for comments and trackback and etc, per-site registration will inhibit it enough you might as well shut it off. but that doesn’t mean we can’t build systems that allow for some level of trust without going through the hassles of a registration system. I think having an automated tool that sends a shared secret that can be attached to trackbacks to a site, and only sends it to an e-mail address (which gives you a valid place to start searching for the idiot) shuts down most of the problems with minimum hassle to all involved. it’s the same general capability mail lists use, and users have shown a willingness to accept them, and they don’t generate any significant privacy issues, but still give you some “real” identification to help solve problems if they do occur.

 
Comment by John Burton #
2002-10-31 05:25:39

If you needed registration, how about creating a central registration site so you only need to register in a single place for many blogs? Then if you abuse it your login can be removed centrally, and bloggers can also query the site to get the ids of cancelled people and remove their postings automatically.

It doesn’t need to be very secure: you just need to submit your name, url and email and it confirms your email address. Make it take 30 minutes to activate after it’s set up so that cancelled people can’t just keep creating new logins.

 
Comment by Pepino #
2002-10-31 06:09:41

@John Burton: You mean something like antville.org!

 
Comment by Morten Frederiksen #
2002-10-31 06:19:54

<sarcasm>
… or something like PassPort(MS)…
</sarcasm>

 
Comment by Gerald #
2002-12-20 11:07:27

And now, it’s so quiet? Isn’t it a problem any more? Sooner or later the next attacks will come and perhaps flood the unsecured blogs.

 
Comment by dodo #
2003-02-08 13:50:00

good call, i’m adding it.

 
Comment by roadrash #
2003-02-13 23:06:38

Hi

 
Comment by Charles #
2003-02-17 19:17:15

I am the webmaster for my wife’s site. If you have a legitimate spamming complaint, you could complain to the GLVAR (LV realtor association) and have him fined. Although there are people who are quick to cry “spam” (I had a call from a moron like that, he was mad he got an automated thank you e-mail AFTER he put his e-mail address in the guestbook) but it sounds like your case was malicious. Here is the address http://www.lasvegasrealtor.com/

 
Comment by Phil Ringnalda #
2003-02-17 19:40:32

Malicious, yes, but also several months ago, and instantly removed (since leaving it in place for even 24 hours would mean that he won, by getting Google to see the links). And given that my comments form doesn’t come with a disclaimer saying “you may only submit comments which are actually germane, and not the same damn moron comment on every entry” it’s actually just sleazy and annoying, not illegal.

 
Comment by ChuckL #
2003-02-22 11:40:11

I finished my 2 week free trial of Spam Alert—now I can’t get rid of their ”Reminder”
screen at Desktop startup.
Any ideas?? Many Thanks!

ChuckL

 
Comment by quino #
2003-09-07 12:02:05

What about having to enter a code that the user sees as an image, like the thing Yahoo has for opening a new account?

Comment by Phil Ringnalda #
2003-09-07 12:51:13

The only thing that makes those vaguely acceptable from an accessibility standpoint is the way that Yahoo! has employees available 24/7 to take care of an alternate scheme for visually impaired people who can’t see them. And since looking at the requests in my access log after a spamming makes it look like at least some of them are being done retail, by an actual human filling out the form on the page rather than a script POSTing without even a GET first, it wouldn’t stop the spammers, just annoy and frustrate the real users.

 
 
Comment by Chris #
2003-09-30 06:51:37

I have Eudora set up to play a sound (awooga-awooga-purple-alert [FanOf=RedDwarf]) when a comment mail comes in and it scores over 3. Perhaps some sort of parsing and scoring would be in order, and then queueing comments that score high?
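A toy version of that parse, score and queue idea might look like this in Python. The rules, weights and function names are invented purely for illustration, and the threshold simply mirrors the “over 3” cutoff mentioned above; any real filter would need tuning.

```python
import re

# Hypothetical rules: (pattern, points added per match).
RULES = [
    (re.compile(r"https?://", re.I), 1),                 # each link
    (re.compile(r"\b(?:casino|real estate)\b", re.I), 2),  # sample keywords
]
THRESHOLD = 3

def score_comment(text):
    """Sum the points for every rule match in the comment text."""
    return sum(weight * len(pattern.findall(text)) for pattern, weight in RULES)

def route_comment(text):
    """Queue high-scoring comments for review; publish the rest."""
    return "queue" if score_comment(text) > THRESHOLD else "publish"
```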

 
Trackback by Burningbird #
2002-10-28 07:53:08

Comment Spammers redux

Seems to be a technology day today. Phil caught a comment spammer who was trying to dump spam comments in all of his posts. This process would work within any weblog that sequentially numbers weblog posts (ie Movable Type). I’m

 
Trackback by gessaman.com #
2002-10-28 17:37:19

Sounds like a Monty Python Sketch

It seems to be spam-day on the internet. Phil Ringnalda was hit by comment spam. Two of the most egregious

 
Trackback by Solonor's Ink Well #
2002-10-29 06:21:04

Comment Spam Update

More info on the comment spam topic. Coincidentally, Ginger also posted on the topic yesterday. She linked back to Phil

 
Trackback by ***Dave Does the Blog #
2002-10-29 08:45:38

Well, I have a solution …

… but it involves a baseball bat, knee caps, and a demonstration of one or two of Newton’s Laws. Seems

 
Trackback by scriptygoddess.com #
2002-10-29 10:58:12

Comment Spam

This is more of a “Heads Up” post than anything else. This past weekend I was working on a blog

 
Trackback by bradchoate.com #
2002-10-29 16:27:58

Comment Spamming

Phil Ringnalda recently got hit by a comment spammer– arguably one of the lowest forms of life. They care not

 
Trackback by Teal Sunglasses #
2002-10-29 20:05:12

The script kiddies are coming. The script kiddies are coming… (sigh)

if you look here, here, here, and here, blog spamming through content is suddenly showing up around the net. I

 
Trackback by nico | couchblog #
2002-10-29 22:52:42

Comment Spam Attacks

Mark takes a detailed look at the recent spam attacks on weblogs. Unfortunately, a solution to script-driven spamming through the comment function (e.g. in MT) is not yet in sight (not here either, unfortunately), although of course one…

 
Trackback by kusor dhtml weblog #
2002-10-30 10:41:18

Spam in weblogs

I was reading Mark Pilgrim’s weblog, specifically the post about spam in weblog comments

 
Trackback by Teal Sunglasses #
2002-10-30 12:55:55

more on the blog comment spam issues..

discussions are spreading across the blogs about how to deal with the blog-spam issue. A good starting place continues to

 
Trackback by kd: a blog #
2002-10-30 15:52:25

there’s evil afoot

geeky evildoers of evil

 
Trackback by Brownpau.com #
2002-10-30 20:16:31

Comment Spam, Bagels, and Racist Aunts

Bad enough that we get email and IM spam; now our blogs have to worry about comment spam. The eminently linkworthy Dane says that he

 
Trackback by Mentalized #
2002-10-31 17:20:31

Spam, spam, baked beans, and spam

phil ringnalda dot com warns about the latest idea from the lowlife scum also known as spammers: Target the comment

 
Trackback by Joni Electric #
2002-11-04 20:00:33

“No Comment”

I’ve been following with interest the issue of comment spamming through MovableType (and presumably other content management systems that use

 
Trackback by andersja's blog #
2002-11-25 11:16:26

Blog comment-spam

Unscrupulous marketeers spam blogs’ comments…

 