Comments is comments, mostly

I’ve been muttering about this in my comments, and a few other people’s, long enough that I thought I should bring it up into an entry. I’m a comment slut. I always have been. If I give you three lines of code, and it changes your world, and makes you stop drinking (or, better yet, start), I love hearing about it. If I’m wrong as wrong can be, I love hearing about it. If you just haven’t said hello for a while, I’m delighted to hear from you in a comment.

Commenting is also a great way to get yourself known among webloggers. If you write something interesting, or just write well, in someone’s comments, I’m quite likely to click your link, to see what else you have to say. I found a good share of my favorite blogs that way, following links from comments. I don’t want to see that go away. I also don’t want to see the tiny bit of Google-juice that comes from leaving a comment and a link go away: most of the best content around here is in the comments, and a little bump in Google is hardly payment enough for all you do for me.

But, the inevitable but, I’ve been noticing lately that there are quite a few people who are spamming comments, only with a link to their blogs rather than to a casino, a pill, or a naked woman. If you only have six words to say to me, that’s cool, but if I then see that someone in my blogroll has just updated, and I go over there only to find that you had six similar words to say to her, and searching for link:yoursite.com shows that there are dozens or hundreds of links to your blog, all coming from six-word, nothing-much comments, I’m going to feel used, and even a slut doesn’t like to feel used.

Despite having hated the MT 2.661 shoved-down-your-throat redirect so much that I copied the previous version back into my upgraded file, I’ve installed David Raynes’ Optional Redirect plugin (with a couple of fixes I left in his comments), so that I can get a redirect link from <MTCommentAuthorLink redirect="1">. And despite having little faith in the ability of a Bayesian filter to tell comment spam from real comments in general, I’ve installed James Seng’s MT-Bayesian plugin/hack, so that the filter and I, working together, can call some comments spam, and others ham. Then, with <MTIfSpam>, if one or the other of us (hard to tell which, isn’t it?) doesn’t think highly of your comment, you get redirected rather than directly linked. At the moment, it seems to mostly think that any stranger is spam, and anyone who has left a non-spam comment is ham, which works for me as a first approximation: I usually remember to train it whenever someone comments, so at most you get redirected for a few hours until I introduce you.
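
For anyone who thinks better in code than in template tags, here is a rough Python sketch of the decision described above: a comment judged spam gets a redirected author link, and everyone else gets a direct one. This is not the plugins' own code. The `author_link` function and the `REDIRECT_BASE` endpoint are hypothetical names invented for illustration, since the post does not show the real redirect URL format.

```python
import html
from urllib.parse import quote

# Hypothetical redirect endpoint, for illustration only; the real
# plugin's URL format is not shown in the post.
REDIRECT_BASE = "https://example.com/redirect?url="


def author_link(name: str, url: str, is_spam: bool) -> str:
    """Return the HTML for a comment author's link.

    A comment judged spam is linked through the redirect endpoint, so
    the author's site gets no direct link; any other comment gets a
    plain direct link.
    """
    if is_spam:
        href = REDIRECT_BASE + quote(url, safe="")
    else:
        href = url
    return f'<a href="{html.escape(href, quote=True)}">{html.escape(name)}</a>'


if __name__ == "__main__":
    print(author_link("Known friend", "http://example.org/", is_spam=False))
    print(author_link("Stranger", "http://example.org/", is_spam=True))
```

The point of the design is that the penalty is mild and reversible: nothing is deleted or hidden, and flipping the verdict simply restores the direct link.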

If you find yourself redirected for more than just a few hours, then things get interesting. Anyone who knows me at all will of course scream at me, throwing things, and threatening flying monkeys until I get on the stick and tell MT-Bayesian not to be so foolish, but if you don’t know me, and do notice, you’ve got a delicate situation. Did I just miss it? Did I think you were just self-promoting? If you mention it, will I give you your direct link, plus a post extolling the brilliance of your last several entries? Or, will I slip you into my blacklist as a poison pill of sorts, and call you out in public, insisting that you wear a scarlet S for all time? I’d recommend leaving another comment, one so obviously relevant, brilliant, and useful that I’ll be ashamed of my unseemly thoughts. As I said, that’s where it gets interesting :)

46 Comments

Comment by Adam Lasnik #
2004-01-26 20:36:38

This Comment’s Just Six Words Long :D

 
Comment by nick #
2004-01-26 20:46:33

So is this not the time to tell you about my online casino, cheap viagra, and hot xxx action? I thought by doing so I’d be providing a friendly service, but I guess not. Let me know if you’re interested, though.

 
Comment by Phil Ringnalda #
2004-01-26 20:55:24

Damn cheap filter, should have redirected the both of you :(

It must have treated “comment’s” as “comment is”. That must be it. Made it seven words. And of course, it has no idea what Viagra is. V1agra, /14gra, sure, but Viagra?

This isn’t going to help its education, is it? Heh.

 
Comment by Sol #
2004-01-26 21:01:14

Hi Phil! We miss you in Whole Wheat World. Best of luck on your Spaminator crusade!

Comment by Phil Ringnalda #
2004-01-26 21:06:41

Hey, Sol! I’m coming back, I swear I am, but every time I start over that way, something comes up that either wants all my connection bandwidth, or all my mental bandwidth.

The upside, though, is how many wonderful new surprises (musical and code) there will be by the time I do get back.

 
 
Comment by Phil Ringnalda #
2004-01-26 21:03:47

Huh. Odd. I thought that 50% (which is what it called both of you) amounted to <MTIfSpam>, but apparently not, since neither of you were redirected until I told it Adam was a dirty spammer (no sense in telling Google that you are the lyrics for a Weird Al song, anyway ;)). Might have to do a little altering, either in the plugin or in the weighting, since what I mostly wanted it for was to redirect questionable things that came in while I was sleeping, so that I could get to them before Googlebot did.

 
Comment by pixelkitty #
2004-01-26 21:12:25

I really like the spam vs ham idea. Will this filter be available to us plebs?

I’ve resorted to turning comments off on posts that don’t warrant a conversation, or that I don’t particularly want other people’s opinions on.

But being able to label a comment as spam or ham, and then treat it appropriately (delete, KILL KILL KILL!) would be wonderful.

 
Comment by aroon #
2004-01-26 21:15:14

hey phil have you heard of spamnet? its a filter for email spam and it works on the general concept of distributed computing. the idea is hundreds of people download and install this email client plugin [i think they only support outlook right now (poo on that)] and when they get email it reads through the email and checks with a master list to see if its spam. the master list is updated by the users. so when a new piece of spam is released the filter wont catch it. in that case you select the email and hit the ’this is spam’ button. spamnet learns this and then from then on knows its spam.

isnt there a way we could apply this to comment spam? if not in a general, open standard kind of way, at least in an MT client kind of way?

just food for thought, i dont have the skill to try and do this myself [yet ;)] otherwise id give it a shot.

Comment by aroon #
2004-01-26 21:23:21

as soon as i hit that post button i thought of a better way to explain that…wheres the edit button when you need it?

so basically what im thinking is instead of a local blacklist [like the plug-in i know someone made] it would be nice to have a remote blacklist everyone could contribute to. sound feasible at least?

Comment by Phil Ringnalda #
2004-01-26 21:31:49

Well, I think something like that (only peer-to-peer, rather than centralized) is where Jay’s headed with MT-Blacklist. But for the most part, I’m not sure I want to go there, at least not automatically. You can share your blacklist, and import other people’s blacklists, and I tried it with two people’s, both people whose technical acumen and good sense I trust, and each of them had one entry (out of six or eight hundred, mind you), that I very much did not want to have blocked. One was a difference of opinion, the other was a poison pill (where a spammer induces you to include something that you shouldn’t include). I subscribe to the RSS feed of Jay’s additions to the master list, because I trust Jay to think very carefully before he adds something, but I don’t think I trust anyone else to think that carefully, and that widely about the ramifications of adding something. I know I’ve got a few URLs blacklisted that other people shouldn’t necessarily block.

Comment by aroon #
2004-01-26 23:28:44

hmmmm, point taken. it seems the spammers are getting closer and closer to hitting my current posts..ima have to give the current blacklist plugin a go

 
 
 
 
Comment by pixelkitty #
2004-01-26 21:16:38

or I could just download both plugins, implementing the fixes you mention in the other comments.

d’oh!

Comment by Phil Ringnalda #
2004-01-26 21:44:29

Yep. It’s not quite a simple turnkey thing, since MT-Bayesian’s fairly complicated to install, especially if you’re using MySQL, and have to modify your database, but none of it’s really hard. Just more work than dropping one file in the plugins directory.

And I think now it should work the way I thought it was, going forward. {mt dir}/lib/MT/Bayesian.pm has a $threshold_spam variable that’s 0.9 by default, and a new comment has a probability of 0.5 to start (and stays there, if there aren’t enough ham-or-spam words it knows about to change its mind). That works fine for the normal use, where with spam you don’t show it at all, or show it with a big red Border Of Shame like James does, but for my subtle alteration, I think setting that to 0.49 (or maybe lower, as I get more ham-words in the training corpus) should do the trick. It treats URLs and email addresses as very important, so anyone that is introduced as being a good person should be able to get away with a few spammy words, without going over, and even if they do, it’s a minor penalty for a short time.
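
A tiny Python sketch of why lowering the threshold changes things for strangers, using only the numbers quoted above (0.9 default threshold, 0.5 starting probability, 0.49 proposed threshold). The `is_spam` function is an illustration, not MT-Bayesian's code, and it assumes that a comment counts as spam when its probability reaches the threshold; the plugin may compare differently.

```python
UNKNOWN_COMMENT_PROBABILITY = 0.5  # score of a comment the filter knows nothing about


def is_spam(probability: float, threshold: float) -> bool:
    """Assumed rule: spam when the probability reaches the threshold."""
    return probability >= threshold


if __name__ == "__main__":
    for threshold in (0.9, 0.49):
        verdict = "spam" if is_spam(UNKNOWN_COMMENT_PROBABILITY, threshold) else "ham"
        print(f"threshold {threshold}: an unknown commenter is treated as {verdict}")
```

With the default 0.9 an unknown commenter at 0.5 passes as ham, and with 0.49 the same commenter lands on the spam side, which is the "stranger until introduced" behaviour described above.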

 
 
Comment by Jacques Distler #
2004-01-26 21:35:50

So you are really committed to collecting enough spam to train MT-Bayesian? My hat’s off to you.

My total spam count over the past 15 weeks (which is to say, since I first noticed comment spam on my blog and decided to do something about it) is 5 spam comments.

Not enough to train a cocker spaniel, let alone MT-Bayesian.

I’m still thinking about the <MTCommentAuthorLink> redirect. In principle, it’s the right bit of social engineering until Google catches up and starts punishing the comment spammers (there are rumblings that this has begun to happen). In practice …. ?

Comment by Phil Ringnalda #
2004-01-26 21:52:51

In practice? Baby, bathwater. The price is too high, for me.

Actually, no, I’m not collecting that much. More than you, because I’m not quite as scary on preview, so I get a few more hand-entered ones. But all I really want from MT-Bayesian is an over-complicated whitelist, which it seems to be providing, so far. If it knows someone, like it does you, then it recognizes your URL and email, and gives you major ham-points for them. The rest? It would be nice if it learned enough words to really know a few things are spam without help, but as long as it remembers that strangers are bad until proven otherwise, and there isn’t any easier way to do that, I’ll keep it around.

I certainly don’t expect miracles from it, like I do with email, since most words in spam comments don’t matter in the least (I saw someone who was already blocked by MT-Blacklist trying to get in by using a single period or comma as his anchor text, not realizing that the problem was URLs, not words), and I’m not married to it the way I am with POPFile. Just experimenting, and enjoying what it’ll do so far.

Comment by Jonathon Delacour #
2004-01-26 22:58:07

This seems like much more fun than my method, which is to replace the topic text with “[Removed (off-topic)]” or “[Removed (spam)]” (depending on my whim), delete the URL, change the email address to “x@y.com”, and truncate the author name to four characters.

But what’s to stop me leaving a comment as Jacques (thereby stealing his URL and email ham points) and linking to one of my own posts? Come to think of it, isn’t it surprising that there appears to have been so little forging of other people’s identities in weblog comments?

Comment by Phil Ringnalda #
2004-01-26 23:37:52

Shhhhh!

Well, really, wholesale spammers mostly don’t expect HTML to be enabled in posts, so the URL field is all they use, and it wouldn’t do them any good. But a retail spammer who really wanted to slip in a comment while I was asleep, hoping Googlebot would get there first, would do well to borrow someone’s URL.

As to forging, yep, it’s odd. I’ve seen Dave Winer forged a few times, but other than that, well, if I’ve seen it, I haven’t recognized it.

 
Comment by eric #
2004-01-26 23:50:12

I’d like to think: ethics, ego, respect —
but reality: probably just no one thought about it before now.

PGP signing of email is relatively common, I guess PGP signed comments would be good for more than one reason.

Comment by Phil Ringnalda #
2004-01-27 00:38:31

And of course, pb’s ahead of us, by a year and a half or so. At the time, I wasn’t sharp enough to realize that his approach, not doing any verification, just making the signed comment available in case anyone else wanted to check it, was actually a very slick way to get PGP’s foot in the door. If you know what it means, and how to go about it, you can easily sign your comments, or verify comments for people with either a posted public key or a key you already trust, and otherwise it stays completely out of the way. To do, to do, to do.

Comment by Jacques Distler #
2004-01-27 07:14:30

Aside from the intimidation factor, I don’t quite see how to make this work in a user-friendly way. And I speak as someone who has PGP-signed every outgoing email message for the past decade.

The point is not whether a comment was PGP-signed, but whether it was signed with a key that I trust.

Having exchanged email with Phil, maybe I have his public key and can verify that the “Ringnalda” who signed a comment is the same one who exchanged email with me. But Jonathon Delacour? If he starts placing his PGP key on his blog, then maybe I can verify that the commenter is the same guy as the blogger.

As I think about it, that’s really the only thing I can reasonably hope to learn from signed comments. So why don’t we automate it? I put a

<link rel="ppgkey" type="application/pgp-keys" href="/~distler/distler.asc">

on the main page of my blog. You should be able to retrieve my public key by following the URL link in my comment and looking for a <link rel="ppgkey">. You can then offer a little comment verification button to your readers.
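
The discovery step proposed above (follow the commenter's URL, look for the key link, resolve its href) might look roughly like this in Python. It is only a sketch: `KeyLinkFinder` and `find_key_url` are made-up names, the `rel` value is copied as written in the comment above, and fetching the page, downloading the key, and verifying the signature are all left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class KeyLinkFinder(HTMLParser):
    """Remember the href of the first <link> whose rel matches."""

    def __init__(self, rel: str = "ppgkey"):
        super().__init__()
        self.rel = rel
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values
        rel_values = (attributes.get("rel") or "").lower().split()
        if self.rel in rel_values and attributes.get("href"):
            self.href = attributes["href"]


def find_key_url(page_url: str, page_html: str, rel: str = "ppgkey"):
    """Return the absolute URL of the advertised public key, or None."""
    finder = KeyLinkFinder(rel)
    finder.feed(page_html)
    if finder.href is None:
        return None
    # Resolve a relative href against the page the commenter claimed
    return urljoin(page_url, finder.href)


if __name__ == "__main__":
    page = (
        '<html><head><link rel="ppgkey" type="application/pgp-keys" '
        'href="/~distler/distler.asc"></head><body></body></html>'
    )
    print(find_key_url("http://example.org/~distler/blog/", page))
```

Because the key location comes from the page at the claimed URL, a successful check says only that the commenter controls that URL, which is all this scheme sets out to prove.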

Comment by Phil Ringnalda #
2004-01-27 07:53:10

You’re a quick one, Mr. Distler. Most people who are into PGP are used to the way it works for what they use it for, and want to ensure that a key in their keyring belongs to a particular bag of protoplasm, so they start down the wrong road, wanting to get keys from keyservers, and call people on the phone or meet them in person to verify that it’s the right key for the right person. But for us, the flesh doesn’t matter. What matters is “was this comment signed by the person who controls the URI-space he claims?”

Last time around, I was too into FOAF, and wanted to link to it from there, but couldn’t get a reasonable answer about how to do it. Now? A link in the XHTML suits me fine.

Verification’s a little tougher, for anyone who doesn’t control their own server. If you can install Perl modules the way they want to be installed, you can use Ben’s Crypt::OpenPGP by just typing perl -MCPAN -e 'install Crypt::OpenPGP', but if you can’t, it’s got a dozen dependencies, and as I vaguely remember from last time, seems like some of them don’t want to be installed by just being dropped in MT’s extlib.

But that’s what I meant about pb’s method, making the signed comment available and nothing more, being a good step, not just a first step. Anyone with PGP installed should be able to verify a comment if they want to, anyone without just needs to install it to be able to.

Comment by Jacques Distler #
2004-01-27 14:51:22

But for us, the flesh doesn’t matter. What matters is “was this comment signed by the person who controls the URI-space he claims?”

Right. So no need for an extra text-box for commenters to enter a URL of their public key. If they don’t have a <link> from their web page, we don’t care about their friggin’ public key.

Verification would far and away be best done server-side. But, alas, I was unable to install Math::Pari on MacOSX (with Perl 5.8.3, FWIW), so no Crypt::OpenPGP for me…

Too bad, this would be a fun doohickey to have.

Comment by Jacques Distler #
2004-01-27 23:11:38

There’s some strange ritual involving sheep’s blood and burnt sandalwood required if you want to install Math::Pari on MacOSX. That, and installing the other prerequisites to Crypt::OpenPGP before attempting to install the module itself (instead of just letting it prepend them to the installation queue).

Unfortunately, its key-management is pretty rudimentary. It uses a keyring and if the desired key is not found on the keyring, there’s an option to download it from a keyserver.

But that’s not what we want. We want to be able to add a key retrieved from some arbitrary URI to the keyring. Unless I’ve missed something, there’s no method for doing that. Strange, but true.

Comment by Phil Ringnalda #
2004-01-27 23:57:16

Ah, I went wrong twice trying to install it on Windows to play with, then. I let it prepend, and forgot the sandalwood (didn’t realize it was a part of the ritual, rather than just something to cut the smell of sheep’s blood). I got so many error reports I didn’t even consider trying to work out what had happened.

Comment by Jacques Distler #
2004-01-28 05:58:16

I couldn’t get Crypt::Idea to build, but all the other modules built successfully, when I tried them individually. Then, building Crypt::OpenPGP, I only had to answer “no” when it asked whether I wanted to build the “optional” module, Crypt::Idea. The thing then built just fine.

It was either that, or the burnt sandalwood …

 
 
Comment by Jacques Distler #
2004-01-28 20:29:27

My naive impression, derived from reading the documentation, was incorrect: the methods we need are there. They’re just not documented.

 
 
 
 
Comment by Phil Ringnalda #
2004-02-24 21:41:51

Bloody PGP. Why’s it saying I altered your comment? I didn’t, I swear!

Well, I didn’t knowingly alter it, anyway.

Comment by Jacques Distler #
2004-02-24 22:42:07

Notice the less-than and greater-than characters. These were typed as HTML entities (how else should I have done it?). But cutting and pasting in your browser will give you something different.

That’s why I advocate server-side signature verification. That’s the only place where you have reliable access to the string of characters typed by the commenter.
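
A small Python illustration of the mismatch described above. A SHA-256 digest stands in for a real PGP signature check here (a deliberate simplification, not how PGP verifies), and the sample strings are invented; the only claim is that the text as typed with entities and the text as copied from the rendered page are different strings, so anything computed over one will not match the other.

```python
import hashlib
import html


def digest(text: str) -> str:
    """Stand-in for a signature check: any change to the text changes this."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# What the commenter typed (and signed): markup written as entities
typed = "looking for a &lt;link rel=&quot;ppgkey&quot;&gt;"

# What a reader gets by copying from the rendered page
rendered = html.unescape(typed)

if __name__ == "__main__":
    print(typed)
    print(rendered)
    print("digests match:", digest(typed) == digest(rendered))
```

Only the server sees the exact characters the commenter submitted, which is the case for verifying there instead of in the reader's browser.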

Comment by Phil Ringnalda #
2004-02-24 22:51:53

Indeed. Entities, and (while you were realizing that here, I was realizing in Srijith’s comments) HTML: if you put HTML in a comment, and I verify based on rendered HTML, of course it won’t verify.

 
Comment by Phil Ringnalda #
2004-02-24 23:25:41

Grr. And something more than just entities, I’m thinking.

Thinking, and wondering if I can verify my own HTML.

Comment by Phil Ringnalda #
2004-02-24 23:27:23

Indeed. At least, as long as I don’t call it <html>

Comment by Phil Ringnalda #
2004-02-24 23:33:23

And even if I do, it seems.

So what did I clobber in yours (or did you clobber in yours)?

 
 
 
 
 
 
 
 
 
 
 
Comment by dazed #
2004-01-26 22:22:57

Hey Phil – howdy. Miss you at WWR, come back soon with that new music and (ugh) code.

Comment by Phil Ringnalda #
2004-01-26 23:42:24

::hello2! Between my ::computerwhack connection and being buried in thoughts of code, I just haven’t been able to hang out there lately. Better find my way back soon, though, before I have to learn my way around from scratch again :)

 
 
Comment by Richard Evans Lee #
2004-01-27 05:51:35

I’ve suspected some short comments were just excuses to say “Hey, look at my weblog!” I always look at the person’s weblog. If there’s some content I don’t fret about the ad. If it seems a pointless comment to get a pointless link I just delete it.

My big pains are the people who leave “adsfwel” without a proper email link or URL. Just trash, no attempt to do anything except perhaps waste my time. I’m grateful to have MT-Blacklist make deleting the junk comments easier.

I also get lots of guys posting one-line personal ads but that goes with the territory I guess if you write about sexuality. Though I’ve finally realized some of those guys think the comments form is a chatroom screen.

Comment by Phil Ringnalda #
2004-01-27 21:57:30

My problem with “just delete it” is that even though most of us are the sort who just dive right in to whatever, wherever (as long as it involves silicon, anyway), you never know when that six-word, nothing-much comment came from someone who has been reading you for years, and finally got brave enough to say a little something. I hate to be too harsh, I just want to be harsh enough. That’s why I usually let googling for link:them.com decide if they’re just in it for the self-promotion.

 
 
Comment by David #
2004-01-27 10:00:36

It seems to me that if everyone is willing to open their [proverbial] mouth and spout off their ideas – good or bad – they should be open to comment.

I understand that it is annoying that people reply not on the topic at hand – i.e., insert advertisement spam – or reply only from a need of an ego stroke – your six words example, I take it, might be “Hello everyone, come check it out” – but I still feel that the problem, being a social one, will not soon be resolved by technical solutions.

Comment by Phil Ringnalda #
2004-01-27 11:02:58

This is an interesting comment. Thanks!

Thanks for the link, I’ll try it!

And, as I say, if that’s genuinely what you want to say, because you actually found it interesting but it didn’t provoke any further thoughts, cool, comment away. What I object to is discovering that no post on any blog ever provokes any further thoughts, but most posts on most B-list blogs provoke that very thought. It’s not like I’m examining every comment ever left, but I read enough blogs that sometimes a name gets familiar in the context of six-word comments, and if it gets familiar enough, I might search.

Social solution? Well, beyond the fact that I’m getting tired of being told not to use technical solutions to social problems when the social solution folks don’t seem to be solving any of my problems, let’s see, what would be a social solution? Maybe, posting about it, in a sort of sideways way, pointing out possible technical solutions? ;)

 
 
Comment by Jon Anderson #
2004-01-27 15:17:26

…what about the so-called “script kiddies”? When the developers of FloodMT release a graphical interface to make FloodMT usable for random kiddies, then all hell will break loose! (It’s all about barriers to entry, man!) Then again, that would probably provide tonnes of material for your Bayesian filter.

Comment by Phil Ringnalda #
2004-01-27 16:55:31

Script kiddies. No “so-called” about it. No respect for your tools, no rising to a challenge, no understanding of what your script does. I’m beginning to think that big chunks were probably copied and pasted from someone’s sample code. I don’t expect you to even realize that this is an insult, but it lacks even a shred of elegance. And it won’t provide a single thing for my Bayesian filter, since nobody has even made the effort to try to get a comment through in a week now. Sure, you can buy a junker car and run over little children in a crosswalk in front of a school. Big deal.

Comment by DV #
2004-01-28 05:53:36

Your lies make baby Jesus cry (and so do your failed provocation attempts.)

ROR

 
 
Comment by Jacques Distler #
2004-01-27 19:26:37

You prefer the term “pathetic lamer” to “script kiddie”?

Whatever trips your trigger, sonny.

 
Comment by dv #
2004-01-28 05:50:48

yeah, because so many people have python and tcl/tk installed.

you fucking attention whore.

Comment by Jacques Distler #
2004-01-28 05:57:00

Don’t call Jon an “attention whore”!

Oh, OK … go ahead. If the shoe fits …

 
 
 
Trackback by Computer Toaster #
2004-01-27 05:58:33

Junk weblog comments

I’ve bitched about every possible sort of unwanted weblog comment: spammers, self-promoting webloggers, people who type in junk, folks who’ve confused the comments form with an adult chatroom. Phil Ringnalda has given this far more thought t…

 
Trackback by Burningbird #
2004-01-29 03:01:40

Stepping Stones to a Safer Blog

edited In the last few weeks, I’ve been hit not only by comment spammers but a new player who doesn’t seem to like our party: the crapflooders, people who use automated applications (you may have heard of the program called ”MTFlood” or some variation)…

 