Got a piece of it

Dave almost got it:

What if comments, by default, were deleted after 24 hours? What if the owner of the site had to check a box in order for a comment not to be deleted? That way if a comment had lasting value, the owner of the site could make sure it sticks around.

But that’s not really quite enough. Googlebot’s got a serious jones for my comments, and can’t imagine staying away more than 24 hours. No matter how much or little benefit spammers get from just having their link indexed once, I don’t want it ever indexed, and I don’t want people who subscribe to my comments feed to have to wade through the spam, whether or not it will evaporate after a while. So, solutions:

Paul Freeman’s absolutely right that we need to moderate the URL, not the comment. Rather than moderate everything, and have comment conversations only happen while you are awake and online, or moderate-by-default but only after it’s already been crawled, let the comments go through, but don’t link any URLs, either the commenter’s URL or links in the comment body, until you’ve approved them. That takes care of the unwanted indexing, but still leaves you with dozens of comments saying “Check out my site for hot shemalez!!1!”
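To make the "moderate the URL, not the comment" idea concrete, here is a rough Python sketch of how the rendering side might work. Everything in it is an assumption for illustration: the function name `render_comment`, the `approved_urls` set, and the deliberately simple URL pattern are mine, not part of any existing weblog software. The comment text always gets published, but a URL only becomes a live link once it is in the approved set.

```python
import html
import re

# Deliberately simple pattern (an assumption for this sketch): anything that
# starts with http:// or https:// and runs up to whitespace, a quote, or < >.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")


def render_comment(body, approved_urls):
    """Return HTML for a comment body, linking only approved URLs.

    Unapproved URLs are still shown, but as escaped plain text, so a crawler
    finds no link to follow.
    """
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(body):
        # Escape the ordinary text between the previous URL and this one.
        parts.append(html.escape(body[last:match.start()]))
        url = match.group(0)
        safe = html.escape(url, quote=True)
        if url in approved_urls:
            parts.append('<a href="{0}">{0}</a>'.format(safe))
        else:
            parts.append(safe)
        last = match.end()
    # Escape whatever text follows the final URL.
    parts.append(html.escape(body[last:]))
    return "".join(parts)
```

The same check would apply to the commenter's own URL field: print the name as plain text until the URL has been approved.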

Luckily, so far, most spambots come in to older comment threads that have had a chance to build up some PageRank and get established in search engines. The conventional wisdom says that means you should turn off comments on all entries more than a few days old, but I disagree. Even if I never get another useful comment on an older entry, knowing that a new version of Movable Type means Joshua Kaufman will be reminding me that old hacks need to be updated pleases me so much that I’ll keep hundreds of other threads open, just in case. I feel the same way about only allowing comments on a select few entries: I (and you, whether you know it or not) just don’t know what someone’s going to want to comment on, or whether they’ll want to within a week, or a month.

Fortunately, there’s a solution to that, too, though I’m not sure whether I stole it from Chad Everett, or he stole it from me, or we both stole it from someone else: if the entry is more than x days old, and the last approved comment is more than n days old, where x is probably 7 and n is 2 or 3, then the commenter shouldn’t be expecting quick conversation, and the whole comment gets moderated until you approve it for publication. The spambot authors could work around that, by either trolling update sites for new entries to spam or just navigating up from old indexed entries to more recent ones, but so far, for the most part, they haven’t shown much sign of enterprise. At least while it’s a Club that only a few people have, it ought to work out pretty nicely.

Add in an expiration time on moderated comments, so that if you haven’t approved them after whatever length of time suits you they just evaporate, and all you have left to directly deal with are plain text comments on your last few entries.
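The age rule and the expiration are simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the function names, the default of 3 for n, the 14-day expiration, and the choice to treat an entry with no approved comments as quiet since it was posted are all mine, and nothing here is taken from Movable Type or MT-Blacklist.

```python
from datetime import datetime, timedelta


def needs_moderation(entry_posted, last_approved_comment, now,
                     x_days=7, n_days=3):
    """Hold the whole comment if the entry is old AND the thread is quiet.

    x_days and n_days are the x and n from the post (7, and 2 or 3).
    last_approved_comment may be None; in that case this sketch assumes the
    thread has been quiet since the entry itself was posted.
    """
    entry_is_old = now - entry_posted > timedelta(days=x_days)
    last_activity = last_approved_comment or entry_posted
    thread_is_quiet = now - last_activity > timedelta(days=n_days)
    return entry_is_old and thread_is_quiet


def is_expired(submitted, now, max_pending_days=14):
    """True when a held comment has waited too long and should evaporate.

    The 14-day default is an arbitrary placeholder for "whatever length of
    time suits you".
    """
    return now - submitted > timedelta(days=max_pending_days)
```

An old entry with an active conversation stays unmoderated, so a thread that wakes up legitimately only costs you one approval.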

6 Comments

Comment by Matt #
2004-07-02 10:18:02

URIs are useless because they’re completely disposable. I’ve seen hundreds that just happily send a nice header redirect (that preserves pagerank!) to a spammy site. I still think the WP system of selective moderation based on triggers is the most effective around. If people want to plugin URIs (or regular expressions) into the keyword system they can, but they don’t need to.

 
Comment by Hulkster #
2004-07-02 15:14:36

Phil,

Came across your blog from DaveW’s – thought I would pass on something that you might get a chuckle over (or probably just shake your head as I am) … but we all know that referrer log spamming is getting bad … and I recently set up a page mentioning these scumbags … but now the slimeballs are actually hitting THAT page with their referrer spam – read all about it here:
http://www.komar.org/faq/scumbags/referrer-log-spamming/

alek

P.S. I’ve added your blog to my bookmarks, so you might see me pop in every once in a while …

Comment by Phil Ringnalda #
2004-07-02 15:50:42

Ayuh, they’re a charming bunch. I’ve been showing referers, and worse yet admitting it with the word in the page on every post, for two years now without having the slightest problem with referer spam, simply by GETting the page and looking for a link before I accept the referer. Incredibly simple to get around, but amazingly enough nobody actually tried until last week. Now, I’m pretty sure I’m going to have to just scrap the idea entirely. Luckily, if I just stop showing them, so I’m not using the word in the page, I’d guess I can keep using the same script, which alerts me on new URIs, without it getting too spammed.

Also amusing, as a way to get around people requiring that you actually link to show a referer: just republish your target’s RSS feed. I’ve seen that done twice now, too: my own posts linking to me, in a page that will eventually just doorway the PageRank off to somewhere else. They are a cunning lot.

 
 
Comment by Jay Allen #
2004-07-05 01:39:24

…I’m not sure whether I stole it from Chad Everett, or he stole it from me, or we both stole it from someone else: if the entry is more than x days old, and the last approved comment is more than n days old, where x is probably 7 and n is 2 or 3, then the commenter shouldn’t be expecting quick conversation, and the whole comment gets moderated until you approve it for publication.

I don’t know who stole it from who, but it’s going to be in the next version of MT-Blacklist. :-)

 
Trackback by house of warwick #
2004-07-02 05:42:10

More on aging of comments on weblogs

Phil Ringnalda took Dave’s free idea and expanded upon it: “Paul Freeman’s absolutely right that we need to moderate the URL, not the comment.”

 
Trackback by you've been HAACKED #
2004-07-02 10:11:57

Another Attempt To Reduce Comment Span

 