One stop hardening shop
There’s really nothing that will stop a determined comment spammer or crapflooder, but Shelley’s certainly made it easier to harden your comments and Trackback to the point where cleaning up afterward is no big deal. She combines instructions on what to do with versions of the files you need to change in MT-Blacklist or MT itself, already patched with Jacques’ fixes. And for good measure, she’s got a patched version of David Raynes’ Optional Redirect plugin, which lets you upgrade to MT 2.661 without having to accept the heavy-handed “thou shalt redirect all comment links” feature.
Also interesting: Matt’s comment about what he has in mind for WordPress, which is to have the comment/Trackback throttle trigger moderation rather than shut things off. Once you get too many comments, any additional ones just go into the queue for approval. As long as deleting huge swaths of them is easy enough, that ought to give you the best of both worlds: you aren’t running a denial of service on yourself with an easy-to-trigger throttle, but you don’t have to give crapflooders the satisfaction of seeing all of their handiwork. WordPress doesn’t have MT’s handicap of rebuilding static pages every time a comment is added, so it’s not likely to take down a server just by getting a few hundred comments. But if you added a moderation throttle to Movable Type, you could ease a lot of the backend pain of being flooded: once moderation kicks in, you stop rebuilding pages, and you send only one email saying “you’ve got comments to approve or delete” rather than one per comment. Sounds good to me.
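To make the idea concrete, here’s a minimal Python sketch of a throttle that queues instead of rejecting. Everything in it is hypothetical (the `ModerationThrottle` class, the `limit` and `window` settings, and the `notifications` list standing in for email); it isn’t how WordPress or MT actually implement anything, just an illustration of the logic Matt described, assuming a simple sliding time window.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ModerationThrottle:
    """Hypothetical sketch: once more than `limit` comments arrive within
    `window` seconds, new comments are queued for approval, not published."""
    limit: int = 10
    window: float = 60.0
    recent: deque = field(default_factory=deque)  # timestamps of recent submissions
    published: list = field(default_factory=list)
    queue: list = field(default_factory=list)
    notifications: list = field(default_factory=list)  # stand-in for outgoing email

    def submit(self, comment: str, now: float) -> str:
        # Drop timestamps that have slid out of the time window.
        while self.recent and now - self.recent[0] > self.window:
            self.recent.popleft()
        self.recent.append(now)
        if len(self.recent) > self.limit:
            # Flood mode: queue rather than reject, and send one notice only
            # when the queue goes from empty to non-empty.
            if not self.queue:
                self.notifications.append("You've got comments to approve or delete")
            self.queue.append(comment)
            return "queued"
        # Normal path: publish right away (in MT, this is where rebuilds happen).
        self.published.append(comment)
        return "published"


throttle = ModerationThrottle(limit=3, window=60.0)
results = [throttle.submit(f"comment {i}", now=float(i)) for i in range(10)]
print(results.count("published"), results.count("queued"), len(throttle.notifications))
```

Feed it ten comments, one a second, with `limit=3`, and the first three are published while the other seven land in the queue behind a single notification, with no page rebuilds for any of them.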
The other thing that sounds good is the licensing for WordPress. If you aren’t a programmer, most of the talk about open source probably doesn’t get you too excited. So what if you can get the source code for your browser, or your office suite, or your operating system? It’s not like you’re going to teach yourself how to program so you can move the menus around to suit you better.
However, it matters. With an open-source project like WordPress, if the developers get a case of the stupids and decide not to include something essential, anyone who chooses to can just grab the source, hack it in, and make it available. In some cases, just knowing that possibility exists can make open-source developers a bit less likely to leave out something their users want.
Also, since open-source projects tend to make their work in progress available (you can browse WordPress’s CVS repository if you want to see what they’ve done on any given day), they get more eyes and more ideas on how things should be done. Had someone put a comment throttle based solely on the commenter’s IP address into WordPress, someone else would very quickly have pointed out that spammers and crapflooders all use multiple proxies, so the only people it will ever catch are legitimate commenters sharing a single NAT router. (You don’t have a whole bunch of people at one company who are likely to all try to comment on your latest entry at once, do you? Good. MT 2.661 will ban their router’s IP address. And you don’t have lots of readers using a single ISP that runs requests through a small caching farm, do you? Good.)
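Here’s a toy Python sketch of why an IP-only throttle gets things backwards. The `ip_throttle` function is hypothetical and much cruder than any real throttle (it just counts comments per address in one batch, rather than per time window), and the addresses come from the reserved documentation ranges.

```python
from collections import defaultdict


def ip_throttle(posts, max_per_ip=1):
    """Hypothetical IP-only throttle: allow at most `max_per_ip` comments per
    address in one batch of (ip, author) posts; return the rejected posts."""
    counts = defaultdict(int)
    rejected = []
    for ip, author in posts:
        counts[ip] += 1
        if counts[ip] > max_per_ip:
            rejected.append((ip, author))
    return rejected


# A flooder rotating through 100 proxies, plus two coworkers behind one NAT router.
flood = [(f"203.0.113.{i}", "flooder") for i in range(100)]
coworkers = [("198.51.100.7", "alice"), ("198.51.100.7", "bob")]
print(ip_throttle(flood + coworkers))  # [('198.51.100.7', 'bob')]
```

Not one of the hundred flood comments gets stopped; the only casualty is Bob.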
Consider those last four aimless, awkward, under-edited paragraphs a taste of my new blogging style. There’s something there that I want to say, but I’m not sure how to put it. After dozens of times casually mentioning something, only to see later that someone took it as gospel truth and got into some sort of trouble because of it, I’ve gotten myself stuck to the point where I’m hardly ever willing to post anything. So, at least for now, my plan is to just write whatever I please, however it comes out, and let the chips fall. I never signed up to be perfect.
I think WordPress shows a lot of promise. And being Open Source is a big, big advantage.
But I kinda disagree with the notion that, just because it builds pages on-the-fly, WP is less vulnerable to crapflooding.
Matt’s blog was knocked offline for several hours by a (relatively small) Trackback flood. This was probably before he implemented WP’s “moderation throttle”, which would have shunted all those Trackbacks off into a moderation queue. Maybe the current version of WP is more resilient.
But, as a general rule, any system serving dynamically-generated pages is going to be more vulnerable to DoS attacks than one serving mostly static pages.
Moreover, shunting comments into a moderation queue also interrupts the conversation, just as a throttle does when it kicks in and delivers a “Try your comment again later.” message.
It’s a little less annoying to commenters: they know their comment made it into the queue. But they don’t know that it won’t get deleted “accidentally”, along with all the crap that triggered the moderation throttle in the first place.
Was it? And how was it? I saw that he got some Trackback crap (with the amusing “page-widening” titles that don’t do much of anything to CSS layouts, heh), but I didn’t notice it being down at all. Moderation or no, with a fried site all you should have to do is accept a little bitty post, grab a connection to the database, stuff it away (with the moderation boolean set or not), and be done with it. Unless you swamp the database connections, I don’t see where there would be a bottleneck there, the way there is when you have to rebuild a bunch of pages. Maybe you’d hammer sendmail with a few thousand notifications, if you don’t have moderation to tell it to send only one saying “you’ve got stuff to moderate”.
Dynamic versus static, yes, but if you rebuild your static pages in response to user input, you’re dynamic. You just aren’t built for it.
Moderation will interrupt the conversation, but at least the conversation picks back up where it was later, rather than stopping with a discontinuity. Screwing with my throttle values, I’ve missed several comments from people who didn’t bother to come back later to try again. Since the GNAA seems to be scared of me, I should probably crank all the throttle numbers up a bit.
So does anybody know how easy it is to export an MT database to WordPress?
The FAQ’s probably out of date [and being on the docs squad, you may fire at me when ready], but the MT importer works just fine. Matt got that working spiff for the 1.0 release.
I guess I should have searched before asking that one. From the WordPress FAQ:
Maybe the more appropriate question is: when will the MT import support be ready? I’d love to try out WordPress, but I really don’t want to do any manual data slinging.
I exported all the entries from one of my weblogs to a wordpress installation. It worked just as the directions indicated.
Be aware, though (and this may not matter to you or to most people), that some of the extra fields like keywords do not survive the conversion process. If you do anything special with those, wp may not suit your needs.
I still prefer mt, but wp is a truly worthy competitor. And I definitely look forward to a chance to play with Textpattern.
Keywords, or threaded comments. That’s the current double-killer for me: first I write my own exporter for MT to preserve the threading, then I write my own importer for WP’s threaded comment hack. Not impossible, but certainly a speed bump.
Phil, a rework of threaded comments is in the works. I love the hack myself and the new version should be cleaner, sans JS, and have a bunch of improvements (which might include an import from MT). By the way, in MT, is there a comment ID, and comment_post_ID (which post the comment belongs to) and a comment_parent_ID (is there a comment thread parent?)?
Also, there is a WP Blacklist incarnation which feeds the moderation queue with comments it thinks are spam.
Yep to all three: comment_id, comment_entry_id, comment_parent_id. A tiny hack to the export routine, and you’re exporting them, a tiny hack to the WP import routine and you’re importing them. The actual sticking point is the fact that MT thinks that Trackback pings aren’t comments, so there’s a comment number 1250 and also a ping number 1250, and you want to preserve the comment number in case it’s a parent, but assign the pings new numbers higher than any of the comments. Before importing, you have to run through the whole import file looking for the highest comment id number, and assign the pings new id numbers higher than that.
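The renumbering described above fits in a few lines of Python. The dict layout with an `"id"` key is a hypothetical stand-in for whatever the MT export and WP import routines actually use; the point is just the two passes: find the highest comment id, then hand out ping ids above it.

```python
def renumber_pings(comments, pings):
    """Keep every comment's id (it may be some other comment's parent) and give
    each Trackback ping a fresh id above the highest comment id.

    `comments` and `pings` are lists of dicts with an "id" key; this layout is
    hypothetical, not MT's real export format."""
    # Pass 1: scan the whole import for the highest comment id.
    next_id = max((c["id"] for c in comments), default=0) + 1
    # Pass 2: pings keep their other fields but get new, non-colliding ids.
    renumbered = []
    for ping in pings:
        renumbered.append({**ping, "id": next_id})
        next_id += 1
    return comments + renumbered


comments = [{"id": 1250, "parent": None}, {"id": 1251, "parent": 1250}]
pings = [{"id": 1250, "url": "http://example.com/ping"}]
print([item["id"] for item in renumber_pings(comments, pings)])  # [1250, 1251, 1252]
```

Comment 1250 keeps its number, so comment 1251’s parent link still points at the right place, and ping 1250 moves out of the way to 1252.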
I only know of six or seven people actually using threaded comments in MT, so building import into the hack or the importer probably isn’t worth the time. I want to find a fairly robust way of doing it, so I’ll have it if any of those five people ask me, but I don’t expect more than one or two to.
WP Blacklist needs someone to find some time to do some serious hacking. Since you can’t add to the list (well, not easily), it acts as if a blacklist will save you, as if there were a limited number of people who will ever spam comments, when really a comment blacklist is just a “Don’t fool me twice” thing to keep people who’ve found you once from coming back later to add more. Don’t know yet if I’m that someone, since there are multiple blogs and comment preview and who knows what else to be hacking on, too.
I’ve been using WordPress for a while after having been a very long-time GM user. I never switched to MT because of the static pages thing; static pages suck majorly, because they break down under heavy load. Peak usage is the problem with static; average use is the problem with dynamic.
Matt’s very clear about why WordPress is still open-source [its predecessor, b2/cafelog, was GPL’d from the start]: if a plague wiped out the coders, someone could take it from there. People can and do hack the codebase all the time to provide features.
Come on over. :)
From my twisted perspective, the fact that people are constantly hacking at the guts, and that there’s no separate template, so you have to get your fingers dirty in the PHP to change your page layout, is a huge plus. I don’t know how many times I’ve dug into MT and figured out how to fix something to solve someone’s problem, only to have them say “oh, I dasen’t change teh MT sources!”
Yeah … even so, I’ve only ever had to hack at the display functions once [and that only to see what they were doing; I disagree with how the default codebase outputs links in a nested, unordered list].
Everything can be manipulated with CSS. Everything. :D
“I never switched to MT because of the static pages thing; static pages suck majorly, because they break down under heavy load. Peak usage is the problem with static; average use is the problem with dynamic.”
Huh? What kind of web server are you running that struggles with serving static pages under load? Static pages are always going to scale better than dynamic pages if you have even modest amounts of traffic.
I was having a hard time parsing that myself, this morning, but I chalked it up to morning. It does seem to me that the usual way of things is that when you get slashdotted, you try to weather the storm by switching your dynamic pages to static.
But beyond that holy war, it seems to me that the significant thing here and now is that under an extreme comment flood (I’ve never gotten more than around a thousand in a single flood, but I know other people have gotten several thousand at once), MT suddenly changes from being static pages to being accidentally dynamic, in an unplanned and inefficient way. I’m guessing that Ben would have done things a bit differently if he had been planning on the code having to rebuild everything that a comment triggers rebuilding, multiple times per second.
Even the idea of shunting comments to a moderation queue, whenever the load gets high (an idea I like very much, despite appearances to the contrary), can be dangerous if you are working with MT’s default BerkeleyDB. MySQL can handle a very high data load without risk of corruption.
I’m not sure BerkeleyDB would respond as well to a crapflood, even without this issue of page-rebuilds.
My worry about using a moderation queue to handle a crapflood is the following. Say you are hit with 8000 comments. Now you have 8000 comments in the moderation queue. Most of them are crap; perhaps a few are legit. How are you going to find the legit ones and let them through, while deleting the rest? Without a really good user interface, that’s going to be an onerous task. The easier alternative is to just delete all 8000, and hope the legitimate visitors repost their comments. But, if you’re going to do that, you might as well have rejected all 8000 to begin with (as with a throttle).
So the utility of having a comment-moderation queue instead of a throttle is highly dependent on having a really good user interface.
I like the odds of not missing something better with moderation than with just comment management, though. I assume that MT3 will be more or less like TypePad’s new style, where [they say]:
Okay, you’ve been crapflooded. Name, email, IP (where’s URL?) and body are all random. You’ve got a few thousand, so 20 at a time’s not going to do it. You have to show them all. I’ve got something like 2700 legit comments, so I load up a page with, say, 5000 comments on it. Over dialup. I can either check them all, and uncheck the 2700 real ones, or check the 2300 fakes individually.
What we need is to be able to filter on date, show all comments since the date and time it started, and then at each comment, be able to one-click delete any newer than that. Start at the latest, scroll down in pages of 20/100/500, and when you get to a real comment, click the “delete everything after this” button, then start scrolling again (working with the same set of comments until you quit, so you don’t have to worry about new real comments being added).
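Here’s a minimal Python sketch of that “delete everything after this” button. It assumes, hypothetically, that the cleanup page works from a snapshot of `(comment_id, posted_at)` pairs taken when you start, so real comments arriving mid-cleanup are never touched; `delete_newer_than` is an invented name, not anything in MT.

```python
from datetime import datetime


def delete_newer_than(snapshot, keep_id):
    """Return the ids in `snapshot` posted strictly after comment `keep_id`.

    `snapshot` is a list of (comment_id, posted_at) pairs captured when the
    cleanup session began, so comments arriving later are never included."""
    cutoff = next((posted for cid, posted in snapshot if cid == keep_id), None)
    if cutoff is None:
        raise KeyError(f"comment {keep_id} is not in this snapshot")
    return [cid for cid, posted in snapshot if posted > cutoff]


snapshot = [
    (2699, datetime(2004, 5, 1, 9, 0)),     # last real comment before the flood
    (2700, datetime(2004, 5, 1, 9, 5)),     # flood starts here
    (2701, datetime(2004, 5, 1, 9, 5, 1)),
    (2702, datetime(2004, 5, 1, 9, 5, 2)),
]
print(delete_newer_than(snapshot, 2699))  # [2700, 2701, 2702]
```

Because everything is computed against the frozen snapshot, one click at the first real comment you hit clears the whole tail of the flood, and you can keep scrolling back from there.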
Moderation makes things a little different, since all you need to be able to do is scroll in big (but not ”all”) pages, with a button to click to let a comment pass, which lets it through right then, and returns you to that spot in the list. Let any real ones through, delete all, you’re done.
Unless a person is slashdotted, whether a page is dynamically served or static isn’t much of an issue unless they have a very poor server, or their software is crap.
Most people will not ever be slashdotted, and if they are, a lot of servers couldn’t handle the sudden load anyway. Slashdot is not a good measure of performance requirements.
A site that’s heavily commented probably would do better with a dynamic page than a static one, but again, most of us don’t get that many comments all at once, outside of the recent problems.
I think the more interesting discussion of differences between WordPress and commercial software such as MT is that the former is open source, the latter closed.
Right now, open source has a great deal of appeal to me.
“I was having a hard time parsing that myself, this morning, but I chalked it up to morning. It does seem to me that the usual way of things is that when you get slashdotted, you try to weather the storm by switching your dynamic pages to static.”
Sorry for the confusion; I should have noted that I was talking more about page-rebuilding. If you’re having to rebuild a page every time someone comments … ask Wil Wheaton about that issue from the days when he ran Greymatter.
I’m willing to take the resource penalty that comes with dynamically generating a page to avoid that problem. I think it scales better to the crapflooders, too.
I’m not referring to serving pages; I’m referring to generating them. I’ve seen too many problems with rebuilding pages to ever want to mess with that again.
I guess more to the point, you can do a direct comparison in MT with MT-view.
One question: why?
Stepping Stones to a Safer Blog
In the last few weeks, I’ve been hit not only by comment spammers but a new player who doesn’t seem to like our party: the crapflooders, people who use automated applications (you may have heard of the program called “MTFlood” or some variation)…