Does anyone happen to know what Newzcrawler’s up to? I was just watching tail -f access.log (like TV, but even more boring), when suddenly everything was one IP address requesting the RSS feed for entry plus comments for every post I’ve made since August, with a request for /favicon.ico in between every request. Luckily, that amounts to just a whole lot of 304 Not Modifieds, but a little grepping says that since it happens every 30 minutes, it also amounts to 6520 requests over the last 24 hours. From one IP address. With two entries and seven comments in that 24 hours, things were far more active than usual, but still, 6520 requests seems a little overboard. I’ve got gobs of bandwidth, and Apache shouldn’t have any problem with someone doing five or six requests a second for thirty seconds or so, but that’s one IP address. Switch a couple hundred people over to Newzcrawler, so it’s 1,304,000 requests a day, and I think it might get my host’s attention.
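The grepping behind that 6520 figure can be sketched in Python, assuming the log is in Apache's common or combined format (client address as the first field) and has already been trimmed to the last 24 hours. The function names and the sample lines are made up for illustration; this is not the command that was actually run.

```python
from collections import Counter


def count_requests_by_ip(log_lines):
    """Count requests per client address.

    Assumes Apache common/combined log format, where the first
    whitespace-separated field on each line is the client address.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[0]] += 1
    return counts


def projected_daily_requests(requests_per_client, clients):
    """Scale one client's daily request count up to many identical clients."""
    return requests_per_client * clients


# Hypothetical sample lines; 192.0.2.x is a documentation-only address range.
sample_lines = [
    '192.0.2.10 - - [03/May/2004:05:00:01 -0700] "GET /favicon.ico HTTP/1.1" 304 -',
    '192.0.2.10 - - [03/May/2004:05:00:01 -0700] "GET /feed-for-one-entry HTTP/1.1" 304 -',
    '192.0.2.77 - - [03/May/2004:05:00:02 -0700] "GET / HTTP/1.1" 200 5120',
    '192.0.2.10 - - [03/May/2004:05:00:02 -0700] "GET /favicon.ico HTTP/1.1" 304 -',
]

counts = count_requests_by_ip(sample_lines)
busiest_ip, busiest_count = counts.most_common(1)[0]
print(busiest_ip, busiest_count)

# The scale-up from the post: 6520 requests a day from each of 200 clients.
print(projected_daily_requests(6520, 200))
```

Against a real log, the same function could be fed an open file object, since iterating over a file yields one line at a time.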

So, does anyone know what Newzcrawler’s doing, or what I’ve done wrong to confuse it into thinking it should request every single RSS feed I’ve done since last August, every thirty minutes? I’m willing to take the blame for that, and fix whatever I’ve done, but that bit about requesting /favicon.ico over and over, once for every request on the same host? That’s a bug.

Update: A bug, and one that’s already fixed in the current release. From the comments and Andrew’s answer in his forums, it’s just what I should have thought of, if it was before midnight or if I ever slept anymore: combine the good thing that’s storing old posts so a user can refer back to them with the good thing that’s fetching wfw:commentRss feeds for entries that you know about, so you can show the user the comments, and you have a whole lot of requests.
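To see how those two good things multiply, here is a rough model in Python. It is only a sketch under stated assumptions, not a description of Newzcrawler's actual code: it assumes one request for the main feed, one wfw:commentRss request per stored entry, and (with the bug) one /favicon.ico request per feed request. The function names and the entry count of 10 are hypothetical.

```python
POLL_INTERVAL_MINUTES = 30
POLLS_PER_DAY = 24 * 60 // POLL_INTERVAL_MINUTES  # 48 polls a day


def requests_per_poll(stored_entries, favicon_bug=True):
    """Estimate the requests one aggregator makes in a single poll.

    Assumption: the main feed is fetched once, plus one comment feed
    for every entry the aggregator has stored.
    """
    feed_requests = 1 + stored_entries
    # Assumption: the bug adds one /favicon.ico request per feed request.
    favicon_requests = feed_requests if favicon_bug else 0
    return feed_requests + favicon_requests


def requests_per_day(stored_entries, favicon_bug=True):
    """Scale the per-poll estimate up to a full day of polling."""
    return requests_per_poll(stored_entries, favicon_bug) * POLLS_PER_DAY


# Hypothetical example: an aggregator that has stored 10 entries.
print(requests_per_poll(10))                     # with the favicon bug
print(requests_per_poll(10, favicon_bug=False))  # without it
print(requests_per_day(10))
```

For comparison, the 6520 requests a day seen in the log, spread over 48 polls, come to roughly 136 requests per poll. In this model, fixing the favicon bug halves the total, but the count still grows with every stored entry.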

If you’ve got a better idea for how to keep those two good things than what’s in my comments or the Newzcrawler forum, I’d appreciate it if you would post it (over there, so Andrew doesn’t have to keep checking my comments. Well, unless he’s subscribed to the commentRss feed ;)).


Comment by Roger Benningfield #
2004-05-03 05:16:22

Phil: I’m not on Comcast in Illinois, nor does my IP look anything like that… but I *do* use Newzcrawler.

FWIW, I cleared my archive of your feed and then hit “Update”… the oldest thing it pulled down (at least visibly) was “Doing Your Own Technorati Links” from 2004-04-22 and all associated comments. Subsequent “update”s don’t retrieve anything else.

Now, it’s a good thing that my copy of NC doesn’t appear to be polling for stuff from August… OTOH, I *am* curious as to why it isn’t displaying “Tangents around persuasion” from the 18th. That’s currently the oldest item in your feed, so why isn’t NC displaying it?

Hm. Off to post something in the support forum…

Comment by Andrew #
2004-05-03 05:37:48

I’ve just answered your question in our support forum:

Comment by Phil Ringnalda #
2004-05-03 07:12:22