What’s Newzcrawler doing?

Does anyone happen to know what Newzcrawler’s up to? I was just watching tail -f access.log (like TV, but even more boring), when suddenly everything was one IP address requesting the RSS feed for entry plus comments for every post I’ve made since August, with a request for /favicon.ico in between every request. Luckily, that amounts to just a whole lot of 304 Not Modifieds, but a little grepping says that since it happens every 30 minutes, it also amounts to 6520 requests over the last 24 hours. From one IP address. With two entries and seven comments in that 24 hours, things were far more active than usual, but still, 6520 requests seems a little overboard. I’ve got gobs of bandwidth, and Apache shouldn’t have any problem with someone doing five or six requests a second for thirty seconds or so, but that’s one IP address. Switch a couple hundred people over to Newzcrawler, so it’s 1,304,000 requests a day, and I think it might get my host’s attention.
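For anyone who wants to repeat the "little grepping" on their own logs, here is a minimal sketch of the same tally in Python. It assumes Apache's common or combined log format; the sample lines, addresses, and paths are made up for illustration and are not from my actual log.

```python
import re
from collections import Counter

# Matches the start of a common/combined log format line:
# client address, identity, user, [timestamp], "request line", status code.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')

def summarize(log_lines):
    """Return (requests per client, requests per (client, status)) counters."""
    per_ip = Counter()
    per_ip_status = Counter()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ip, status = match.group(1), match.group(2)
        per_ip[ip] += 1
        per_ip_status[(ip, status)] += 1
    return per_ip, per_ip_status

# Hypothetical sample lines (documentation-range addresses, invented paths).
sample = [
    '192.0.2.10 - - [03/May/2004:04:30:01 -0700] "GET /example-entry-1/feed HTTP/1.1" 304 -',
    '192.0.2.10 - - [03/May/2004:04:30:01 -0700] "GET /favicon.ico HTTP/1.1" 304 -',
    '192.0.2.10 - - [03/May/2004:04:30:02 -0700] "GET /example-entry-2/feed HTTP/1.1" 304 -',
    '198.51.100.7 - - [03/May/2004:04:30:02 -0700] "GET /index.rdf HTTP/1.1" 200 5120',
    '192.0.2.10 - - [03/May/2004:04:30:02 -0700] "GET /favicon.ico HTTP/1.1" 304 -',
]

per_ip, per_ip_status = summarize(sample)
for ip, count in per_ip.most_common():
    print(ip, count, "requests,", per_ip_status[(ip, "304")], "of them 304s")
```

To run it against a real log, pass an open file object instead of the sample list, since the function only needs something it can iterate over line by line.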

So, does anyone know what Newzcrawler’s doing, or what I’ve done wrong to confuse it into thinking it should request every single RSS feed I’ve done since last August, every thirty minutes? I’m willing to take the blame for that, and fix whatever I’ve done, but that bit about requesting /favicon.ico over and over, once for every request on the same host? That’s a bug.

Update: A bug, and one that’s already fixed in the current release. From the comments and Andrew’s answer in his forums, it’s just what I should have thought of, if it had been before midnight or if I ever slept anymore: combine one good thing, storing old posts so a user can refer back to them, with another good thing, fetching wfw:commentRss feeds for the entries you know about so you can show the user their comments, and you have a whole lot of requests.
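The numbers in the post can be worked backwards to see roughly how many comment feeds are being polled. This is only a back-of-the-envelope sketch: the 6520 requests, the 30-minute interval, and the "couple hundred" users come from the post, while the assumption that exactly one favicon request accompanies each feed request is mine.

```python
# Figures taken from the post.
requests_per_day = 6520
poll_interval_minutes = 30
hypothetical_users = 200  # "a couple hundred people"

polls_per_day = 24 * 60 // poll_interval_minutes       # 48 polls a day
requests_per_poll = requests_per_day / polls_per_day   # about 136 per poll

# Assumption: every feed request is paired with one /favicon.ico request,
# so roughly half of the requests in each poll are comment feeds.
feeds_per_poll = requests_per_poll / 2

total_requests_per_day = requests_per_day * hypothetical_users  # 1,304,000

print(f"{polls_per_day} polls a day")
print(f"{requests_per_poll:.1f} requests per poll")
print(f"about {feeds_per_poll:.0f} comment feeds per poll")
print(f"{total_requests_per_day:,} requests a day for {hypothetical_users} users")
```

Under that assumption, one stored entry costs two requests every half hour, so the load grows with the number of old entries an aggregator keeps rather than with how much is actually new.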

If you’ve got a better idea for how to keep those two good things than what’s in my comments or the Newzcrawler forum, I’d appreciate it if you would post it (over there, so Andrew doesn’t have to keep checking my comments. Well, unless he’s subscribed to the commentRss feed ;)).

8 Comments

Comment by Roger Benningfield #
2004-05-03 05:16:22

Phil: I’m not on Comcast in Illinois, nor does my IP look anything like that… but I *do* use Newzcrawler.

FWIW, I cleared my archive of your feed and then hit “Update”… the oldest thing it pulled down (at least visibly) was “Doing Your Own Technorati Links” from 2004-04-22 and all associated comments. Subsequent “update”s don’t retrieve anything else.

Now, it’s a good thing that my copy of NC doesn’t appear to be polling for stuff from August… OTOH, I *am* curious as to why it isn’t displaying “Tangents around persuasion” from the 18th. That’s currently the oldest item in your feed, so why isn’t NC displaying it?

Hm. Off to post something in the support forum…

 
Comment by Andrew #
2004-05-03 05:37:48

I’ve just answered your question in our support forum:
http://newzcrawler.com/forum/viewtopic.php?t=311

Comment by Phil Ringnalda #
2004-05-03 07:12:22

Thanks, Andrew. I’m trying to think of some way of working around the problem, since I really liked the idea of wfw:commentRss, and I also like the idea of aggregators saving old items so they can re-thread revived conversations. So far, nothing occurs to me but incrementally backing off on comment thread requests, so that at least you only request months-old (or years-old) feeds once a day when they have become idle.

Comment by Roger Benningfield #
2004-05-03 08:58:33

Phil: I was thinking along similar lines. CommentRss support is a vital feature from my perspective, and I’ve been pimping it to everyone I meet… I’d hate to see folks remove it from their feeds because of this sort of thing.

Having it default to a once-a-day throttle for old entries is probably the best way to approach it. Although I’d definitely want to be able to override the default, just to better keep up with comments on my own blogs.
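A rough sketch of what that default could look like, purely as an illustration and not Newzcrawler's actual code. The one-week cutoff, the constant names, and both function names are assumptions made up for the example.

```python
from datetime import datetime, timedelta

# All three values are illustrative assumptions, not Newzcrawler settings.
DEFAULT_INTERVAL = timedelta(minutes=30)   # normal polling interval
OLD_ENTRY_AGE = timedelta(days=7)          # entries older than this count as "old"
OLD_ENTRY_INTERVAL = timedelta(days=1)     # once-a-day throttle for old entries

def comment_poll_interval(entry_date, now, override=None):
    """Pick how often to fetch the comment feed of one entry."""
    if override is not None:
        return override  # user-chosen interval wins, e.g. for your own blog
    if now - entry_date > OLD_ENTRY_AGE:
        return OLD_ENTRY_INTERVAL
    return DEFAULT_INTERVAL

def is_due(last_fetch, entry_date, now, override=None):
    """True when the entry's comment feed should be fetched again."""
    return now - last_fetch >= comment_poll_interval(entry_date, now, override)

now = datetime(2004, 5, 3, 12, 0)
old_entry = datetime(2003, 8, 15)
recent_entry = datetime(2004, 5, 1)

print(comment_poll_interval(old_entry, now))      # 1 day, 0:00:00
print(comment_poll_interval(recent_entry, now))   # 0:30:00
print(is_due(now - timedelta(hours=2), old_entry, now))  # False
```

The override parameter is the part that lets someone keep polling their own blog's old threads at the normal rate.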

 
Comment by Andrew #
2004-05-03 09:55:00

We’ll definitely implement the update limitation for RSS comments if the news items are older than, say, a month or a week (of course we’ll make it configurable). The idea of limiting updates to one fetch per day is very interesting, too.

Comment by Phil Ringnalda #
2004-05-03 10:42:45

Just thinking out loud, without trying to write code (so this may be really stupid), but do you want to back off on the updates based on the age of the parent, or the age of the newest item in the comment feed? (Or am I being naive about how usable item-level dates are?)

If you have a months-old item with an active comment thread (I know of several), you probably don’t want to back down, and also when you do find a new comment in your once-a-day fetch, you want to decrease your interval for a while, since you expect a reply.
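Sketched in code, that idea might look something like this. It is a guess at one possible policy, not anything Newzcrawler does: the interval is driven by whether the last fetch found a new comment rather than by the parent's age, and the doubling step, the bounds, and the function name are all assumptions.

```python
from datetime import timedelta

# Illustrative bounds, not taken from any real aggregator.
MIN_INTERVAL = timedelta(minutes=30)  # how often to poll a lively thread
MAX_INTERVAL = timedelta(days=1)      # how rarely to poll an idle one

def next_interval(current, found_new_comment):
    """Choose the wait before the next fetch of one comment feed.

    A new comment suggests a reply may follow, so drop back to the
    minimum; otherwise double the wait, up to once a day.
    """
    if found_new_comment:
        return MIN_INTERVAL
    return min(current * 2, MAX_INTERVAL)

# An idle thread backs off step by step until it reaches once a day.
interval = MIN_INTERVAL
for _ in range(6):
    interval = next_interval(interval, found_new_comment=False)
    print(interval)

# One new comment brings it straight back to frequent polling.
interval = next_interval(interval, found_new_comment=True)
print(interval)
```

A months-old item with an active thread never backs off under this scheme, because every new comment resets its interval.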

Comment by Andrew #
2004-05-03 12:30:54

We were just thinking about the same thing. From the coder’s point of view, it makes little difference which date to use for the comparison, the parent item’s or the latest reply’s. But this will allow us to check active threads more frequently than “dead” ones. Great idea, Phil ;)

 
 
 
 
 
Trackback by jenett.radio #
2004-05-03 08:16:54

time warped

Past: It looks like Interesting Ideas has been online since ’95 – love the Roadside Art <a href=”http://del.icio.us…

 