Aggregator numbers
<update> Substantially rewritten, once Adam explained what I was missing.</update>
I mostly ignore my stats beyond a quick glance at referrers, but Mark and Dorothea are both talking about aggregator preference among their readers, so I thought I’d take a look. I wasn’t quite sure how to reproduce Mark’s “requests from unique IPs” (a better statistic, since a raw request count inevitably overrepresents Radio, which checks automatically once an hour whenever it’s running, over aggregators that only check when you ask them to), but for raw number of requests:
| Requests | Agent |
| --- | --- |
| 1560 | Radio UserLand |
| 1415 | NetNewsWire |
| 431 | Aggie |
| 366 | AmphetaDesk |
| 229 | Feedreader |
Radio being on top makes sense, because it’s checking whether or not you pay attention to it. Aggie being above AmphetaDesk also makes sense, since Aggie lets you check more often than once an hour, while AmphetaDesk just gives you the cached file if you try. But the extremely high numbers for NetNewsWire are a bit puzzling: sure, there are lots of OS X users blogging, but that many? Or does NNW check automatically, like Radio?
It’s probably an Analog configuration problem. What is Analog counting as pages? Unless you have explicitly included xml and php files as pages in your Analog config file, they aren’t being counted as pages.
My analog.cfg includes this:
PAGEINCLUDE *.php
PAGEINCLUDE *.xml
PAGEINCLUDE *.shtml
D’oh! I never imagined that anything would fail to realize the importance of *.xml, *.rss, and *.rdf. It was already counting *.php, so what I thought were successful feed requests were actually requests for my dotcomments RSS and my old homebrew RSS at /rss/rss*.php, which is now being rewritten to the actual RSS file. With any luck I’ve got it fixed now, and I just need to completely rewrite the post. Thanks!
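(Presumably the fix is just the matching PAGEINCLUDE lines in analog.cfg, covering the feed extensions mentioned above; exact lines are my guess, not quoted from anyone’s config:)

```
PAGEINCLUDE *.xml
PAGEINCLUDE *.rss
PAGEINCLUDE *.rdf
```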
NNW can check automatically every 30, 60, or 240 minutes, but it does not automatically check by default. People are either changing the default preferences, or you have a lot of NNW readers. I was amazed how many I had. Does Analog allow you to group by IP address? That would resolve some of the ambiguity.
If analog does anything that precise, I’m too lazy to RTFM, so I just downloaded a week of raw logs and wrote a script to parse them. It looks like most people who set automatic checks are setting it to check every 30 minutes (if that’s a per-feed setting, you might want to look at how often I’ve ever posted twice in an hour: once maybe? And at 4am my time? Never.), plus the occasional manual check or restart. Typical sort of access times: 05:39:47, 06:09:47, 17:37:44, 18:07:40, 18:38:26, 19:08:26, 19:38:30, 20:07:04, 20:08:32, 20:38:33, 21:29:55, 21:59:53, 22:29:53, 22:59:53.

Guessing at what IP ranges constitute a single person, I’d say the 540 hits came from probably 22 individual people over the last week, which is a perfectly reasonable average of 3.5 per person per day. No big deal: I’ve got tons more bandwidth than I need, and I’d rather have people check every five minutes than not read me at all. It’s just sort of interesting (and kind of creepy, once you’ve sorted it down to seeing that this person checked twice in three minutes, and then again five minutes later: having some trouble with your computer?).
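A minimal sketch of that sort of script in Python (the log format, feed path, and language are all my assumptions; the comment doesn’t say what the actual script looked like):

```python
import re
from collections import Counter

# Roughly the job described above: read raw Apache-style access log
# lines, keep only requests for the RSS feed, and count hits per IP,
# so you can eyeball how many distinct readers are polling and how often.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def hits_per_ip(lines, feed_path="/index.rdf"):
    """Count feed requests per client IP (feed_path is a made-up example)."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("path") == feed_path:
            counts[m.group("ip")] += 1
    return counts
```

Grouping nearby IPs into "probably one person" is the fuzzy, by-hand part; the script only gets you as far as per-IP counts.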
I noticed the same thing on my site — I even had one person using Ximian’s Evolution e-mail client, which also does RSS aggregation, checking every 10 minutes.
Which suggests 2 new features for the aggregators:
1) Per-feed scheduling: I would monitor most of my sites twice a day, but metafilter and freshmeat get enough updates that they have to be checked more frequently.
2) Browser-style file caching. As far as I can tell from my logs, none of them caches the RDF/RSS file, so they download it even when it hasn’t changed. If they cached the files, they would instead get a “304 Not Modified” response and no download would occur.
1): Absolutely, and for that matter I’d really like to have my aggregator learn about the sort of schedule it should keep. It’s no big deal to check once an hour if you’ve got a fast connection and only a few feeds, but for 60 (or 300, or…) on a dialup connection, it’s significant, and there are feeds that I watch just in case they come back to life that could do with an “every 48 hours” schedule.
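That “learn the schedule” idea could be as simple as per-feed backoff. A hypothetical sketch in Python (the class, intervals, and doubling rule are mine, not any real aggregator’s logic):

```python
import time

# Per-feed scheduling sketch: each feed keeps its own check interval.
# A feed that changed on the last fetch snaps back to frequent checks;
# one that never changes gets checked less and less often, capped at
# the "every 48 hours" schedule mentioned above.
class FeedSchedule:
    MIN_INTERVAL = 30 * 60        # 30 minutes, in seconds
    MAX_INTERVAL = 48 * 60 * 60   # 48 hours

    def __init__(self, url):
        self.url = url
        self.interval = self.MIN_INTERVAL
        self.next_check = time.time() + self.interval

    def record_fetch(self, changed):
        # Reset on change, double on no change (bounded above).
        if changed:
            self.interval = self.MIN_INTERVAL
        else:
            self.interval = min(self.interval * 2, self.MAX_INTERVAL)
        self.next_check = time.time() + self.interval

    def due(self, now=None):
        return (now or time.time()) >= self.next_check
```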
2): Good boy, Aggie. That’s a good, good boy. (My last scan last night, when the bulk of the people I read were long in bed and the rest were probably far too drunk to type, gathered 32 304s, which is vastly quicker than having to grab each file and compare it to the cached version to see if there are new items to display.)
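For what it’s worth, the conditional-GET behavior being praised here fits in a few lines of Python; the feed URL and function names below are made up for illustration:

```python
import urllib.request
import urllib.error

# Browser-style conditional fetch: send the Last-Modified value from the
# previous fetch back as If-Modified-Since. An unchanged feed answers
# 304 with no body, so nothing is downloaded.
def build_conditional_request(url, last_modified=None):
    req = urllib.request.Request(url)
    if last_modified:
        req.add_header("If-Modified-Since", last_modified)
    return req

def fetch_feed(url, last_modified=None):
    """Return (body, last_modified); body is None when the feed is unchanged."""
    req = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read(), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as e:
        if e.code == 304:  # unchanged: keep using the cached copy
            return None, last_modified
        raise
```

(ETag/If-None-Match would work the same way; Last-Modified is just the simplest version of the idea.)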
Good aggregator discussion
philringnalda.com: Aggregator numbers I chimed in with my current thoughts on aggregators: They should support per-site scheduling, so I can check frequently updated sites more frequently, and less frequently updated sites once a day. They should suppo…