Bring out the HTTP pitchforks!
I’d appreciate it if you would subscribe to a couple of my podcasts.
No, really.
I haven’t decided that you need to listen to me wandering around the house, getting a beer (the thirtieth, if I’m recording my rambling on); this is just the usual testing of things rather than doing them. Stuart says he’s having trouble mirroring files for the LugRadio podcast: when he used fixed enclosure URLs that redirected to the actual file, people complained that they got nothing, and now, when he changes the direct enclosure URL because a mirror has disappeared, the file gets downloaded again.
So, a couple of things that I’d appreciate your subscribing to in whatever podcast clients you have around, and reporting on:
- /tests/enclosure/oneguid/
- This one tests my contention that for an item with a guid, a change in the enclosure URL should be ignored. If you subscribe and download the enclosure, then no matter how many times you refresh the feed, no new enclosure should be found, reported, or downloaded. If you subscribe, but don’t automatically download the enclosure before refreshing, then the single reported enclosure should change on each refresh, to the newest URL.
- /tests/enclosure/redirects/
- This one tests the incredibly basic HTTP client ability to follow a redirect. You should wind up with two enclosures: depending on how bright your client is, either redir.mp3 and redir2.mp3, or ideally lesslikeyou.mp3 and losingtouch.mp3 (since the payload for both tests is songs by the too-rarely-with-us Shannon Campbell).
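For anyone wondering what "correct" means for the first test, here is a rough Python sketch of the behavior I'm arguing for. The `Item` and `Subscription` names are made up for illustration and don't come from any real client; the only point is that identity hangs on the guid, and the enclosure URL is just the latest known location for it.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    guid: str
    enclosure_url: str


@dataclass
class Subscription:
    # guids whose enclosure has already been downloaded
    downloaded: set = field(default_factory=set)
    # guid -> most recently seen enclosure URL, for items not yet downloaded
    pending: dict = field(default_factory=dict)

    def refresh(self, items):
        """Process one fetch of the feed."""
        for item in items:
            if item.guid in self.downloaded:
                # Already have it: a changed URL is not a new enclosure.
                continue
            # Not downloaded yet: keep only the newest URL for this guid.
            self.pending[item.guid] = item.enclosure_url

    def download(self, guid):
        """Mark the item as downloaded and return the URL to fetch."""
        url = self.pending.pop(guid)
        self.downloaded.add(guid)
        return url
```

A client built like this reports one enclosure with the newest URL until you download it, and nothing at all afterwards.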
Let me know what results you get, with what clients, and also whether or not I’m correctly testing what I think I am, and whether I’m right about the correct behavior, and then we can get out the ol’ HTTP pitchforks and torches, and go hunting developers. It’s been too long since we’ve had any angry mob justice aimed at them, anyway.
My results, from the two I picked at random (where “random” means “these are .NET-based, and thus small enough I’m willing to download them”):
RSSRadio gets it just right: the redirects are followed, the changing URL is just reported as the one URL from the latest refresh until you decide to download it. I think associating itself with OPML (which it might or might not have done if anything else was already associated) was a bit much, but on those two tests it was just right.
Nimiq follows redirects just fine, but ignores guids, and after a successful download sees the next enclosure URL as being something new to download again.
Both tests work as expected in NetNewsWire 2. The new enclosure URL is never even displayed for the GUID test (I looked at the XML source of the feed to make sure that it had actually gotten a new enclosure URL with the refresh).
Thanks!
Hrm. Is that good, or bad? If NNW is immediately and successfully downloading the enclosure, I guess maybe it’s goodish: the displayed URL will continue to reflect what was actually downloaded. Otherwise, that doesn’t seem quite right: if the original URL is wrong, or needs to be changed, between the time the feed is retrieved and when the enclosure is actually downloaded, then the changes ought to be picked up. And even if the first URL was downloaded successfully, you might want a later corrected instance: say the first one was cut off, or the wrong file was accidentally uploaded, or you just want to tell someone the URL to download it for themselves.
I need a better PHP harness behind my testcases, I think. It really ought to deliver the first instance with an ETag and no enclosure, and then when that ETag is presented, a new item with an enclosure that has an invalid URL and a new ETag, and when the second ETag is presented then that same item with the correct URL and an ETag that says to start giving random URLs that all work. It’d be a touch confusing for people with a client that failed to deal with ETags, since they’d just keep seeing the first item saying “Okay, you got the empty one, now refresh the feed to get the one with a broken URL,” but then that’s another nice thing to know, when you’re hauling out the pitchforks and torches.
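The ETag-driven sequence is easier to see as a little state machine. This is a Python sketch rather than the PHP it would actually be, and the stage names, guids, and URLs are all invented placeholders. It also uses bare tokens where a real ETag header value would be a quoted string.

```python
import random

# Hypothetical URLs, for illustration only.
BROKEN_URL = "http://example.invalid/broken.mp3"
WORKING_URLS = [
    "http://example.com/a/song.mp3",
    "http://example.com/b/song.mp3",
]


def respond(if_none_match=None, rng=random):
    """Return (etag, item) for a request carrying the given If-None-Match value."""
    if if_none_match == "stage-1":
        # Client has seen the empty item: new item, deliberately broken URL.
        return "stage-2", {"guid": "item-1", "enclosure": BROKEN_URL}
    if if_none_match == "stage-2":
        # Same item (same guid), now with a correct URL.
        return "stage-3", {"guid": "item-1", "enclosure": WORKING_URLS[0]}
    if if_none_match == "stage-3":
        # Same item again, with a random URL that works.
        return "stage-3", {"guid": "item-1", "enclosure": rng.choice(WORKING_URLS)}
    # No ETag, or one we don't recognize: first instance, no enclosure.
    return "stage-1", {"guid": "item-0", "enclosure": None}
```

A client that never sends the ETag back falls through to the last branch on every request, which is exactly the "keeps seeing the first item" case.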
Test 1: Newzcrawler worked as expected.
Test 2: Newzcrawler worked, but produced the sub-optimal redir.mp3 and redir2.mp3 names.
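The naming difference in the second test presumably comes down to which URL the client takes the filename from: the one in the enclosure, or the one it ended up at after the redirect. A rough Python sketch of the second approach follows; the function names are made up, and the only library behavior relied on is that `urllib.request.urlopen` follows redirects by default and reports the final URL through `geturl()`.

```python
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Last path segment of a URL, ignoring any query string."""
    path = urlsplit(url).path
    return unquote(posixpath.basename(path))


def fetch_enclosure(url):
    """Fetch an enclosure and name it after the URL it was finally served from."""
    with urllib.request.urlopen(url) as resp:
        # geturl() is the URL after redirects, not the one we asked for.
        return filename_from_url(resp.geturl()), resp.read()
```

Calling `filename_from_url` on the original enclosure URL instead of the final one is all it takes to get the redir.mp3 style of name.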
iTunes/Win survives both just fine: it recognizes a guid, and gets the redirected files just fine, though the way it names them with the item/title (“It’s only temporary.mp3”) rather than the enclosure filename or the eventual filename makes me wonder what it does with an item that includes more than one enclosure.
This is only tangentially related to your post; I just wanted to write this down in a place I know I can find it later. (Yeah, my TODO list for FeedParser consists of a Google search for “itunes rss site:philringnalda.com” and “itunes rss site:intertwingly.net”. Sosumi.)
Anyway, yet another subtle but important change that Apple has made to RSS is that they treat enclosure/@url as a GUID, if no item/guid is present. I need to do more testing to get a final precedence order, i.e. does enclosure/@url act as a GUID even in the presence of an item/link?
Good point. I half-noticed that in the spec, three-quarter-noticed it when I saw them using a guid, but haven’t quite made it to fully grasping it. It’s probably actually a reasonable thing to do, in a podcast-only client, though not in a general-purpose client. “Since everyone ignored my post from last week with ‘Here’s what I want’ in the text body and my demands in an enclosed audio, I’m attaching the same audio to this post, with the added text message that unless my demands are met by next Friday, we’re all going to Valhalla” probably shouldn’t be treated as an already-seen post by a general aggregator, but given the number of podcasts I’ve seen with no real web existence, I bet iTunes sees an awful lot of posts where every item link goes to the same URL, same as the channel link. Maybe that leaves room for ‘if <link> is unseen, the post is new; if <link> is seen, that’s no information’; I’m not quite sure.
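To pin down what that rule might look like, here is a Python sketch of one possible precedence order. The order itself is a guess, not anything Apple documents or that I've verified in iTunes, and the dictionary keys are made up: a guid wins outright, then an unseen link means new, then the enclosure URL stands in for the guid.

```python
def is_new(item, seen_guids, seen_links, seen_enclosures):
    """Decide whether an item is new under one assumed precedence order."""
    guid = item.get("guid")
    if guid:
        # An explicit guid settles it, whatever the link or enclosure says.
        return guid not in seen_guids
    link = item.get("link")
    if link and link not in seen_links:
        # Unseen link: new post, even if the enclosure has been seen before.
        return True
    url = item.get("enclosure_url")
    if url:
        # A seen (or missing) link tells us nothing; fall back to the enclosure URL.
        return url not in seen_enclosures
    # Nothing to identify the item by: assume it is new.
    return True
```

Under this ordering the ransom-note post counts as new because its link is new, while a podcast whose items all link to the channel URL is told apart by enclosure URL alone.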
Hmm, podcast as ransom note. Ten bucks says Dave Winer is the first one to do it. Twenty says the “or else” is “or else I’ll shut down my blog. Again.”
Why not just use round-robin DNS among all of the websites to spread the load?
My impression is less of “a half-dozen formal mirrors rsyncing regularly” and more of “Bob, have you got enough bandwidth to spare this month to host the latest LugRadio episode?” followed by the panicked “I thought I did, but now I’m going to go over and start paying through the nose!”
Phil is mostly correct, although we’re not quite *that* shoe-stringy an operation :-)
Some mirrors (the big high-bandwidth ones) host all the episodes; some host only some episodes. Moreover, an episode isn’t at the same URL path on every mirror. So a DNS-based solution won’t work, I’m afraid.
Hrm. That seems to give you fewer options than my cobbled-up view. You really want to have just a single /episodes/33 URI behind which you can hide whatever load-balancing redirect suits you, then. So, what clients need to behave themselves to amount to a vast majority? Or are there not just a few market leaders in Linux podcatching?
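What would sit behind that single URI could be as small as a lookup table and a random pick. In this Python sketch the mirror hosts and paths are invented placeholders, not LugRadio's real mirrors; the table just records, per mirror, which episodes it carries and at what path, since both differ from mirror to mirror.

```python
import random

# Hypothetical mirror table: base URL -> {episode number: path on that mirror}.
MIRRORS = {
    "http://mirror-a.example": {33: "/lugradio/ep33.mp3", 34: "/lugradio/ep34.mp3"},
    "http://mirror-b.example": {33: "/files/lr-33.mp3"},
}


def redirect_target(episode, mirrors=MIRRORS, rng=random):
    """Pick a mirror URL for an episode, or None if no mirror carries it."""
    candidates = [
        base + paths[episode]
        for base, paths in mirrors.items()
        if episode in paths
    ]
    return rng.choice(candidates) if candidates else None
```

The handler for /episodes/33 would then answer with a redirect to `redirect_target(33)`, and dropping a dead mirror only means editing the table while the enclosure URL in the feed stays the same.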
I’m not sure which clients would be needed. I’ve asked people for testing as well; note that it’s not just Linux clients, because there are a lot of Windows and Mac users too. I fear I may need to download and test a load of podcast clients, which I was hoping to avoid…