phil ringnalda : My brother’s (feed’s) keeper

One thing I didn’t expect, though perhaps I should have, about moving from Bloglines to Feed on Feeds, was the way that I am now responsible for making sure that the feeds I read are well-formed, usable XML. Bloglines is apparently fairly liberal in its parsing, and even when it isn’t, it doesn’t exactly put feed errors right in your face: in moments of extreme boredom or extreme avoidance behavior, I would sometimes scroll clear through my hundreds of subscriptions looking for the red “[!]” that it uses to indicate a problem with a feed, but rarely, rarely. With Feed on Feeds, it’s just a click (or two) on the “age↑” column header to sort them with the ones that were least-recently workable first. And then, since it seems that nobody else is noticing, I have to either rattle some cages, or wait for the offending post to fall out the bottom of the feed. At first, I tried emailing some people, but either email really is broken, or nobody reads email from me anymore. I was going to switch to leaving comments, at least where they are available, but then I noticed that my infatuation with short posts has left me in danger of having my sidebar wrap around underneath my suddenly shrunken content column, so instead I’ll just drop a few here. You know you love reading this stuff, anyway, right?

Le «blog personnel» de Joe Clark » ← Office ¶ Kindergarten →: This will become an all too common theme in this, and no doubt subsequent, lists: Joe’s title contains HTML entities (or, if you prefer formality, character entity references from the HTMLSymbol set), which are perfectly at home in HTML, but are undefined in the XML of his feed, and thus need to either be escaped as &amp;larr; and &amp;rarr;, or need to be the literal character rather than an entity reference. The more I learn about character encoding, and the more problems I see in feeds, the more I like literal characters (properly escaping them would let me see the post in Feed on Feeds, but because I haven’t gotten around to hacking in support for unescaping them, I’d see the literal &rarr; rather than an arrow). In Movable Type it’s a right royal pain, but I quite often type an entity, and then the next time I’m looking at the preview in its annoying separate page, I copy the character and paste it back into my title or entry. In WordPress, it’s a quick scroll down to the preview at the bottom of the posting page and back up. Joe’s using WordPress 1.2-delta, but my copy of 1.5-alpha-6 suffers from the same lack of title-escaping, so chances are 1.5-final does as well.
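
A minimal sketch of the three cases described above, using Python's standard `xml.etree.ElementTree` as a stand-in for a strict XML parser. The `<title>` fragments are hypothetical, not Joe's actual feed markup, and `parses` is a helper invented for this illustration.

```python
import xml.etree.ElementTree as ET


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical title fragments for illustration only.
bare_entity = "<title>&larr; Office</title>"       # HTML entity, undefined in plain XML
amp_escaped = "<title>&amp;larr; Office</title>"   # ampersand escaped
literal_char = "<title>\u2190 Office</title>"      # the arrow character itself

print(parses(bare_entity))                 # False: the parser rejects the undefined entity
print(ET.fromstring(amp_escaped).text)     # "&larr; Office": parses, but shows the entity text
print(ET.fromstring(literal_char).text)    # the title with a real left arrow
```

The escaped form is well-formed but hands the reader the entity text unless the consuming application unescapes it a second time, which is the argument for literal characters.
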
World’s first? Wimax for train commuters: Dunstan mentions that Britain’s mobile companies paid 22.5 billion of something for licenses, but the currency unit is in question, or rather is a question mark. That’s not too uncommon with the euro symbol, since it’s not defined in ISO-8859-1, and pasting one into a form in an ISO-8859-1 page will tell your browser that you want to silently submit the form in Windows-1252 instead. Win-1252 uses 0x80 as the codepoint for the euro, while that’s an undefined character in ISO-8859-1. Since Dunstan publishes in UTF-8, the one true way to avoid that sort of silent recoding, I’m not sure how he managed an undefined currency symbol, but luckily his blog is powered by Naked Dunstan Technologies, so it’s his problem alone :)
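
The byte-level difference can be checked with Python's built-in codecs. This is only a sketch of the encodings involved; the `errors="replace"` step shows one common way a question mark can appear and is an assumption, not a diagnosis of what happened in Dunstan's feed. `encodable` is a helper invented for this illustration.

```python
EURO = "\u20ac"  # the euro sign

# Windows-1252 assigns the euro sign to the single byte 0x80.
cp1252_bytes = EURO.encode("cp1252")
print(cp1252_bytes)  # b'\x80'


def encodable(text: str, encoding: str) -> bool:
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# ISO-8859-1 has no euro sign at all.
print(encodable(EURO, "iso-8859-1"))  # False

# A lossy conversion that substitutes unrepresentable characters yields "?".
print(EURO.encode("iso-8859-1", errors="replace"))  # b'?'

# Read as ISO-8859-1, byte 0x80 is not a printable character; Python's
# codec maps it to the control code point U+0080.
print(repr(b"\x80".decode("iso-8859-1")))  # '\x80'
```
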
J-Walk Blog: Cocktail Generator: Not my favorite problem to explain, or to suggest how to fix. J-Walk’s HTML is interpreted as being encoded in ISO-8859-1, thanks to a <meta> tag (though the HTTP standard would say it was as well, since that’s the default for all text/* media types, including text/html). However, his RSS is delivered with a Content-Type: application/xml header without a charset parameter, and the XML declaration in the file itself doesn’t specify an encoding, so thanks to the insanely complicated rules for XML, that means his feed is interpreted as being encoded in UTF-8, whether or not it really is. Sometimes that’s just fine, and other times you paste in a ½ thinking you’ll get a “vulgar fraction one half” (why are those called vulgar, by the way?), and instead you’ll get an invalid, unparseable feed. Ideally, the fix would involve hacking whatever pMachine uses to generate the feed, to add ; charset=iso-8859-1 at the end of the content-type header, and also changing the XML declaration in the feed itself to read <?xml version="1.0" encoding="iso-8859-1"?> so it will retain the knowledge of what it is no matter where it travels.
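
A small sketch of the in-file half of that fix, again using Python's `xml.etree.ElementTree`. The documents here are hypothetical one-element stand-ins, not J-Walk's feed, and the HTTP header side is not modelled at all.

```python
import xml.etree.ElementTree as ET

# "½" as a single ISO-8859-1 byte, which is not valid UTF-8 on its own.
HALF_LATIN1 = "\u00bd".encode("iso-8859-1")  # b'\xbd'

# No encoding in the XML declaration: the parser assumes UTF-8.
undeclared = b'<?xml version="1.0"?><title>' + HALF_LATIN1 + b" oz gin</title>"

# Same bytes, but the declaration says what they really are.
declared = (
    b'<?xml version="1.0" encoding="iso-8859-1"?><title>'
    + HALF_LATIN1
    + b" oz gin</title>"
)

try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False  # the stray 0xBD byte is rejected

print(undeclared_ok)                  # False
print(ET.fromstring(declared).text)   # ½ oz gin
```
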
rawbrick.net › articles › 5 “love” songs: This one sucks particularly badly because Carol’s RSS feed is just fine: the two entities in the title, &ldquo; and &rdquo;, are both amp-escaped there. However, I’m subscribed to her Atom feed, where they are left unescaped, and thus are undefined entities. Judging by the URL, Textpattern g1.17 could use a little help learning about how to strip entities for filenaming, as well as learning how to escape them in Atom.
Half a Six (Apart) Pack: During Movable Type and Six Apart’s recent redesign, I think I read in one post that they were going to redirect the Six Log feed to a combined feed of everything they post everywhere, though I’m not sure because I was being flooded with posts due to their redirection of all the feeds to Feedburner, causing every <link> to change, and every item to become new again. Looking at what’s in the feed right now, it appears that maybe it is an aggregated feed, with a two-day delay on posts from the Pronet Weblog. My vague memory is that one of the benefits of Pronet membership was supposed to be earlier access to posts there, but the URL where I was subscribed to it directly is now 404, as is my subscription to the Movable Type news feed and the MT Plugins feed. Of course, since they’ve outsourced all their feeds, the exact same post in the two different feeds would have a completely different <link>, and thus look new again to anything not looking at the <atom:id>, or the element I never noticed before and now think I shall abuse, <feedburner:origLink>. In any case, my ideas about what I’m supposed to do about the three feeds I was subscribed to that are now returning a 404 are now as muddled as this post, and as my ideas about why they needed to outsource their feeds (are they incapable of producing usable feeds, despite writing software that’s supposed to? are they incapable of handling the bandwidth, despite running a hosting service? are they so desperate to know about click-through numbers that they’ll annoy their subscribers, and lose some along the way? puzzling.). Probably I’ll just take the easy way out, and unsubscribe. Oops.
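
The "looks new again" problem comes down to which field an aggregator treats as an item's identity. The sketch below is hypothetical: the dictionary keys `id`, `orig_link`, and `link` stand in for <atom:id>, <feedburner:origLink>, and <link>, and nothing here reflects how Feed on Feeds actually stores items. The URLs are made-up examples.

```python
def item_key(item: dict) -> str:
    """Pick the most stable identifier available for a feed item.

    Preference order (an assumption for this sketch): the permanent id,
    then the original link, then the possibly rewritten link.
    """
    for field in ("id", "orig_link", "link"):
        value = item.get(field)
        if value:
            return value
    raise ValueError("item has no usable identifier")


# The same post seen through two feeds whose links were rewritten differently.
from_feed_a = {
    "id": "tag:example.com,2005:post-1",
    "orig_link": "http://example.com/post-1",
    "link": "http://redirect.example.net/a/123",
}
from_feed_b = {
    "id": "tag:example.com,2005:post-1",
    "orig_link": "http://example.com/post-1",
    "link": "http://redirect.example.net/b/456",
}

seen = {item_key(from_feed_a)}
print(item_key(from_feed_b) in seen)            # True: recognised as the same post
print(from_feed_b["link"] in {from_feed_a["link"]})  # False: keyed on link, it looks new
```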

This entry was posted on Friday, February 18th, 2005 at 10:42 pm and is filed under feeds and syndication.

35 Comments

Comment by James

2005-02-18 23:49:16

It looks like that bug in Textpattern’s Atom code is still around as of 1.0RC1. I’ll file a ticket.

Comment by Phil Ringnalda

2005-02-19 17:59:09