What feeds should Firefox discover?

While wishing yet again that there was some better place to discuss what Firefox’s feed autodiscovery should find, I remembered: I’ve got a blog!

The two things that are correct, rel="alternate" with either type="application/rss+xml" or type="application/atom+xml", are the easy part. Other than having used .match when I meant .test, the patch I abandoned last fall seems to cover them exactly to spec. I burn with the desire to shove it through, shouting down any objections, but I know what Mark Pilgrim would call me as a result.
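
As a rough illustration of the "easy part", here is a minimal sketch of strictly to-spec discovery. It is written in Python rather than in Firefox's own code, and the names `StrictFeedFinder` and `find_feeds` are hypothetical; treating rel as a case-insensitive, space-separated list is an assumption of the sketch, not something taken from the abandoned patch.

```python
from html.parser import HTMLParser

# The two to-spec feed types named in the post.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class StrictFeedFinder(HTMLParser):
    """Collects (href, type, title) for link elements that are to-spec feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        # Valueless attributes arrive as None; normalise them to "".
        attr = {name: (value or "") for name, value in attrs}
        # Assumption: rel is a space-separated, case-insensitive token list.
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # The title is carried along for display only; it is never tested.
            self.feeds.append((href, mime, attr.get("title", "")))


def find_feeds(html):
    """Returns the to-spec feed links found in an HTML string."""
    finder = StrictFeedFinder()
    finder.feed(html)
    return finder.feeds
```

Under these rules a link such as type="text/xml" with title="RSS .92" is simply ignored, which is exactly the gap the rest of this post is about.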

Then there are all the exceptions.

WordPress defaults to having three feeds: full-content RSS and full-content Atom with to-spec links, and an RSS 0.92 feed with just a summary, perfect for our titles-and-links-only aggregator, that’s linked with <link rel="alternate" type="text/xml" title="RSS .92" href="/feed/rss/" />. Also, there are all the people who believe they shouldn’t be using the unregistered and probably unregisterable type application/rss+xml, who may be using text/xml or the more sensible application/xml. With application/rss+xml referring to at least 9 different and incompatible formats, with two different root elements, two different namespaces plus a load of unnamespaced elements with shifting meaning and content models, good luck getting it registered. And although the HTML DTD isn’t able to express the restriction, so the validator won’t catch it, the spec makes it quite clear that type can only be either a registered MIME type or an experimental type using the proper application/x-rss format. People using application/xml are wrong relative to a spec which only exists as a weblog post, but they are right relative to the HTML spec.

Then, for a short while last December while the literalists held sway, the feeds on mozilla.org’s home page were using type="application/rdf+xml" for RSS 1.0 feeds, and since they don’t use “RSS” in the title, even our current ultra-liberal autodiscovery wasn’t finding them. While they no longer use it, Ian Davis does, and there are surely others who do as well. There’s no spec for it, and I’ve yet to hear any persuasive ideas about how we should tell alternates in application/rdf+xml which are RSS from all the other possible RDF alternates. Mandating the string “RSS” in the title attribute is the only real suggestion I’ve heard, which makes that one specless (and likely to remain specless other than perhaps another blog-post-as-spec, given RSS-DEV’s moribund state) form of autodiscovery utterly unlike the other two, which explicitly say the title isn’t significant (Atom), or that it may be anything and is thus not significant (RSS).

There is precedent for overloading title with machine-significant things: that’s how preferred and alternate stylesheets work, and also how FOAF autodiscovery works. However, neither one is a good analogy.

For stylesheets, there’s just the one place to look: anyone putting a link in their HTML to refer to their stylesheets knows that the explanation for how that link will work can be found in the HTML spec. For RDF, even if an autodiscovery spec is published as part of the RSS 1.0 spec, there’s no reason to think that anyone doing alternate RDF that isn’t RSS would first check the RSS 1.0 spec, to see what letters they are and aren’t allowed to use in their title.

For FOAF, the penalty for being wrong is completely different. If you link to some random non-FOAF RDF with type="application/rdf+xml" and a FOAF scutter happens to try to read it, no harm done: it’s RDF, being read by an RDF parser, and if it doesn’t find any FOAF statements, well, it won’t be loading it all that often. However, if you structure RSS autodiscovery so that anything of type="application/rdf+xml" may be read by RSS parsers, you’ve got a very big problem. Nearly all of them are not RDF parsers, so they’ll have absolutely no idea what you are saying if you aren’t saying it in RSS 1.0’s restricted syntax and grammar, but that’s not going to stop them. They’ll keep reading it, and reading it, and reading it once an hour, until the internet shuts down or the user switches to a new reader that doesn’t import subscriptions unless it can read them right away.

Far and away the most likely outcome of saying that autodiscovery for RSS 1.0 should use <link rel="alternate" type="application/rdf+xml" title=".* RSS .*" href="/foo" /> is that it will prevent the use of link for any other alternate RDF of any sort, which isn’t something I want to be a part of doing.

So, what do you think? Beyond the two correct, to-spec sorts, what should Firefox autodiscover?

25 Comments

Comment by Roger Benningfield #
2005-04-15 23:52:20

When I do autodiscovery, I look for application/rdf+xml, and then do a basic XPath query to see if whatever I’m discovering looks like a feed. Is there any reason that Firefox couldn’t do the same thing?

Comment by Phil Ringnalda #
2005-04-16 00:05:15

You’re doing it when you’re directed to: we’re doing it when the DOMLinkAdded event fires as the parser sees the link element while loading an HTML page. Do you want to load Ian’s RSS feed every time you load his weblog? Probably blocking the page load (I don’t really know, but I don’t think we have the architecture to load it in the background while the page continues loading)? Would someone with a non-RSS RDF alternate be pleased to have every Firefox user load their RDF on every load? They’d probably be better off with a few foolish moths beating their heads against the light because they subscribed to a Live Bookmark that isn’t ever going to work because it isn’t RSS, rather than have every Firefox user constantly sniffing at their RDF.

Comment by Aristotle Pagaltzis #
2005-04-17 02:30:15

How about checking when the user requests to subscribe to the feed, so an error pops up instead of the live bookmark dialog? “I’m sorry, Dave.”

Comment by Phil Ringnalda #
2005-04-17 09:39:44

I suspect that would wind up being awfully annoying.

“Firefox has discovered feeds for this page!”
“Oops, never mind, not a feed after all.”
“Firefox has discovered feeds for this page!”
“Oops, it’s kinda broken right now, so I’ll refuse to let you save a bookmark until it’s fixed.”

The only thing that’s really annoying to me about using Feed on Feeds (which uses a strict parser more because that’s all that’s available in PHP than for philosophical reasons) is the refusal to subscribe when someone’s temporarily screwed up their feed. If I have to remember to come back and keep trying to add a feed after a day or two when they’ve fixed it, or a week or two after they’ve driven the invalid post out of their feed, I just won’t.

And even if we only sniff for either “rss” or “RDF” plus the namespace URL for RSS 1.0 somewhere in the first 1KB or so of the potential feed, there are still hundreds of possible ways to temporarily screw things up so we would refuse to allow Live Bookmarking of an actual feed, while still subscribing to things which aren’t feeds. That’s why I think it’s better to only offer to create a Live Bookmark when it’s correctly advertised, and then create it no matter what: if they said it was a feed, then they are responsible for having a feed there and usable, and we’ll just keep reloading it (and reloading it (and reloading it)) until it is. If we do loose and sloppy parsing of links, then to not be jerks who keep reloading a static RDF-not-RSS file over and over, we have to be ultra-strict about what we actually add.
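
To make the sniffing idea concrete, here is a minimal sketch in Python. The function name `looks_like_feed` is hypothetical, reading the rule as "an rss start tag, or RDF together with the RSS 1.0 namespace" is one interpretation of the sentence above, and matching on `<rss` rather than the bare string is an added assumption (the RSS 1.0 namespace URL itself contains "rss").

```python
# Namespace URL of RSS 1.0.
RSS1_NAMESPACE = b"http://purl.org/rss/1.0/"
# "The first 1KB or so" of the potential feed.
SNIFF_BYTES = 1024


def looks_like_feed(document: bytes) -> bool:
    """Guesses whether a document is a feed from its first SNIFF_BYTES bytes."""
    head = document[:SNIFF_BYTES]
    # Assumption: look for the start of an rss element, not just "rss".
    if b"<rss" in head:
        return True
    # RDF only counts when the RSS 1.0 namespace is declared near the top.
    return b"RDF" in head and RSS1_NAMESPACE in head
```

Both failure modes show up immediately: a real feed that opens with a long comment pushes `<rss` past the first kilobyte and is refused, while an unrelated document whose root element merely starts with `<rss` is accepted.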

Comment by Mark #
2005-04-16 10:11:56

Mu.

Comment by Phil Ringnalda #
2005-04-16 10:36:26

Sadly true, with a dash of Schrödinger’s cat.

So tell me, what does it mean when I click on Sean’s footnoted links, and am told “The offer you have clicked has expired. Revisit our site to get the latest specials and updates. Thanks.”?

Never mind, I know. Mu.

Meh.

Comment by børge #
2005-04-16 13:55:11

Firefox discovers feeds on my site that I have marked with rel="newsfeed" instead of rel="alternate". Is this wrong? Should I change it to alternate?

Comment by Phil Ringnalda #
2005-04-16 14:22:51

Yes.

Or, at least, probably. There’s no particular reason to believe that any other program will discover them, since (as far as I know) rel="newsfeed" has never had even a blog post spec written for it. It might be a good idea, particularly for the application/rdf+xml problem, but then any discussion of another value for rel will quickly wander down the rabbit hole of profile, so it would need to be someone with a fair amount of authority simply saying “this is how it is, no negotiation, shut up and soldier.” Offhand, I can’t think of anyone who is both in that position, and has any interest in RSS 1.0 and its discovery.

Comment by Tom #
2005-04-17 15:17:15

A little bit off topic but anyway:
When I try to open pages with the MIME type application/rss+xml, Firefox tries to download the files instead of showing them. Is there a setting to show these files as XML (like application/xml)? (programmatic or in the options dialog)

Comment by Phil Ringnalda #
2005-04-17 20:14:50

Nothing easy that I know of, anyway. You could write a helper application (one per platform) which would then be passed the filename for a local copy, which you could then re-launch as whatever type your OS uses for .xml (though I wouldn’t be surprised if you ran into encoding problems doing that) by just calling firefox tempfilename.xml, or more reasonably write a content handler (the only place I can think of offhand to steal code is MAF, though there must be other extensions that include a content handler) that will then (/me waves hands wildly while slowly edging out of sight).

Comment by Mike Mariano #
2005-04-17 22:10:27

Perhaps the solution is to expand Live Bookmarks so that any possible use of RDF would make a valid bookmark. Have a linked FOAF file? A Live Bookmark will fan out to display a list of all contacts. Just because this information may not really be “live” doesn’t mean it can’t be bookmarked!

I say this in jest, but I half expect to get an increasingly familiar five-word reply.

Comment by Phil Ringnalda #
2005-04-17 22:40:58

Oh, extensions are so last month. There’s a Greasemonkey script for (more or less) that.

Running FOAF through Live Bookmarks would need a dependency on either per-feed refresh timing, or a general trend toward much more dynamic FOAF. Reading mine once an hour wouldn’t buy you much, since I appear to have last changed it in 2002. Gotta get to work on making more new friends.

(Extensions? How about piggybank and the charmingly named pigsty, which makes a bookmarks folder of all photo galleries it finds linked in FOAF.)

Comment by Lachlan Hunt #
2005-04-18 16:01:10

This is exactly why we need rel="feed" to be standardised, and ditch the current broken autodiscovery system that abuses rel="alternate". Anyway, since we’ll be stuck with rel="alternate" for a while now, (ab)using the title attribute by detecting if it contains “RSS” is probably your best option, though it might be worthwhile checking for “Feed” in the title also.

Comment by Phil Ringnalda #
2005-04-18 21:02:16

So, roughly what I’ve been thinking of as the “choose two” alternative? Anything which has any two of the three possibilities: an unknown rel with a known type and “RSS” in the title, or a known rel with an unknown type and “RSS” in the title, or the proper known rel and known type with any title? I guess it’s better than our current “anything which has ‘RSS’ or ‘Atom’ in the title is a feed,” though I could still write real-world testcases it would fail all day long.
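
A minimal sketch of that two-out-of-three rule, in Python. The name `choose_two` is hypothetical, and the sketch assumes that "known rel" means an `alternate` token, "known type" means one of the two to-spec types, and the title test is a case-sensitive search for "RSS".

```python
# The two to-spec feed types.
KNOWN_TYPES = {"application/rss+xml", "application/atom+xml"}


def choose_two(rel: str, mime: str, title: str) -> bool:
    """Accepts a link when at least two of the three signals are present."""
    known_rel = "alternate" in rel.lower().split()
    known_type = mime.strip().lower() in KNOWN_TYPES
    rss_in_title = "RSS" in title
    # Booleans add up as 0 or 1, so this counts the signals that matched.
    return known_rel + known_type + rss_in_title >= 2
```

With this rule the WordPress link `choose_two("alternate", "text/xml", "RSS .92")` is accepted and so is `choose_two("newsfeed", "application/rss+xml", "My RSS feed")`, while an application/rdf+xml alternate without "RSS" in its title is rejected. It cannot tell a text/xml feed from any other text/xml alternate that happens to have "RSS" in its title, which is one of the real-world cases it would fail.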

But, standardised? We have a blog post standard, and an expired Internet Draft. We have a frozen spec, a moribund spec with a group ignoring its own process rules, and a spec-to-be that is so uninterested in autodiscovery that I don’t think it knows that all it has is an expired I-D.

Who’s going to drive this standardization, and where? I’m probably forgetting other objections (I usually do, anymore, along with my keys), but the only serious counter to rel="alternate" I can remember offhand is fantasai saying that we shouldn’t use any rel at all. I had trouble getting too excited about that, since saying that rel="alternate" is so holy that it’s better to rely entirely on an unregistered and thus invalid mime-type didn’t strike me as the start of a successful markup-religion campaign.

Comment by Sam Ruby #
2005-04-19 04:46:18

Phil, you’ve probably expended more energy in this blog post than it would take to write a Pace.

Comment by Phil Ringnalda #
2005-04-19 07:53:06

Pace What?

Pace Everyone Concerned About Mime Types Must Use Atom?

Pace Whoever Dropped The Atom Autodiscovery Internet Draft Ball Should Pick It Up?

Pace Rss 1.0 Should Either Die Or Come Back To Life?

Important as I think that designing the future is, that’s not what I’m trying to do here. I’m trying to figure out how to live in the messy present, how to pick up as many strangely non-standard links as possible without autodiscovering the next HTML chapter in the Atom spec or someone’s RSS book.

Comment by Sam Ruby #
2005-04-19 19:40:59

If you don’t know where you are going, any road will get you there.

Figure out what you would like best, work with every group that you can to make it happen, meanwhile provide whatever fall back behavior you feel comfortable with.

The Atom spec has a reasonable chance of being published as a standard. You can influence it. For extra bonus points, you can even push it in a direction that can be used to provide guidance for users of one or more of the various versions of RSS floating out there.

Comment by Phil Ringnalda #
2005-04-19 21:50:19

Having gone through seven draft comments, each of which amused me but none of which pleased me, I’m down to this:

The only action which is possible and which I desire is to have Mark’s expired I-D for Atom autodiscovery make it through to whatever its completed, approved state is supposed to be. I know nothing about the mechanics or politics of the IETF, so I am not the right person to drive it.

Everything else, the whole point of this post and that bug, is to ask the question “what fallback behavior do you want; what incorrect autodiscovery do you still want to accept; what non-feeds are you willing to have autodiscovered to get that?” My answer is simple: none. We have specs, such as they are, and the only objections to them I hear are religious, not technical. However, our fallback for things not discovered isn’t very good, so I’m open to suggestions. What feeds should Firefox discover? That’s all I want to know.

Comment by Roger Benningfield #
2005-04-19 10:28:04

Phil: Okay, I’ve been thinking about this… how ’bout a warning in the New Live Bookmark dialog whenever an application/rdf+xml file is selected, along with a “Verify this is a feed” button?

Then there’s my preferred route. We could all just drop support for application/rdf+xml autodiscovery entirely. After all, there’s nothing stopping purists (I think) from adding two links to the same document… one described as application/rss+xml, the other application/rdf+xml. Apps that actually want RDF will be able to find it, and those that just want a feed can get it.

Comment by Phil Ringnalda #
2005-04-19 20:14:51

I’m so down with your preferred “just get over yourselves” alternative :)

The other option, not just for application/rdf+xml but for everything that’s been discovered from something a bit off (text/xml, .*/.?+?xml, “RSS” in the title, whatever), sounds pretty good to me, too, but I suspect it’s beyond my abilities. If it’s the best we can do, I can always throw the idea in the bug and walk away, but I don’t really want to have to do that.

Comment by Ian Davis #
2005-04-22 07:36:56

Coincidentally I changed my link tags to application/rss+xml but I’m still serving the files up with application/rdf+xml which irritates me. I wish that there was a better way.

HTML allows multiple rel values, so using rel="alternate syndication" or rel="alternate feed" would be a possibility.

Comment by Phil Ringnalda #
2005-04-26 00:29:59

Interesting! My knee-jerk reaction is that rel values should be independent, that <link rel="alternate syndication" type="application/rdf+xml" href="/foo"> should be exactly equivalent to <link rel="alternate" type="application/rdf+xml" href="/foo"> and then <link rel="syndication" type="application/rdf+xml" href="/foo">, but I don’t actually see anything to support that (or any other) interpretation in the spec.
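
The "independent values" reading can be sketched as a small expansion step, shown here in Python. The helper name `expand_rel` is hypothetical, and this is only one possible interpretation, not something the HTML spec is known to mandate.

```python
def expand_rel(attrs: dict) -> list:
    """Splits one link's attributes into one copy per rel token."""
    tokens = attrs.get("rel", "").split()
    # Each copy keeps every other attribute and carries a single rel value.
    return [{**attrs, "rel": token} for token in tokens]
```

Called on `{"rel": "alternate syndication", "type": "application/rdf+xml", "href": "/foo"}`, it returns two dictionaries, one with rel "alternate" and one with rel "syndication". Under this reading a discovery rule that looks for `alternate` would still match the combined form, so the extra token would only help consumers that specifically look for it.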

Comment by mardoen #
2005-04-23 03:05:25

I don’t see the problem. why don’t you let the user decide? find a way to present a list of feeds that were found that includes the following information per feed:
– feed url (user should be able to copy this to clipboard)
– feed type
– can it be parsed by firefox?
– has an application registered itself to handle this kind of feed?

and a list of actions:
– bookmark feed (only shown if it can be parsed by ffx)
– send feed url to application x (choose from a selector if there is more than one)

…and then autodiscover as many types of feeds as possible.

Comment by Phil Ringnalda #
2005-04-25 23:34:48

Well, the problem is that <link rel="alternate" type="text/xml" href="/feeds/rss/" title="RSS 0.92"> is an RSS feed we very much want to discover, and <link rel="alternate" type="text/xml" href="/historyofrss/rss092/" title="RSS 0.92"> is some arbitrary XML format of a chapter in someone’s book that we very much don’t want to discover.

But, can it be parsed by firefox? There’s only one way to find out, by parsing it. Roughly a third of my hits are for HTML, around 60% using Firefox, and my HTML averages links to 2.5 feeds (extremely roughly). From yesterday’s stats, that would be 10000 extra requests for RSS and Atom feeds, on the off chance that someone would want to add one as a Live Bookmark. If, as they don’t but others with similar pages do, Reuters supported autodiscovery for their page o’ RSS feeds, how many wasted requests would those 17 links produce per day? How about the individual links in each section? How about just Yahoo!’s Most Emailed page?
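
The arithmetic behind that estimate can be sketched as follows, in Python. The function name `extra_feed_requests` is hypothetical, and the daily hit count of 20000 is not stated anywhere in this post: it is back-solved from the three ratios and the 10000 figure, purely for illustration.

```python
def extra_feed_requests(total_hits, html_share=1 / 3, firefox_share=0.6,
                        feeds_per_page=2.5):
    """Estimates extra feed fetches if every discovered feed were parsed."""
    # Hits that are HTML pages viewed in Firefox...
    firefox_page_loads = total_hits * html_share * firefox_share
    # ...each of which would trigger one fetch per linked feed.
    return firefox_page_loads * feeds_per_page


# Assumed figure: roughly 20000 hits a day gives about 10000 extra requests.
estimate = extra_feed_requests(20000)
```

With these ratios the extra load works out to about half of all hits, whatever the site's actual traffic is.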

Practically everything I regularly visit except Bugzilla has a linked feed or two or three: how much more painful and slow should my dialup existence be, to cater to people who think they have some reason for doing something other than following a very simple pair of rules about the value of two attributes? How much extra bandwidth should people be required to pay for if they want to support autodiscovery? And how many people would look at a few thousand or million extra hits to their RSS feeds and decide that supporting autodiscovery wasn’t really worth it?

Attempting to parse a feed at the time it’s being added can be reasonable (sometimes, though it then tempts most programmers to refuse to add it if they can’t parse it at that precise moment, which is generally a bad idea), but attempting to parse everything that might be a feed just to see is perfect for a Rabid Feed Terrier extension, and just not right for Firefox itself.

Comment by mardoen #
2005-04-26 03:37:57

still don’t see the problem. add a button “check if this feed can be understood by firefox”, which parses it and if applicable proposes to add a bookmark.

it’s really only a UI problem, not a technical one.
