Again with the relative URLs

Once again, Scott Andrew is unhappy about the “requirement”† in RSS that you use absolute URLs (rather than relative ones like /main/contact/). He asks:

why can’t RSS readers resolve relative URLs like browsers do? I am after all required to include a base URL in the LINK element of my RSS feed.

However, the <channel> <link> element is not at all required to be the base URL for relative links. It isn’t defined as a base URL, only as “The URL to which an HTML rendering of the channel title will link, commonly the parent site’s home or news page.” (RSS 1.0) or “The URL to the HTML website corresponding to the channel.” (RSS 2.0). In a weblog, the link will typically be usable as a base URL. But unless both specs make it clear that it must be, some people using relative URLs will assume that readers will do the right thing and resolve relative URLs against the URL the feed was retrieved from. For a feed served from /xml/, that means images/one.jpg should resolve as /xml/images/one.jpg, not /images/one.jpg.
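To make the ambiguity concrete, here is a small sketch using Python’s standard urljoin, which applies the usual relative-reference resolution rules. The feed and channel URLs are hypothetical stand-ins chosen to match the /xml/ example; neither spec says which base a reader should pick, so the function simply computes both.

```python
from urllib.parse import urljoin

# Hypothetical locations: the feed lives under /xml/, while the
# <channel><link> points at the site's home page.
FEED_URL = "https://example.com/xml/index.rdf"
CHANNEL_LINK = "https://example.com/"

def resolve_both(relative):
    """Resolve one relative reference against both candidate bases."""
    against_feed = urljoin(FEED_URL, relative)         # base = where the feed was fetched
    against_channel = urljoin(CHANNEL_LINK, relative)  # base = <channel><link>
    return against_feed, against_channel

if __name__ == "__main__":
    print(resolve_both("images/one.jpg"))
    print(resolve_both("/main/contact/"))
```

Document-relative paths like images/one.jpg come out differently depending on the base, while root-relative paths like /main/contact/ agree as long as the feed and the channel link share a host.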

The problem gets worse with non-weblog feeds, though. There’s no guarantee that any single HTML page shows all the items in a feed, so it may not be possible to put one URL in the <link> that works as the base URL. An “all our news” feed might include items from a half-dozen subdomains and a couple of different domains, plus items (“this RSS feed is changing to support <xhtml:body>; if your reader doesn’t support that, switch to this other URL”) that don’t appear on any HTML page anywhere. In fact, some feeds (picture something like a feed of server statistics, or baseball scores) don’t have an associated HTML page at all.
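A rough illustration of why one channel-level base can’t serve an aggregate feed: each item below was written on a different (hypothetical) page, and the sketch compares what its author meant by a relative URL with what a reader gets if it resolves against the channel <link> alone. The hosts, paths, and the mismatches helper are all made up for the example.

```python
from urllib.parse import urljoin

def mismatches(channel_link, items):
    """Return (intended, actual) pairs where resolving against the channel
    <link> gives a different URL than resolving against the item's own page."""
    out = []
    for origin, relative in items:
        intended = urljoin(origin, relative)      # what the item's author meant
        actual = urljoin(channel_link, relative)  # what a channel-base reader produces
        if intended != actual:
            out.append((intended, actual))
    return out

# Hypothetical "all our news" feed: one channel <link>, items from several hosts.
# Each entry is (page the item was first published on, relative URL in its body).
ITEMS = [
    ("https://www.example.com/a.html", "img/a.png"),
    ("https://sports.example.com/scores/b.html", "img/b.png"),
    ("https://example.org/press/c.html", "img/c.png"),
]

if __name__ == "__main__":
    for intended, actual in mismatches("https://www.example.com/", ITEMS):
        print(f"meant {intended}, got {actual}")
```

Only the item that happens to live at the channel’s own root survives; the other two point at the wrong host.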

It would be technically possible for both specs to define how relative URLs should be formed and resolved, but it simply isn’t going to happen. RSS 2.0 is frozen, and RSS 1.0 is, ah, stalled, shall we say. Besides, there are so many people associated with it who think that the <description> element should never, ever be anything but plain text that I can guarantee any proposal to change the spec to define how relative URLs should be resolved would devolve into yet another argument about that instead.

Fortunately, for his problem we don’t need to change two specs and the behavior of every RSS reader in existence. Thanks to my tireless harassment of programmers much sharper than me the last time he had a problem with relative URLs and RSS (the trail led through Sam Ruby to David Raynes and finally to Alexei Kosut), there’s now MTResolveURLs, a plugin that adds a resolve_urls global filter which can be applied to any MT tag to resolve relative URLs using your MT site URL as a base. Drop it in your plugins directory, add resolve_urls="1" to any tags in your RSS templates that might include relative URLs, and move on to more important battles.
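For anyone curious what a filter like that has to do, here is a hypothetical Python sketch of the same idea. It is not MTResolveURLs’ actual code: the function name, the regex, and the base URL standing in for the MT site URL are all assumptions. It rewrites href and src attribute values in an HTML fragment against a base, and it assumes raw HTML, applied before the markup is entity-escaped for the feed.

```python
import re
from urllib.parse import urljoin

# Matches href="..." or src='...' attributes (hypothetical, simplified).
# Group 1 is the attribute prefix, group 2 the quote character, group 3 the URL.
_ATTR_URL = re.compile(r'(\b(?:href|src)\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE)

def resolve_urls(html, base):
    """Rewrite relative href/src values in an HTML fragment against base.

    Absolute URLs pass through unchanged, because urljoin returns a
    reference that already has its own scheme as-is.
    """
    def _fix(match):
        prefix, quote, url = match.groups()
        return f"{prefix}{quote}{urljoin(base, url)}{quote}"
    return _ATTR_URL.sub(_fix, html)

if __name__ == "__main__":
    body = '<a href="/main/contact/">contact</a> <img src="images/one.jpg">'
    print(resolve_urls(body, "https://example.com/"))
```

A regex is enough for a sketch; a real filter would more likely use an HTML parser, so that unusual markup (unquoted attributes, srcset, and the like) is handled correctly.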

† Strictly speaking, it should be a warning rather than an error, since neither fork of the specification actually says anything either way about URLs in <description> or <content:encoded>.


Comment by DJ #
2003-04-04 07:31:49

You’re right. Because of the many situations (some of which you mention) where there’s no automatic way to determine the base URL of relative links in an item, it doesn’t make sense to try and fix this at the consumer end.

Anyway, what I wanted to mention was that there’s also a plugin, ‘absolute’, which does the same sort of thing for Blosxom.

Trackback by Sam Ruby #
2003-04-04 05:45:30

Again with the relative URLs

Phil is right. The specs are silent on this. In fact, many people believe that relative links should be resolved relative to the feed itself, not the <channel><link> element’s value. Spec issues aside, the hope that aggregators can corr

Comment by DG #
2006-11-23 02:03:16

Totally agreed.

I do notice that browser-based readers, notably Sage, work properly with relative URLs, probably because they’re using the browser’s engine to render.

Good points, though.
