RSS 1.0 content:encoded
Probably my biggest gripe about RSS 1.0 for weblog syndication is the way it discourages feed providers from adding the entire post, complete with HTML, to their feed. The content module has been around forever, but it’s chock-full of RDF goodness (which means that it’s nearly incomprehensible to the average weblogger), and doesn’t seem to be supported by any reader/aggregator I know about. As a result, RSS 1.0 feeds typically consist of an HTML-free excerpt that does a terrible job of describing the post: whether or not blogging is journalism, we very rarely cram a complete introduction to the post into the first twenty words, much less make sure it makes sense without any HTML. Take a post that’s three words, one a link, followed by a blockquote, and strip the HTML, and you’ve lost the sense of the post even before you cut it down to twenty words.
Now there’s a solution: thanks to Aaron Swartz, tireless promoter of content in RSS 1.0 (his proposal for the ancestor of the content module, a weblogs module, was the third message on RSS-DEV, the RSS 1.0 mailing list), there’s now a (proposed) element in the content module, <content:encoded>, which is quite simply the content of an item, either entity-encoded or escaped in a CDATA section. Practically speaking, for Movable Type users, that’s an added namespace declaration of:
xmlns:content="http://purl.org/rss/1.0/modules/content/"
and then either:
<content:encoded><MTEntryBody encode_xml="1"></content:encoded>
or
<content:encoded><![CDATA[<MTEntryBody>]]></content:encoded>
following the <description> element in the <item> part of the template.
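If you want to convince yourself that the two forms really are interchangeable, here is a rough Python sketch (nothing to do with Movable Type itself; the function names and the sample body are made up for illustration). It builds an item both ways and checks that an XML parser hands back the same HTML from either one. The one wrinkle it handles is that a CDATA section can't contain the sequence "]]>", so that has to be split across two sections.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"


def item_with_entities(body_html: str) -> str:
    """Entity-encode the body, roughly what encode_xml="1" is meant to do."""
    return (
        f'<item xmlns:content="{CONTENT_NS}">'
        f"<content:encoded>{escape(body_html)}</content:encoded></item>"
    )


def item_with_cdata(body_html: str) -> str:
    """Wrap the body in CDATA, splitting any "]]>" so it cannot end the section early."""
    safe = body_html.replace("]]>", "]]]]><![CDATA[>")
    return (
        f'<item xmlns:content="{CONTENT_NS}">'
        f"<content:encoded><![CDATA[{safe}]]></content:encoded></item>"
    )


def read_encoded(item_xml: str) -> str:
    """Parse the item and return the decoded text of content:encoded."""
    item = ET.fromstring(item_xml)
    return item.findtext(f"{{{CONTENT_NS}}}encoded")


# Hypothetical post body, including an entity that must survive the round trip.
body = "<p>This <i>isn't</i> very cool &amp; neither is this.</p>"
```

Either way, `read_encoded()` gives back exactly the string in `body`, so which one you pick is mostly a matter of which looks less ugly in your template.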
Of course, without a reader that will do something with it, <content:encoded> is even less useful than the rest of the intricate and incomprehensible tangle of RSS 1.0 modules, since it isn’t complicated enough to give you any geek cred for including it in your feed. Fortunately, Aggie 1.0 RC4 supports <content:encoded>, yet another reason why I’m using Aggie more these days. For readers that don’t support <content:encoded>, nothing changes: they still see and use your <description>, and just ignore the <content:encoded>.
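For what it's worth, the fallback a reader needs is not much code. This is only a sketch of the idea in Python, not how Aggie or any other aggregator actually does it; the sample feed, the example.com URLs, and the `item_body` helper are all invented for the illustration.

```python
import xml.etree.ElementTree as ET

RSS_NS = "http://purl.org/rss/1.0/"
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# A made-up two-item feed: one item carries content:encoded, one does not.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <item rdf:about="http://example.com/1">
    <title>Full post</title>
    <description>Three words, then nothing useful.</description>
    <content:encoded><![CDATA[<p>Three <a href="http://example.com/">words</a>:</p>]]></content:encoded>
  </item>
  <item rdf:about="http://example.com/2">
    <title>Excerpt only</title>
    <description>Just an excerpt.</description>
  </item>
</rdf:RDF>"""


def item_body(item):
    """Return (text, is_html): prefer content:encoded, else fall back to description."""
    encoded = item.findtext(f"{{{CONTENT_NS}}}encoded")
    if encoded is not None and encoded.strip():
        return encoded, True
    return item.findtext(f"{{{RSS_NS}}}description", default=""), False


bodies = [item_body(item) for item in ET.fromstring(FEED).iter(f"{{{RSS_NS}}}item")]
```

The first item comes back as HTML to render, the second as a plain-text excerpt, which is all that "nothing changes" amounts to for a feed that adds the element.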
Forgive me for asking such an obvious question (I’m a complete RSS newbie) but is there any reason this problem can’t be solved just by encoding the HTML – changing less-than and greater-than signs and double quotes into their relevant entities?
In RSS 0.92+, you can and should do exactly that: it’s in the spec, and totally expected. However, in RSS 1.0, <description> is supposed to be 0.9-compatible, and 0.9 (and 0.91) didn’t allow any HTML. You can sort of skirt around that by claiming that entity-encoded markup isn’t markup, but then a well-behaved reader ought to treat it as plain text and display your entry with the tags showing, as “This <i>isn't</i> very cool.”
Worse yet, last time I checked, Radio Userland (probably the most popular RSS reader for blogs, if not in general) was foolishly double-decoding entities in <description>, so that if you post sample code, carefully entity-encoding it, and then properly entity-encode the &s in your RSS feed, Radio will merrily decode it twice, making your sample code execute. If you happen to post the wrong sort of code, you can leave Radio users with no choice but to unsubscribe from your feed to resurrect their aggregator.
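To make the double-decoding problem concrete, here is a small Python sketch. It doesn't reproduce Radio's code (which I haven't seen); it just uses the standard library's `html.unescape` to stand in for "decode the entities", and the sample script is a harmless made-up one.

```python
from html import unescape
from xml.sax.saxutils import escape

# Sample code the author wants readers to see, not run.
sample = "<script>alert('hi')</script>"

# Escaped once in the post, so a browser shows it as text.
post_html = f"<p>Try: {escape(sample)}</p>"

# Escaped again when the post goes into the feed, so the XML stays well-formed.
feed_text = escape(post_html)

# A correct reader decodes once and gets the post's HTML back,
# with the sample still safely escaped.
once = unescape(feed_text)

# A reader that decodes a second time turns the sample into live markup.
twice = unescape(once)
```

After one decode, `once` is identical to `post_html` and contains no real `<script>` tag; after the second, `twice` contains one, and whatever renders it will run it.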
For more detail about why you shouldn’t use entity-encoded markup in <description>, see Use Of Encoding Within Description Elements Considered Harmful.
You should take a peek into this thread on blogroots, if you haven’t already. They started out discussing the good vs. evil when it comes to putting your whole post into an RSS feed.