Now that’s ironic

Sam suggests that the RSS community discuss RSS practices that, while perfectly valid, are not a good idea, with an eye toward adding warnings to the RSS Validator. It’s a very good idea, but not exactly what I want to talk about.

The RSS 2.0 item for Sam’s post is perfectly valid, correct, best practice RSS: the description element contains plain text, with every character that XML requires to be escaped duly escaped. The HTML for the item is in an entity-encoded content:encoded element. It’s just not possible to do better RSS than that, and it took me three aggregators and two browsers to read the item.

Because Sam was talking about HTML tags, the item includes samples of those tags. In the content:encoded element, the less-than and greater-than symbols are double-encoded, as &amp;lt; and &amp;gt;, so that when the XML is decoded they will be &lt; and &gt;, which a browser will then display as < and >. In the plain text description, they are single-encoded, as &lt; and &gt;, so that when the XML is decoded they will be actual < and > characters, ready for display by anything displaying plain text.
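
To make the two levels of encoding concrete, here is a minimal Python sketch. The item and its wording are made up for illustration (this is not Sam’s actual feed), and it only shows what a standard XML parser hands back for each element.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS content module that defines content:encoded.
CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"

# Hypothetical item: plain text description, entity-encoded HTML body.
ITEM_XML = """<item xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <description>Use the &lt;b&gt; tag for bold.</description>
  <content:encoded>&lt;p&gt;Use the &amp;lt;b&amp;gt; tag for bold.&lt;/p&gt;</content:encoded>
</item>"""

item = ET.fromstring(ITEM_XML)

# Single-encoded: one round of XML decoding yields literal < and >.
description = item.findtext("description")

# Double-encoded: one round of XML decoding yields HTML, in which the
# sample tag is still escaped so a browser will display it, not obey it.
content_html = item.findtext(f"{{{CONTENT_NS}}}encoded")

print(description)   # Use the <b> tag for bold.
print(content_html)  # <p>Use the &lt;b&gt; tag for bold.</p>
```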

But here comes the problem: because RSS 0.9x/2.0 allows encoded HTML in the description element (and lots of people put it in RSS 0.9/1.0, too), there is no way on earth for an aggregator to tell the difference between Sam’s plain text with XML encoded < and > in sample HTML tags, and someone else’s HTML with entity encoded < and > around actual HTML tags. I’ve seen quite a few explanations for why people don’t like having HTML in description elements, but I don’t remember having seen any mention of the fact that, for anyone who ever posts sample HTML code, the description element is useless in any aggregator which displays feeds in a browser. Even in an aggregator which displays plain text rather than HTML, the description element will be deceptive, since a plain text aggregator needs to strip out anything which looks like entity-encoded HTML, even if it turns out to have been intended as sample HTML in plain text. Your discussion of the HTML <link> tag will have to be a discussion of the [blank] tag in a plain text reader.
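
A rough sketch of that last point, assuming a plain text aggregator that strips tags with a naive regular expression (the `strip_tags` helper and the sample item are hypothetical, not taken from any real aggregator):

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical item whose plain text description mentions a tag by name.
ITEM_XML = (
    "<item><description>"
    "A discussion of the HTML &lt;link&gt; tag."
    "</description></item>"
)


def strip_tags(text: str) -> str:
    """Naive tag stripper standing in for a plain text aggregator."""
    return re.sub(r"<[^>]+>", "", text)


decoded = ET.fromstring(ITEM_XML).findtext("description")

# What the author meant the reader to see.
print(decoded)              # A discussion of the HTML <link> tag.

# What a reader gets once the aggregator assumes the text was HTML.
print(strip_tags(decoded))  # A discussion of the HTML  tag.
```

Nothing in the decoded string says which of the two readings was intended, so the aggregator can only guess.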

There is a workaround, though it sadly involves ignoring poor description. The content:encoded element contains either entity-encoded or CDATA-escaped HTML: if your XML parser hands you <img src…> then you know it’s intended as HTML, and if it hands you &lt;img src…&gt; then you know it’s sample HTML. Unless we all swear off ever posting sample HTML, the only workable solution to the current ambiguity about single-encoded &lt; and &gt; in description is for all RSS producers (or at least all who might ever post sample HTML) to include a content:encoded element, and for all aggregators to use content:encoded rather than description whenever it’s present. A plain text aggregator should be able to get away with just checking for &lt; in the description, and only turning to stripping HTML from content:encoded if it’s present (an approach which isn’t likely to please a developer who only wants plain text in the first place, I’m afraid).
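
On the aggregator side, the preference order might look something like the sketch below. The `pick_body` function and its "html"/"ambiguous" labels are my own illustration, not any aggregator’s actual API.

```python
import xml.etree.ElementTree as ET

# Fully qualified name of content:encoded in ElementTree's notation.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def pick_body(item: ET.Element) -> tuple[str, str]:
    """Prefer content:encoded, which is known to be HTML; otherwise
    fall back to description, whose type cannot be determined."""
    encoded = item.findtext(CONTENT_ENCODED)
    if encoded is not None:
        return encoded, "html"
    return item.findtext("description", default=""), "ambiguous"


both = ET.fromstring(
    '<item xmlns:content="http://purl.org/rss/1.0/modules/content/">'
    "<description>About the &lt;b&gt; tag</description>"
    "<content:encoded>&lt;p&gt;About the &amp;lt;b&amp;gt; tag&lt;/p&gt;"
    "</content:encoded></item>"
)
only_description = ET.fromstring(
    "<item><description>About the &lt;b&gt; tag</description></item>"
)

print(pick_body(both))              # ('<p>About the &lt;b&gt; tag</p>', 'html')
print(pick_body(only_description))  # ('About the <b> tag', 'ambiguous')
```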

Luckily, many desktop aggregators already do support content:encoded. That’s where my luck ran out, though. Somehow (probably my own fault), I broke my copy of Radio’s support for content:encoded, so I got Sam’s item description, with the unclosed sample <a href…> interpreted rather than displayed. I tried AmphetaDesk, but although I think the version currently in CVS supports content:encoded, the currently released version does not, so it also displayed a clobbered description. On to Aggie, which as far as I know was the first one to support content:encoded in a release version. While Aggie did display the content:encoded element rather than description, Sam’s perfectly correct and valid use of <code> around his code samples caused Aggie’s default template, with its cool “display in a tiny font, and enlarge it onclick” feature, to overlap words while switching between the monospaced code font and the proportional font for the rest of the item. On to Aggie’s alternate template, which displays just the post title, with an arrow image to click to display the body of the item. Which, sadly, doesn’t seem to function in Phoenix. So, I started Internet Explorer for the first time in three days, and after three applications, a template change, and two browsers, finally was able to read Sam’s perfectly valid item.

13 Comments

Comment by Anonymous #
2002-10-27 17:45:35

Just a note that when Aggie RC5 is released it will have an option to disable the “dynamic text resizing”. It also has a couple of fixes to the CSS that fix many of the overlapping problems.

 
Comment by joe #
2002-10-27 17:47:02

Sorry, didn’t intend for that last comment to be anonymous.

 
Comment by Phil Ringnalda #
2002-10-28 10:59:33

Re: tima’s TrackBack: it doesn’t matter (in this case, anyway) whether you entity encode or CDATA escape: what matters is that when your parser hands you a description element with a <b> you have no way of knowing whether it is a plain text description with example HTML or an HTML description with markup.

 
Comment by Ziv Caspi #
2002-10-29 01:13:17

Phil, the problem is that the description tag has no way of saying what its type is. In theory, if people could be convinced to put text in description fields and HTML in content:encoded fields everything would be simple. In reality this (still) doesn’t happen, because the authoring tools people use don’t offer this capability.

 
Comment by Morten Frederiksen #
2002-10-29 02:38:00

Ziv,

Yes, that’s the ideal solution, and with the spreading use of content:encoded, it might just happen in the long run.

It seems all we have to do is wait for the majority of producer systems to do it this way, but there will always be homegrown systems with escaping issues…

 
Comment by Phil Ringnalda #
2002-10-30 08:14:50

If we could turn back the clock, and convince Dave to add content:encoded to RSS 0.92, rather than allowing HTML in description, then it would be an ideal solution. Since we can’t, aggregators have to assume that &lt;b&gt; in description means that someone wants bold, not that someone is talking about the bold tag. If they are displaying plain text, they have to strip it, not display it. That leaves people who talk about HTML tags with only two choices: join the majority and put entity encoded HTML in their description whether or not they want to, or do what Sam’s going to do, and I’m going to try to remember to do: do a separate, actual description (an excerpt, in MT) in plain text that only names the element, rather than attempting to display the tag.

If they choose to, the RSS 1.0 folks could approve content:encoded, rather than leaving it in “proposed” limbo, and clarify that description is plain text only, that aggregators MUST NOT interpret HTML in a 1.0 description, but it’s such a political space that I’m not very excited about trying to get that ball rolling myself.

 
Trackback by Sam Ruby #
2002-10-27 04:53:02

Content encoded

Phil Ringnalda makes some excellent observations on encoded HTML in RSS feeds. What is needed is a glyph which visually looks like less than and greater than signs but with none of the semantics. I’m tempted to use single guillemet characters. ‹fo

 
Trackback by revjim.net #
2002-10-27 05:31:33

content-type for RSS elements

Phil Ringnalda talks about the problem of HTML data being present in an RSS feed. He suggests modifying the way aggregators work and the way publishers publish in order to solve this problem. “The only workable solution […] is for all RSS producers […

 
Trackback by Pet Rock Star #
2002-10-27 06:03:06

My Problem With RSS

Now y’all know I love Phil Ringnalda, but he’s finally spit out the perfect quotable statement on why all the hoopla surrounding RSS doesn’t apply to this particular user, and probably never will.

 
Trackback by Justin's Journal #
2002-10-27 11:36:12

phil ringnalda dot com: Now that’s ironic

Now this is exactly why I’ve stayed out of the RSS wars. I publish an RSS 0.91 format version of this journal but it is basically just a summary with enough of each article to give folks an idea of what I’m talking about and to let them see whether or …

 
Trackback by Burningbird #
2002-10-28 08:57:41

Changes to RSS Validator MT Templates

If you’re a Movable Type user and are using the templates provided with the new RSS Validator, be aware of the following line: <content:encoded><![CDATA[<$MTEntryBody$>]]></content:encoded> This will include your entire post wi…

 
Trackback by tima thinking outloud. #
2002-10-28 10:41:50

The Problem with Entity Encoding HTML Illustrated.

Phil Ringnalda writes about the irony in Sam Ruby’s post about valid RSS and the problems with entity encoded HTML. I’ve been quite vocal that entity encoded HTML is a bad idea that we need to get away from. Shelley Powers points out an unexpected surp…

 
2002-12-19 10:06:52

Changes to RSS Validator MT Templates

If you’re a Movable Type user and are using the templates provided with the new RSS Validator, be aware of the following line: <content:encoded><![CDATA[<$MTEntryBody$>]]></content:encoded> This will include your entire post wi…

 