Nice <gorilla>; what’s he weigh?

Several things about IE7β2 interest me, but what interests me most is a relatively minor decision in the feed-handling code.

If you start from the RSS 2.0.1-rv6 spec, and follow along carefully, you should notice that the <description> which is a child of <item> is described as allowing escaped HTML, while no other element is so described. You should then conclude that no other element allows escaped HTML, and that the content model for all other elements is plain text. So if someone has a <title> like <title>&amp;eacute;</title>, it is because they want their readers to see &eacute;, not because they want their readers to see an e with an acute accent.
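To make the two readings concrete, here is a minimal Python sketch using only the standard library. The one-item feed is made up for illustration, and the variable names are mine; it only shows what a consumer ends up with under each interpretation, not how any particular aggregator is written.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical one-item feed; the <title> is the example from the text.
FEED = """<rss version="2.0"><channel><item>
<title>&amp;eacute;</title>
</item></channel></rss>"""

item = ET.fromstring(FEED).find("channel/item")

# The XML parser removes exactly one layer of escaping:
# &amp;eacute; in the file becomes the seven characters &eacute; in memory.
title_text = item.findtext("title")

# Plain-text reading (what the spec implies): display those characters as-is.
plain_title = title_text

# Escaped-HTML reading (what many aggregators do): decode a second time,
# which turns the entity reference into an e with an acute accent.
html_title = html.unescape(title_text)

print(repr(plain_title))
print(repr(html_title))
```

The same bytes in the feed give two different titles, and nothing in the feed itself says which one the publisher meant.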

If, however, you start with most weblog software or most existing aggregators, you’ll find that you can put links in your weblog’s subhead by editing its description. You'll also find that if you want to italicize Odyssey in a title to make it clear you mean the book, not your car or your own road trip, it not only works in your weblog, but probably makes it into your feed, and works in your aggregator. That's because someone else tried it long ago, and people complained about how bad it looked when the aggregator displayed <I>Odyssey</I>, so the aggregator author thought “why not?” and started treating item and channel titles and channel descriptions as escaped HTML.

Despite the enormous amount of time people have spent arguing over which one of those is “right,” it really doesn’t matter (except perhaps for a couple dozen people like Norm Walsh who will absolutely refuse to have anything to do with escaped markup). What matters is that there be only one way. If you have to create workable feeds for searches of 300,000 browser bug reports, many of which want to display examples of HTML tags in the titles, then it doesn’t matter too much whether you use <title>&lt;foo&gt; support broken</title> or <title>&amp;lt;foo&amp;gt; support broken</title>. However, if one is clearly correct and to spec, while the other is required to avoid breaking some aggregators and opening XSS holes in others, and yet in turn breaks some that were not broken by doing The Right Thing, then you’re likely to wind up saying “screw you guys, I’m going home” and using Atom, which mostly exists for precisely this reason.
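From the publisher's side, the two forms of that bug-report title differ only in how many times you escape. A rough sketch, again assuming nothing beyond Python's standard library (the helper name `parsed` and the variable names are hypothetical):

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_title = "<foo> support broken"

# Escaped once: correct if <title> is plain text, as the spec implies.
plain_text_form = "<title>" + escape(raw_title) + "</title>"

# Escaped twice: what you need if the consumer treats <title> as escaped HTML.
html_form = "<title>" + escape(escape(raw_title)) + "</title>"


def parsed(title_xml: str) -> str:
    """Return the string an XML parser hands to the aggregator."""
    return ET.fromstring(title_xml).text


# What each kind of consumer starts from after XML parsing:
after_xml_once = parsed(plain_text_form)   # "<foo> support broken"
after_xml_twice = parsed(html_form)        # "&lt;foo&gt; support broken"

print(plain_text_form)
print(html_form)
print(after_xml_once)
print(after_xml_twice)
```

A plain-text consumer shows `after_xml_once` correctly and shows `after_xml_twice` with visible `&lt;` and `&gt;`. A consumer that treats titles as HTML does the opposite: it renders `after_xml_twice` as the intended text, and hands `after_xml_once` to its HTML renderer as live markup, which is where the swallowed tags and the XSS holes come from.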

While I have as much hope for the spec rewrite that the new version of the RSS Advisory Board is starting on as all the other people who’ve been banging their heads against the syndication brick wall for so long they’re at risk of naming all their children George, that much hope is about… George! you stop teasing George this instant, or you’ll be grounded like George and George!

So I was quite interested to see that IE7, and thus presumably the Windows Feed Platform, have decided to do the right, and hard, thing, and treat all RSS elements other than item/description as the plain text they are supposed to be. Which leaves me wondering: are they really enough of an 800lb. gorilla to convince everyone else to follow along, and make it once again possible to include a less-than character in an RSS title?


Comment by Phil Ringnalda
2006-02-01 23:32:09