Nice <gorilla>; what’s he weigh?

Several things about IE7β2 interest me, but the one that interests me most is a relatively minor decision in the feed handling code.

If you start from the RSS 2.0.1-rv6 spec, and follow along carefully, you should notice that the <description> which is a child of <item> is described as allowing escaped HTML, while no other element is so described, and you then should conclude that no other element allows escaped HTML, that the content model for all other elements is plain text, and if someone has a <title> like <title>&amp;eacute;</title> it is because they want their readers to see &eacute;, not because they want their readers to see an e with an acute accent.
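To make the two readings concrete, here is a minimal sketch in Python. The function names are made up for illustration, and the second function only stands in for what an HTML-rendering aggregator effectively does with entities; it is not any particular aggregator's code.

```python
import html
import xml.etree.ElementTree as ET

# The example title from the paragraph above, wrapped in a bare <item>.
ITEM = "<item><title>&amp;eacute;</title></item>"

def title_as_plain_text(item_xml):
    """Spec reading: whatever the XML parser hands back is the final text."""
    return ET.fromstring(item_xml).findtext("title")

def title_as_escaped_html(item_xml):
    """Aggregator reading: decode the XML, then decode the result again as HTML."""
    return html.unescape(title_as_plain_text(item_xml))

print(title_as_plain_text(ITEM))    # &eacute;
print(title_as_escaped_html(ITEM))  # é
```

Same bytes on the wire, two different things on the reader's screen, which is the whole problem.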

If, however, you start with most weblog software or most existing aggregators, you’ll find that you can put links in your weblog’s subhead by editing its description, and that if you want to italicize Odyssey in a title to make it clear you mean the book, not your car or your own road trip, it not only works in your weblog, but probably makes it into your feed, and works in your aggregator, because someone else tried it long ago, and people complained about how bad it looked when the aggregator displayed <I>Odyssey</I>, so the aggregator author thought “why not?” and started treating item and channel titles and channel descriptions as escaped HTML.

Despite the enormous amount of time people have spent arguing over which one of those is “right,” it really doesn’t matter (except perhaps for a couple dozen people like Norm Walsh who will absolutely refuse to have anything to do with escaped markup). What matters is that there only be one way. If you have to create workable feeds for searches of 300,000 browser bug reports, many of which want to display examples of HTML tags in the titles, then it doesn’t matter too much whether you use <title>&lt;foo&gt; support broken</title> or <title>&amp;lt;foo&amp;gt; support broken</title>. However, if using one is clearly correct and to spec, but using the other one is required to avoid breaking some aggregators and opening XSS holes in others, but that then breaks some which were not broken by doing The Right Thing, then you’re likely to wind up saying “screw you guys, I’m going home” and using Atom, which mostly exists for precisely this reason.
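A rough sketch of the producer's side of that choice, assuming Python's standard `xml.sax.saxutils.escape`; the two function names are hypothetical and exist only to label which kind of reader each encoding is aimed at.

```python
from xml.sax.saxutils import escape

def title_for_plain_text_readers(text):
    """Escape once: correct if the reader treats <title> as plain text."""
    return "<title>%s</title>" % escape(text)

def title_for_html_readers(text):
    """Escape twice: needed if the reader treats <title> as escaped HTML."""
    return "<title>%s</title>" % escape(escape(text))

print(title_for_plain_text_readers("<foo> support broken"))
# <title>&lt;foo&gt; support broken</title>
print(title_for_html_readers("<foo> support broken"))
# <title>&amp;lt;foo&amp;gt; support broken</title>
```

Neither function is hard to write. The trouble is that the feed has to pick one, and whichever it picks is wrong for the readers that made the other assumption.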

While I have as much hope for the spec rewrite that the new version of the RSS Advisory Board is starting on as all the other people who’ve been banging their heads against the syndication brick wall for so long they’re at risk of naming all their children George, that much hope is about… George! you stop teasing George this instant, or you’ll be grounded like George and George!

So I was quite interested to see that IE7, and thus presumably the Windows Feed Platform, have decided to do the right, and hard, thing, and treat all RSS elements other than item/description as the plain text they are supposed to be. Which leaves me wondering: are they really enough of an 800lb. gorilla to convince everyone else to follow along, and make it once again possible to include a less-than character in an RSS title?


Comment by Phil Ringnalda #
2006-02-01 23:32:09

Though I’m reminded as I see duplicate posts from the Team RSS Blog that so far they haven’t been a big enough gorilla to shift me: I’ve been annoyed forever by the way that all feeds shift through <link>s to http, then https, then to at least one IP address, reappearing unread to me each time, but not annoyed enough to actually write the code to properly use their unshifting <guid> as a key instead.
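For the record, the code in question would not need to be much. A minimal sketch, assuming RSS 2.0 items and an in-memory set of seen keys; `item_key`, `unseen_items`, the `feed` helper and the example.org values are all made up for illustration.

```python
import xml.etree.ElementTree as ET

def item_key(item):
    """Prefer the <guid>; fall back to the <link> only when there is no guid."""
    return item.findtext("guid") or item.findtext("link")

def unseen_items(feed_xml, seen):
    """Return the items whose key is not yet in `seen`, recording them as seen."""
    fresh = []
    for item in ET.fromstring(feed_xml).iter("item"):
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh

def feed(link):
    """Build a one-item sample feed whose guid stays put while the link shifts."""
    return ("<rss version='2.0'><channel><item>"
            "<guid isPermaLink='false'>tag:example.org,2006:1</guid>"
            "<link>%s</link></item></channel></rss>" % link)

seen = set()
print(len(unseen_items(feed("http://example.org/1"), seen)))   # 1
print(len(unseen_items(feed("https://example.org/1"), seen)))  # 0
```

Keyed on the link, the second fetch would have produced a "new" item; keyed on the guid, it does not.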

Comment by Sameer D'Costa #
2006-02-02 22:53:47

The recent dev versions of Gregarius try to make good use of the unshifting <guid>. And you can specify (on a per-feed basis) whether items you have already marked as read show up as unread when the feed authors modify them.

It does not yet grok atom:updated but you can look out for that in the next version.

Comment by Phil Ringnalda #
2006-02-02 23:21:05

I do need to catch up to svn-head: it’s silly of me to feel behind the times when my browser’s more than 48 hours old, while letting the main thing I display in it get so old.

I could also stop actively making my life worse than it has to be: some time back, I subscribed to both their RSS and Atom feeds, to see whether they both shifted links the same way. Then I forgot, and since they both have the same title, thought I was just seeing the usual duplicates, not a combination of those and my self-inflicted duplicates.

Comment by Ken MacLeod #
2006-02-02 17:01:42

I wonder if they treat RSS/RDF 1.0 item/description as plain text, as it should be. :-)

Comment by Phil Ringnalda #
2006-02-02 21:19:41

Now that would have been impressive, wouldn’t it? Alas, no.

Comment by Mark #
2006-02-03 07:49:11

I wonder if they treat item/description as plain text in RSS 0.90, 0.91(Netscape), or 0.91(Userland). Hell, UFP doesn’t even do that (although if this whole “following the spec as written” thing actually catches on, I may need to update quite a few test cases in UFP relating to default content types of various elements in the many-splendored thing that is RSS). Atom, of course, doesn’t have these problems, but never mind that.

Also, I wonder if they support HTML 3.2 entities in RSS 0.91(Netscape) feeds. I was surprised (but not really) to see that come up on the rss-public mailing list (yes, I’m watching you Phil). I assume the group won’t actually accomplish anything; they’ll just stumble around like small children running with scissors to catch the short bus until they quietly disband, and we can look forward to doing all of this over again in 2 years. But in the meantime, I regard it as inevitable that the group will eventually try (and fail) to tackle every single one of these issues. (I believe Rogers once called this “the most well-researched flame on the Internet,” which is about as close to a compliment as I’m likely to get from him.) But no one will actually dare to link to the article itself, and those who haven’t read it (or couldn’t finish it because they were too busy spitting bile at their monitor) will wonder where all these test cases came from, and why everyone seems to know so much about this ragtag collection of esoteric issues, and wouldn’t it be nice if someone could write all of them down in one place, and someone will volunteer to do exactly that, except he’ll want to be all clever-like and do it in OPML, and it will only validate in one of the three OPML validators, and the thread will go off on a long tangent about which validator is “correct”, which will itself spawn a separate mailing list, blog, wiki, discussion forum, and advisory board to be hosted at, which will fork each existing OPML validator and host it locally and then never update it, so they will slowly go out of sync with each original version, which will lead to a total of six OPML validators and a bitter Google PageRank war and lots of wringing of hands about how Google is favoring OPML validators that are not The One True OPML Validator, which is, of course, Radio.

But I’d be happy to be proven wrong.

Comment by Phil Ringnalda #
2006-02-03 08:46:03

“Also, I wonder if they support HTML 3.2 entities in RSS 0.91(Netscape) feeds.”

Preliminary two-minute not enough coffee answer: who can tell? With a copy-paste of the 0.91N spec example (which doesn’t actually use any of the declared entities), you get “An unknown error has occurred.” Remove the DOCTYPE declaration and the error goes away. The DTD is currently where it claims to be, though being served as text/plain, and I don’t know nearly enough about Microsoft’s XML backend to know if that’s likely to be a problem, or what other things might be (beyond just “they wrote it so it will choke on any DTD,” which is certainly entirely possible).

Comment by Phil Ringnalda #
2006-02-03 18:49:42

Hmm. They don’t care about the DTD mime-type, because they don’t attempt to fetch it: they just halt and catch fire at the sight of a DOCTYPE declaration. That is… surprising, and I would think would be a difficult thing to defend.
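For anyone wondering what "halt at the sight of a DOCTYPE" looks like in a parser, here is a hedged sketch using Python's `xml.parsers.expat`. It says nothing about how Microsoft's XML stack is actually built; `DoctypeForbidden` and `parse_refusing_doctype` are names invented for the example.

```python
import xml.parsers.expat

class DoctypeForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration (hypothetical name)."""

def parse_refusing_doctype(data):
    """Parse `data`, bailing out as soon as a DOCTYPE declaration is seen."""
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # The DTD is never fetched; its mere declaration is enough to stop.
        raise DoctypeForbidden(name)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(data, True)

WITH_DOCTYPE = (
    '<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" '
    '"http://my.netscape.com/publish/formats/rss-0.91.dtd">'
    '<rss version="0.91"><channel><title>t</title></channel></rss>'
)
WITHOUT_DOCTYPE = '<rss version="0.91"><channel><title>t</title></channel></rss>'

parse_refusing_doctype(WITHOUT_DOCTYPE)   # parses quietly
try:
    parse_refusing_doctype(WITH_DOCTYPE)
except DoctypeForbidden:
    print("refused")                      # the DOCTYPE alone triggers this
```

A parser built this way would behave as described: the served type of the DTD cannot matter, because the DTD is never requested.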

Which reminds me: way back when they claimed that they would support RSS 0.9x, RSS 1.0, RSS 2.0, Atom 0.3 and Atom 1.0 I asked in a comment whether that was for x≥0 or only for x>0, and didn’t get an answer. Any bets on whether or not they support RSS 0.90?

Comment by Phil Ringnalda #
2006-02-03 19:41:55

Served as text/xml, this isn’t even recognized as a feed. Nor is this feed, from what I’m told is one of the biggest (tech, apparently) news sites in Germany. Lucky for IE that a bobble in Thunderbird’s RSS 0.90 support caused to add an Atom feed that IE7 does parse, because if you don’t render, you hear about it.

Comment by Robert Sayre #
2006-02-03 20:51:05

Yes, you hear about it in very severe and precise German bug reports that outline the severe consequences of failing to fix the bug.

Comment by Phil Ringnalda #
2006-02-03 21:12:21

Severe and precise, yes, but not terribly accurate: as I remember the bugs from Firefox’s turn at failing to handle it, not only were they technically inaccurate, but also it’s always described as “one of Germany’s biggest news sites.” However, I just looked at the HTML for the first time, to make sure they really had done an Atom feed, and I see that it’s got articles about Yahoo social search, and Nvidia, and Opera 9, and VMware… it’s the bloody German Slashdot, not the German CNN.

Comment by Jacques Distler #
2006-02-03 22:36:06

Were you expecting a flurry of bug reports from CNN readers? Conversely, were you expecting accurate bug reports from Slashdot readers?

Seems to me Heise readers delivered exactly as expected …

Comment by Phil Ringnalda #
2006-02-04 00:05:15

I’m guessing you missed the flurry of screaming and kicking and blocking-flag setting and silly threats last summer over a memory leak on (I mean, sure, I did link to them for George, but only because I’d searched it up to explain to my boss why all his sons are named George.) Even though bugs like ”Support MathML in Chimera Camino” (somebody got himself a fix!) are much more interesting to me, we have Genuine Real Users now, and when MSN does something weird that breaks caching, we get a stream of ”my homepage shows the wrong date for the news box” bugs. Which I never finished testing, so I hope they fixed whatever they were doing. So, yeah, I did expect a flurry of bugs on not parsing an RSS 0.90 feed for a straight news site. What strikes me as odd is that it didn’t ever recognize what it was from looking at the feed, but instantly recognized it from looking at the HTML. Couldn’t see the Slashdot for the angle brackets, I guess.

Comment by Aristotle Pagaltzis #
2006-02-04 01:41:27

Heise publish the two biggest German tech mags, c’t (consumer-oriented) and iX (business-oriented), as well as the German translation of the MIT’s Technology Review and Telepolis, an online essay magazine about science, technology, society, politics, etc. is a newsticker run by the same editorial staff. The mags and the ticker are all absolutely top-notch stuff.

The forums that are attached to the newsticker are 10× as bad as Slashdot.

Comment by Phil Ringnalda #
2006-02-09 00:02:55

Ah, they bail because DTDs are insecure (though it’s under investigation).

Well, it wouldn’t break my heart if they nail 0.91N’s coffin shut.

Comment by Phil Ringnalda #
2006-02-03 22:25:06

“I wonder if they treat item/description as plain text in RSS 0.90, 0.91(Netscape), or 0.91(Userland).”

And for the final one: no, item/description is HTML in 0.91U.

Comment by Mark #
2006-02-04 11:08:04

“item/description is HTML in 0.91U.”

Bzzt! Sorry, you are not a winner. Please play again.

Comment by Phil Ringnalda #
2006-02-04 11:47:47

Since I was saying that IE isn’t a winner there, either, let me rephrase that.

item/description is rendered as HTML, rather than the plain text that should be inferred from the spec, when IE7β2-preview sees 0.91U.

Comment by Aristotle Pagaltzis #
2006-02-18 01:04:43

Looks like it will stay dead. Obviously, nothing ever really changed, except the fact that people had stopped bothering.

Comment by Ross #
2006-02-03 12:08:49

Mark’s comment made me spit coffee all over my keyboard – please start blogging again.

Comment by Phil Ringnalda #
2006-02-03 14:24:55

And where, pray tell, would that leave me? By far “my” most popular and most widely linked “posts” are Mark’s comments — certainly my most linked ever post must have been Mark’s comment linking to Blogger’s redesigned template screenshots.

Or did you mean that I should start posting more regularly, about the things that I expect Mark wants to comment on? I wonder: do sterile European birds build nests that they think would appeal to cuckoos?

Comment by Pete Prodoehl #
2006-02-07 08:20:12

Mark’s comments made me want to laugh, but then it quickly turned into wanting to cry… Now I just feel despair…

Comment by Mark #
2006-02-10 17:24:37

I’m told I often have that effect on women, too.

Comment by Aristotle Pagaltzis #
2006-02-10 19:48:58

The mark of a great writer.

Comment by Ross #
2006-02-04 03:34:20

Sorry Phil, I didn’t mean any offense, it’s just that Mark’s spot-on common sense makes me smile, whilst also riling up all those that would rather push sloppily defined/implemented/conceived ‘standards’ as the solution to all of the Comp Sci problems of the last decade (or is that OPML?).

Is Mark Pilgrim the Denis Leary of the Internet?

Comment by Phil Marham #
2006-02-10 18:36:01

We should really take the time to thank Mark for all he’s done for the community. I think the first contribution was obnoxious tetris. Mark was kind enough to include a copy of MDBP virus with every download!

Comment by Phil Ringnalda #
2006-02-10 19:38:47

OMG!!!1! Mark had a misspent youth? I had no idea. Tell me, was he ever addicted to anything?

You did, however, get the first character of the virus name correct, so points for that.

Comment by Danny #
2006-02-07 03:56:21

You both wield wonderful weird hybrid nitpicking/pragmatic light sabres, but where Mark’s humour is usually metal-themed Phil’s tends to be more plush. Great double-act, I think Mark in comments works rather well. Perhaps you should do a cable TV show together. Or merge into a single entity (and call yourself George).