Who knows a <title> from a hole in the ground?

Results from a few testcases of various forms of escaping an Atom title which is not markup. Not exactly overwhelmingly impressive. In every case (unless I’ve screwed up) the title should be displayed as <title>.

Aggregator | HTML CDATA | HTML entity | HTML NCR | Text CDATA | Text entity | Text NCR | XHTML entity | XHTML NCR
Bloglines
My Yahoo!
Newsgator Online
Google Reader
Netvibes
Gregarius
Thunderbird
Firefox
Opera
RSS Bandit
Sharpreader
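
For anyone who would rather see the eight forms than guess at them from the column names, here is a rough sketch in Python. The title strings are a reconstruction of what each form plausibly looks like, not copies of the actual test feeds, and `CASES` and `displayed_title` are made-up names. It shows one way a reader could decode each form, not how any of the aggregators above does it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML_DIV = '<div xmlns="http://www.w3.org/1999/xhtml">{}</div>'

# Hypothetical reconstructions of the eight title forms (assumed, not the real feeds).
CASES = {
    "html cdata": '<title type="html"><![CDATA[&lt;title>]]></title>',
    "html entity": '<title type="html">&amp;lt;title></title>',
    "html ncr": '<title type="html">&#38;lt;title></title>',
    "text cdata": '<title type="text"><![CDATA[<title>]]></title>',
    "text entity": '<title type="text">&lt;title></title>',
    "text ncr": '<title type="text">&#60;title></title>',
    "xhtml entity": '<title type="xhtml">' + XHTML_DIV.format("&lt;title>") + "</title>",
    "xhtml ncr": '<title type="xhtml">' + XHTML_DIV.format("&#60;title>") + "</title>",
}


class _TextOnly(HTMLParser):
    """Collect the character data of an HTML fragment, dropping any tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def displayed_title(title_xml):
    """Return the plain text a reader should see for one title element."""
    elem = ET.fromstring(title_xml)
    kind = elem.get("type", "text")  # no type attribute means type="text"
    if kind == "html":
        # The XML parser has removed one layer of escaping; what is left is
        # HTML source, which needs a second, HTML-level decoding pass.
        parser = _TextOnly()
        parser.feed(elem.text or "")
        parser.close()
        return "".join(parser.parts)
    # text: the XML parser has already done all the unescaping there is to do.
    # xhtml: the same, except the text sits inside the wrapping div.
    return "".join(elem.itertext())


for name, title_xml in CASES.items():
    print(name, "->", displayed_title(title_xml))
```

The only branch that does any work of its own is `type="html"`; everything else is whatever the XML parser hands back, which is rather the point of having a type attribute.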

Some notes:

If you are using Internet Explorer on Windows, all those results may well look like identical empty squares, instead of checkmarks for pass and Xes for fail. May I recommend some fonts and a better browser that will make use of them? Or another better browser? That’s gotta be better than digging in the source for the classnames.

Bloglines has one “?” because I failed to subscribe to one feed when I started this last night, and they apparently take eight or ten or twelve hours before they first fetch a feed. Maybe the new datacenter will improve things. Passed, when it finally showed up.

Rojo doesn’t appear because they either take more than 24 hours to first fetch a feed, or they don’t support Atom 1.0, and don’t support it by claiming that it hasn’t yet been fetched. Can’t tell, really.

Kinja doesn’t appear because it says that it’s down for “routine maintenance” and has been all day. Has it been for longer than all day? Is it gone forever? Don’t ask me.

“Windows” Live fails to support Atom 1.0 in a most unamusing way: it pretends that nothing happened. Import the OPML file of test cases, and you get a blank content-hole for a second, then you’re back where you started. Tell it to subscribe to a particular URL, and it starts talking about the results of your search for words in the URL.

My Yahoo! found the most amusing way to fail: it would have been fine on the three HTML tests, except that it fails to realize that > only matters when it follows ]] — I don’t bother escaping it otherwise, and so they stripped it, while allowing through the <title. It also gets points for being eager: despite the fact that it wasn’t willing to actually show me content from the feeds I added until the next morning, during the ten minutes after I added them, it requested robots.txt alone 47 times, with four different user-agents. Don’t worry, Yahoo!, the rest of the world has plenty of bandwidth and server threads to make up for you using three different feed fetchers as well as your regular spider, and we’re happy to have you all over us like an untrained Saint Bernard puppy. No, really, we don’t mind your slobber at all.
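
The rule behind the > part of that is small enough to sketch: in XML element content, & and < always need escaping, while > only needs it when it would otherwise complete the sequence ]]>. The function names below are made up for illustration, and this is a minimal sketch of the rule rather than the code that generated the test feeds.

```python
import xml.etree.ElementTree as ET


def escape_xml_text(text):
    """Escape text for XML element content, doing only what XML requires."""
    # & and < must always be escaped in element content.
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    # > is only forbidden as the end of the literal sequence ]]>.
    return text.replace("]]>", "]]&gt;")


def round_trip(text):
    """Escape, wrap in a throwaway element, parse, and return the text."""
    return ET.fromstring("<t>" + escape_xml_text(text) + "</t>").text


print(escape_xml_text("<title>"))
print(round_trip("<title> & ]]> done"))
```

A consumer that strips a bare > from `&lt;title>` is treating a perfectly legal character as if it were markup.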

Netvibes is quite nice, as Flash-based portals with aggregators go. Shame that it seems to fail on the text testcases not by stripping them, but by putting them in as markup, which is the sort of thing that winds up with me hollering about security holes. Though, the opacity of the interface may foil me.

The Firefox bugs I’ve known about forever, and Rob’s on ‘em with a patch waiting on review, but Thunderbird’s problems with the two CDATA tests worry me: in both cases, it just gives a blank title, rather than just fumbling the escaping. I even built it for the first time in months, to catch up with the trunk, but still got the same result. Bad enough that it’s an odd and worrisome bug that I might have to file, but worse yet that the competition for non-sucky Windows browser managed fine with every test.

Oh, and Luke? Pretty nice showing for a one-person unpaid hobby aggregator, mate ;)

Sam sensibly put it in the Atom wiki, without the need for decent Unicode font support and with the opportunity to add your own results.

66 Comments

Comment by James Holderness #
2005-12-18 19:15:57

I’d just like to point out that Snarfer has no problems with any of those tests either.

Comment by James Holderness #
2005-12-18 19:21:42

Um, I added link markup on that comment (which your help said was legal) but it seems to have been stripped. Not sure what I did wrong, but here’s the URL for anyone interested: http://www.snarfware.com/

Comment by Phil Ringnalda #
2005-12-18 19:33:53

You missed the closing quote after the URL, which WordPress cleaned up in its rather aggressive way: “oh, so he can’t get his quotes right, can’t he? NO LINKS FOR YOU, TWO WEEKS!”

Comment by James Holderness #
2005-12-18 20:11:09

Doh. Sorry, I’m an idiot. It’s been a long day (4 AM here).

Btw, I truly appreciate you making the effort to highlight this issue (Robert too). Hopefully the increased awareness will result in better development from the aggregator authors and a better experience for end users.

 
 
 
 
Comment by Aristotle Pagaltzis #
2005-12-18 20:03:08

Liferea, as I’ve mentioned before, also passes every one of these test cases.

Comment by Phil Ringnalda #
2005-12-18 20:23:31

I actually had a live CD in my hand, thinking I should test Liferea just so I could have another green-across-the-board, but then I remembered that Linux wouldn’t recognize my crappy Winmodem, and I wasn’t in the mood to wrestle with cables and Windows Connection Sharing (or to move out of the woods so I could have a decent connection).

 
 
Comment by Adriaan Tijsseling #
2005-12-18 20:07:35

I’ve also seen cases where feeds had escaped HTML in the CDATA section where it should not have been escaped.

Comment by Phil Ringnalda #
2005-12-18 20:14:41

And in RSS, the presence of that sort of thing is cause to have a function that applies some desperate heuristic to try to guess whether Joe User wants to see rendered markup, or escaped markup.

In Atom, doing so is equivalent to gathering up everyone who spent years enduring the endless arguments on the mailing list, and urinating on them.

Either aggregators render Atom according to what it claims to be, not what they guess the author might have possibly wanted maybe, or we’ve wasted the last couple of years in the most unpleasant way possible. Anyone who wants that to be the case had better be prepared to share my disappointment and unhappiness. Over, and over, and over, and over.

Comment by Anil #
2005-12-19 15:40:40

“In Atom, doing so is equivalent to gathering up everyone who spent years enduring the endless arguments on the mailing list, and urinating on them.”

Resolved: Closed (By design)

Comment by Phil Ringnalda #
2005-12-19 16:07:33

Odd that usually I’m the first one to thwap with the back side of Hanlon’s Razor, but this time it didn’t even occur to me what it might look like from the other end of the stream.

 
Comment by Robert Sayre #
2005-12-19 17:11:57

Way to put out the flames.

Namaste y’all!

 
 
 
 
Comment by Sameer D'Costa #
2005-12-18 21:15:28

Phil, with every post that you make about aggregators not handling Atom feeds properly, I feel more guilty about not submitting a patch for Gregarius/Magpie. But then on the other hand I see that you also feel guilty. I hope you succumb before I do.

Comment by Phil Ringnalda #
2005-12-19 00:54:45

Heh. It’s so nice to have new friends who don’t know you yet.

The fact that I took more than a year to change return NS_ERROR_FAILURE to return NS_OK really should give you an idea about how long you have to wait if you’re going to out-wait me ;)

 
 
Comment by Robert Sayre #
2005-12-18 21:40:52

Worth mentioning that none of them have a vertical column of checkmarks. There is no happy path!

Comment by Phil Ringnalda #
2005-12-19 00:49:28

Ayuh. Once you get Firefox fixed, HTML-entity works as long as you escape greater-than to make Yahoo! happy (and ignore Google Reader until it even pretends to listen just a little bit), but nobody’s going to buy fixing Live Bookmark titles as a 1.5.0.x fix, so we’re looking at next July with luck, to have one working path in just this set, unless everybody else moves more quickly.

 
 
Comment by Kafkaesquí #
2005-12-18 23:51:47

So basically a “rah” to us Sharpreader boosters. At least for this.

 
Comment by Manuzhai #
2005-12-19 00:32:42

It seems FeedDemon is doing all right with this… At least the posts from your feed look just right (although it dropped the thingie on the latest ongoing item, Drop the <!DOCTYPE>, you might want to put that in your testcases).

Comment by Phil Ringnalda #
2005-12-19 00:43:37

That should be the same situation as Text NCR (modulo the (better not be a) difference of decimal vs. hex numeric character reference), unless it fails to realize that no type attribute means type="text". I did have another testcase for that, but nobody was missing it, so I figured it just confused the issue.

Could you do me a favor, and subscribe to that testcase in FeedDemon, if you don’t want to subscribe to the whole OPML file of testcases?
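
(A minimal sketch of those two points, assuming a title along the lines of that ongoing item; the strings and the `title_type` helper are invented for illustration. Decimal and hex references come out of an XML parser as the same characters, and a missing type attribute is read as text.)

```python
import xml.etree.ElementTree as ET

# The same title written with decimal and with hexadecimal character references.
decimal = ET.fromstring("<title>Drop the &#60;!DOCTYPE&#62;</title>")
hexadecimal = ET.fromstring("<title>Drop the &#x3C;!DOCTYPE&#x3E;</title>")


def title_type(elem):
    """Return the declared type of a title, defaulting to "text"."""
    return elem.get("type", "text")


print(decimal.text == hexadecimal.text)  # both forms decode to the same string
print(title_type(decimal))               # no attribute, so treated as text
```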

 
Comment by James Holderness #
2005-12-19 01:37:16

Did you actually run the set of tests in FeedDemon? When I tested with version 1.5 it failed everything except the 3 HTML titles.

 
 
Comment by Anne van Kesteren #
2005-12-19 00:59:45

I already knew that bad specifications were bad for interoperability. Good specifications apparently don’t help much. Time for an official testsuite?

 
Comment by Anne van Kesteren #
2005-12-19 01:01:05

That, and hurray for the company I work for! Together with Sharpreader we can beat the world, or so.

 
Comment by Dotan Dimet #
2005-12-19 02:05:37

All your (I presume) checkmarks show up as ? to me (Firefox 1.5, Windows XP, Hebrew/Israel locale). In Bloglines you don’t even see the red/green font (can’t you use a cell background to make it clearer? just the question mark and 2 sides of the inner cell border are colored).
Ah, just saw the earlier post (I read you in Bloglines, which means reverse order). Sam’s table is also filled with question marks (big and bold red and green ones). What exactly made you think these smilies and frownies (or whatever the glyph is) would show up in da fox?

 
Comment by Roger Benningfield #
2005-12-19 02:27:41

Dotan: Same here… I see nuthin’ but question marks. And since I’m color blind, I can barely detect a difference on that front either.

 
Comment by Rijk #
2005-12-19 02:38:49

How nice to describe Opera as “the competition for non-sucky Windows browser” :)

It’s a bit worrying that you find bugs in Firefox and Thunderbird especially worrying because another browser/mailer gets it right. Wasn’t the whole Mozilla project based on the idea of doing things “the right way”, not leaving it at “good enough”?

Comment by Phil Ringnalda #
2005-12-19 21:59:13

How’s this for another backhanded compliment: I actually paid for Opera, during some unpatched Firefox security bug period when I wanted to have a spare browser, and didn’t feel at all bad about it when you went free not long afterward. If I could build my own, and submit patches, I’d be tempted to switch. (Though, if you took patches from me it would only be a matter of time before it would be something I wouldn’t want to use any more.)

The Mozilla situation is, of course, more complicated than any one slogan can capture. There are certainly parts of Gecko and SpiderMonkey where it’s “Do it The Right Way (or, just don’t bother doing it at all)” but out on the frontend where things get fuzzier it’s quite often more “Do it good enough for who it’s for,” and pre-Atom what to do with RSS didn’t really have a Right Way, just whatever way the person writing the code wanted to see. (And, really, there are lots of pragmatic wrong things in the parser and layout, too: the days when we would break every topsite to prove a point are pretty much gone, if they ever really existed.)

Comment by Rijk #
2005-12-20 03:41:58

Of course you can submit patches! Opera is hiring, after all, so CVS access is possible. You can choose between positions in Norway, Sweden and China.

If you apply for a job, mention my name, because I get a bonus if you’re hired :)

 
 
 
Comment by Roger Benningfield #
2005-12-19 02:40:38

Phil: For what it’s worth (very, very little), JournURL’s aggregator passes all tests. Of course, it only does so because of your earlier post, complaining about aggregators that weren’t doing the right thing, but better late than never…

Comment by Phil Ringnalda #
2005-12-19 21:45:43

Well, no, you’re “better early than late”: I only had to holler at you once, without needing to show you tables and charts and pictures with circles and arrows and a paragraph on the back of each one.

 
 
Comment by Roger Benningfield #
2005-12-19 02:43:26

Almost forgot: results.

 
Comment by Dotan Dimet #
2005-12-19 03:07:57

Opera (like IE) shows small boxes [] with a colored foreground.
OK, I figured it out, thanks to this site (linked to from a Wikipedia discussion). It would be more accurate to write this:
If you’re on Windows, use Firefox and make sure you have unicode fonts installed (they come with MS Office 2000/XP, but might not be part of the default install) to see this table correctly (instead of this bit:
If you are using Internet Explorer, all those results may well look like identical empty squares, instead of checkmarks for pass and Xes for fail. May I recommend a better browser? Or another better browser?).
I suspect it probably won’t work if you don’t have a font that’s over a meg in size.

 
Comment by Manuzhai #
2005-12-19 03:16:08

Alright; FeedDemon (I’m using 1.6.0.12, it’s an RC) does the HTML cases correctly, but fails on the text and XHTML cases. I’ve posted a bug thingie on the feedback forum.

 
Comment by d.w. #
2005-12-19 04:36:07

Phil, I was pleasantly surprised to note that Safari 2.02 displayed every testcase correctly, particularly as Atom 1.0 support was only added after 2.01.

Comment by fluffy #
2006-01-22 13:04:25

Yep, they all work fine for me too.

 
 
Comment by Mark #
2005-12-19 06:07:42

Universal Feed Parser pre-4.0-68-cvs passes all of these tests.

 
Comment by Daniel Cater #
2005-12-19 07:20:09

Yeah, Thunderbird suffers on the CDATA. Have bugs been filed?

Comment by Robert Sayre #
2005-12-19 08:44:00

Bug filed. Assigned to me.

 
 
2005-12-19 10:23:01

[...] Who knows a title from a hole in the ground? [...]

 
Comment by Nick Bradbury #
2005-12-19 14:35:46

Thanks for the test cases, Phil. I’ve fixed FeedDemon so that it handles all cases correctly.

Comment by Phil Ringnalda #
2005-12-19 22:01:41

Thank you.

 
 
Comment by Patrick Grote #
2005-12-19 17:37:01

I am using Firefox 1.5 on XP and only see question marks. Do I need to do anything special?

Thanks!

Comment by Phil Ringnalda #
2005-12-19 18:39:44

Yeah, apparently so. One of the fonts mentioned here. Or Windows’ East Asian language support. Or OpenOffice.

Next time? ASCII art, baby, nothing but ASCII art. “X” and “0” were good enough for Pops, they’re good enough for me.

Comment by Aristotle Pagaltzis #
2005-12-19 21:10:37

Funnily enough, using images would have solved both your styling quandary and the font support problems. For the poor gerbils you could have used the Unicode characters as the alt text; such subtle self-parody.

 
Comment by Patrick Grote #
2005-12-20 07:19:47

That did it! Thanks, Phil.

Just look at those streamlined check marks.

 
 
 
Comment by Dare Obasanjo #
2005-12-19 17:51:31

I just checked in a fix for all the tests into the RSS Bandit CVS tree.

Comment by Phil Ringnalda #
2005-12-19 22:03:26

Nice that I could take your mind off other things, though apparently not for very long!

 
 
Comment by James Holderness #
2005-12-19 21:19:31

Was there any reason why you didn’t include a test of XHTML CDATA? I can guarantee you there will be aggregators that have passed the other CDATA tests that will fail on XHTML CDATA. Or are Atom processors not required to handle that case?

Comment by Phil Ringnalda #
2005-12-19 22:13:04

I’m not quite sure. It’s certainly a Grade-A difficult test — how many people actually know the difference in expected treatment between application/xhtml+xml CDATA and text/html CDATA (not that I’ve ever bought the reasoning behind not treating CDATA sections in HTML as though they were CDATA sections in SGML, but, meh).

So, I thought I’d save that for a separate set of “when we said ‘treat this inline XML as XHTML’? betcha didn’t think about this meaning of that, did you?” tests.

Comment by James Holderness #
2005-12-20 00:00:26

I wouldn’t have thought there was any difference in this context. Both CDATA sections should be processed at the XML level. An HTML renderer shouldn’t ever have to see them.

Now if you were to escape a CDATA section inside type="html", I think that might be asking a bit much of aggregators. Even the HTML spec recommends against doing that.
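
(To illustrate the “processed at the XML level” point, here is a small sketch using Python’s standard XML parser; the two div strings are invented for the example and are not one of the test feeds. Once parsed, a CDATA section and an entity-escaped run are the same text, and nothing downstream can tell which one the author wrote.)

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# Two hypothetical spellings of the same xhtml title content.
with_cdata = f'<div xmlns="{XHTML}"><![CDATA[<title>]]></div>'
with_entity = f'<div xmlns="{XHTML}">&lt;title></div>'

a = ET.fromstring(with_cdata)
b = ET.fromstring(with_entity)

# After parsing, both carry the identical three-part string "<title>".
same_text = a.text == b.text

# Reserializing the parsed tree escapes the text again; the CDATA section
# itself is gone, so an HTML renderer fed this output never sees one.
reserialized = ET.tostring(a, encoding="unicode")

print(same_text)
print(reserialized)
```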

Comment by Phil Ringnalda #
2005-12-20 00:16:20

Hmm. I think I had a bug in my brain, confusing the metaphoric “type="xhtml" means you just copy the content out of the source and into the source of what you’ll serve to the browser” with the reality of parsing it and reconstituting it. Okay, one more when I get a chance.

 
 
 
 
Comment by Jacques Distler #
2005-12-19 21:51:42

Testcase, shmestcase.

I’ve missed 3 weeks of posts at weblog.philringnalda.com because, since late November, NetNewswire 2.0.1 has been unable to parse your (perfectly valid AFAICT) Atom 1.0 feed.

I am mightily pissed about the state of Atom support, and it has little to do with the odd spooged <title> element.

Comment by Phil Ringnalda #
2005-12-19 22:07:27

Um. You know, I did wonder why nobody was telling me about NNW test results. And about where you were, though I saw that you were traveling (almost into my backyard) and fighting the Apache and whatnot. I knew I was appearing silent to users of certain other aggregators, ones that have… “personal issues” with Atom, but I never thought for a second that NNW wouldn’t like me.

 
 
Comment by Jacques Distler #
2005-12-19 22:21:04

FWIW, NetNewsWire says, apropos of your feeds (in the “Dump Subscription Properties” tool):

session ID: 110
Error string: Can’t display this subscription because the feed could not be found.

Dunno what brand of coca-derivative it’s smoking.

I just figured you were on blog-vacation again.

Anyway, I’ve been looking at MacOSX Aggregator alternatives (NewsFire displays your feeds OK) but, so far, my best hope is for NNW 2.0.2.

Comment by Phil Ringnalda #
2005-12-19 22:35:18

That doesn’t sound very right. Which of my various and sundry ”I hope I redirected them all” URLs is it using?

I’ll admit that blog-vacation is always a good guess with me, but without looking I’d bet that I’ve posted more frequently since switching to WordPress than I have, well, ever.

Comment by Jacques Distler #
2005-12-19 22:44:29

http://weblog.philringnalda.com/feed/
http://weblog.philringnalda.com/comments/feed/

It’s not as if the feed(s) are not at those URLs. They are. But NNW 2.0.1, in its palsied state, is unable to recognize them as Atom 1.0 feeds.

As recently as Nov 28, it recognized them just fine.

 
 
Comment by Brent Simmons #
2005-12-20 10:09:40

There is a bug in NetNewsWire 2.0.1 with Atom feeds that have some text after the closing feed tag. (For instance — and in this particular instance — WP-Cache adds some comments at the end of the document.) This bug is fixed here in the lab and the fix will be in the next release of NetNewsWire.

(I’m also using these test cases as I’m working on the next version. So — thanks!)
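
(For reference, a comment after the closing root tag is still well-formed XML, while bare text there is not. The sketch below checks both with Python’s standard parser; the feed string and trailer text are invented, and this says nothing about how NetNewsWire’s own parser works.)

```python
import xml.etree.ElementTree as ET

FEED = '<feed xmlns="http://www.w3.org/2005/Atom"><title>t</title></feed>'


def is_well_formed(document):
    """Return True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# A cache plugin appending a comment leaves the document well-formed ...
with_comment = FEED + "\n<!-- cached page, hypothetical trailer -->\n"
# ... whereas appending bare text after the root element does not.
with_text = FEED + "\ncached page"

print(is_well_formed(with_comment))
print(is_well_formed(with_text))
```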

 
 
Comment by travis #
2005-12-20 07:37:38

Results for JetBrains Omea Reader 2.0 (build 671.6):

html cdata: ✘
html entity: ✘
html NCR: ✘
text entity: ✔
text in CDATA: ✔
text in NCR: ✔
xhtml entity: ✘
xhtml NCR: ✘

 
Comment by Jacques Distler #
2005-12-20 10:09:23

NetNewsWire has a 3-paned interface.

In the “headlines” pane, 3 of your test cases display incorrectly: text/entity, xhtml/entity and xhtml/ncr. The rest display correctly.

In the “entry” pane, all the test cases display the title correctly.

P.S.: Thanks for “fixing” your Entry Feed. I will await, eagerly, the return of your comment feed.

Comment by Phil Ringnalda #
2005-12-20 22:17:45

’kay, should be “fixed”: I just commented out the adding of the comments, since I don’t really feel any need to constantly debug something that pretty much seems to just work.

 
 
Comment by Alastair #
2005-12-20 16:39:25

Another good, complete, Unicode font for Windows is the updated Arial supplied with MS Word 2002 (and later). You need to install it separately, as described by this KB article.

Installing this (or any other comprehensive unicode font) fixes Firefox, but not IE. And here is an attempt to explain why.

 
2005-12-25 13:18:46

A small test of RSS readers

Phil Ringnalda has published a small test of RSS readers. Sharpreader, which I have been using for quite a while, did not let me down this time either, passing all the tests with flying colors. Well, it always pleases me when my choice of software turns out to be…

 
2005-12-25 23:31:11

[...] I will continue to be in awe of Phil and posts like these. [...]

 
Comment by Arve #
2006-01-10 08:49:40

Bloglines (and probably even a few other aggregators) fail miserably with < and > elsewhere.

See this example to experience what I have come to call the “Bloglines experience”: My posting on <canvas> renders the canvas element literally, instead of displaying the correct TEXT.

That’s the reward for using JavaScript where it’s not supposed to be used. Perhaps I should prepend every entry with <marquee> when Bloglines accesses the URL?

[ Oh, and Phil: Could you get WP to recognize the fact that I'm not a spammer if I have turned off sending of the referer header? I'm seeing Error: This file cannot be used on its own. every time I try to submit a comment. ]

Comment by Phil Ringnalda #
2006-01-10 09:23:22

Sure, I’m willing to let you not send referrers. Just give me something that will work as well: so far this morning, I’ve gotten 3 comments, and 90 foiled attempts at directly posting spam to wp-comments-post.php (not counting your false positives). What do you have in mind as an alternative? I thought about allowing Opera through, but one of the spammers is using a UA rotator that includes Opera. Maybe only allow Opera 9.xx on Linux? That shouldn’t be in spammers’ rotation lists for a while.

Comment by Jacques Distler #
2006-01-10 11:31:09

Just give me something that will work as well: so far this morning, I’ve gotten 3 comments, and 90 foiled attempts at directly posting spam to wp-comments-post.php

Dang!

Where’s that “forced-comment-preview” when you need it? Heck, where’s that “comment-preview”?

 
Comment by Jacques Distler #
2006-01-10 11:36:48

More seriously, back, years ago, when I last worried about comment-spambots, they were smart enough to fake the referer header.

Now, some can’t even do HTTP correctly. But, surely one can’t rely on that.

(Sorry for forgetting to sign the last comment.)

 
 
 
Comment by travis #
2006-01-13 06:59:52

Results for JetBrains Omea Reader 2.1 (build 914.3):

html cdata: ✘
html entity: ✘
html NCR: ✘
text entity: ✔
text in CDATA: ✔
text in NCR: ✔
xhtml entity: ✘
xhtml NCR: ✘

Looks like not much changed in their rendering engine

 