Who knows a <title> from a hole in the ground?
Results from a few testcases of various forms of escaping an Atom title which is not markup. Not exactly overwhelmingly impressive. In every case (unless I’ve screwed up) the title should be displayed as <title>.
| Aggregator | HTML CDATA | HTML entity | HTML NCR | Text CDATA | Text entity | Text NCR | XHTML entity | XHTML NCR |
|---|---|---|---|---|---|---|---|---|
| Bloglines | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| My Yahoo! | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✔ | ✔ |
| Newsgator Online | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Google Reader | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Netvibes | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Gregarius | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Thunderbird | ✘ | ✔ | ✔ | ✘ | ✔ | ✔ | ✔ | ✔ |
| Firefox | ✘ | ✘ | ✘ | ✔ | ✔ | ✔ | ✘ | ✘ |
| Opera | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
| RSS Bandit | ✔ | ✔ | ✔ | ✘ | ✘ | ✘ | ✘ | ✘ |
| Sharpreader | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
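For anyone who wants to poke at the columns without subscribing to anything, here is a rough sketch of what each of the eight forms might look like and what a by-the-book consumer would do with it. The markup strings are a reconstruction from the column names, not the actual testcase feeds, and `displayed_title` and `html_to_text` are hypothetical helpers built on Python's standard library only.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ATOM = "http://www.w3.org/2005/Atom"
XHTML = "http://www.w3.org/1999/xhtml"

# Assumed shape of the eight testcases; every one should display as <title>.
TESTCASES = {
    "HTML CDATA":   '<title type="html"><![CDATA[&lt;title>]]></title>',
    "HTML entity":  '<title type="html">&amp;lt;title></title>',
    "HTML NCR":     '<title type="html">&#38;lt;title></title>',
    "Text CDATA":   '<title type="text"><![CDATA[<title>]]></title>',
    "Text entity":  '<title type="text">&lt;title></title>',
    "Text NCR":     '<title type="text">&#60;title></title>',
    "XHTML entity": '<title type="xhtml"><div xmlns="%s">&lt;title></div></title>' % XHTML,
    "XHTML NCR":    '<title type="xhtml"><div xmlns="%s">&#60;title></div></title>' % XHTML,
}


class _TextOnly(HTMLParser):
    """Collects the text content of an HTML fragment, dropping any tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(markup):
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def displayed_title(title_xml):
    """Return the plain text a reader should see for one Atom title."""
    entry = ET.fromstring('<entry xmlns="%s">%s</entry>' % (ATOM, title_xml))
    title = entry.find("{%s}title" % ATOM)
    kind = title.get("type", "text")
    if kind == "xhtml":
        # The content is real XML: take the text inside the wrapper div.
        div = title.find("{%s}div" % XHTML)
        return "".join(div.itertext())
    if kind == "html":
        # The XML parser has removed one layer of escaping; HTML removes the next.
        return html_to_text(title.text or "")
    return title.text or ""


for name, markup in TESTCASES.items():
    print("%-13s %s" % (name, displayed_title(markup)))
```

The point of the sketch is that the `type` attribute alone decides how many layers of unescaping happen; nothing is guessed from the content.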
Some notes:
If you are using Internet Explorer Windows, all those results may well look like identical empty squares, instead of checkmarks for pass and Xes for fail. May I recommend some fonts and a better browser that will make use of them? Or another better browser? That’s gotta be better than digging in the source for the classnames.
Bloglines has one “?” because I failed to subscribe to one feed when I started this last night, and they apparently take eight or ten or twelve hours before they first fetch a feed. Maybe the new datacenter will improve things. Passed, when it finally showed up.
Rojo doesn’t appear because they either take more than 24 hours to first fetch a feed, or they don’t support Atom 1.0, and don’t support it by claiming that it hasn’t yet been fetched. Can’t tell, really.
Kinja doesn’t appear because it says that it’s down for “routine maintenance” and has been all day. Has it been for longer than all day? Is it gone forever? Don’t ask me.
“Windows” Live fails to support Atom 1.0 in a most unamusing way: it pretends that nothing happened. Import the OPML file of test cases, and you get a blank content-hole for a second, then you’re back where you started. Tell it to subscribe to a particular URL, and it starts talking about the results of your search for words in the URL.
My Yahoo! found the most amusing way to fail: it would have been fine on the three HTML tests, except that it fails to realize that > only matters when it follows ]] — I don’t bother escaping it otherwise, and so they stripped it, while allowing through the <title. It also gets points for being eager: despite the fact that it wasn’t willing to actually show me content from the feeds I added until the next morning, during the ten minutes after I added them, it requested robots.txt alone 47 times, with four different user-agents. Don’t worry, Yahoo!, the rest of the world has plenty of bandwidth and server threads to make up for you using three different feed fetchers as well as your regular spider, and we’re happy to have you all over us like an untrained Saint Bernard puppy. No, really, we don’t mind your slobber at all.
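The bit about `>` only mattering after `]]` can be sketched in a few lines. This is an illustration under assumptions, not the escaping code behind the testcases: `escape_minimal` and `roundtrip` are hypothetical names, and the comparison is with `xml.sax.saxutils.escape`, which always escapes `>`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_minimal(text):
    """Escape only what XML character data strictly requires."""
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace("]]>", "]]&gt;"))


def roundtrip(escaped):
    """Parse the escaped text back out of a throwaway element."""
    return ET.fromstring("<title>%s</title>" % escaped).text


for raw in ("<title>", "a ]]> b"):
    lazy = escape_minimal(raw)   # leaves a bare > alone
    careful = escape(raw)        # escapes every >
    print(repr(raw), "->", lazy, "|", careful)
    # A conforming XML parser recovers the same text either way.
    assert roundtrip(lazy) == raw
    assert roundtrip(careful) == raw
```

Both forms are the same text to an XML parser, so a consumer that strips the bare `>` is reacting to something XML does not treat as markup.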
Netvibes is quite nice, as Flash-based portals with aggregators go. Shame that it seems to fail on the text testcases not by stripping them, but by putting them in as markup, which is the sort of thing that winds up with me hollering about security holes. Though, the opacity of the interface may foil me.
The Firefox bugs I’ve known about forever, and Rob’s on ‘em with a patch waiting on review, but Thunderbird’s problems with the two CDATA tests worry me: in both cases, it just gives a blank title, rather than just fumbling the escaping. I even built it for the first time in months, to catch up with the trunk, but still got the same result. Bad enough that it’s an odd and worrisome bug that I might have to file, but worse yet that the competition for non-sucky Windows browser managed fine with every test.
Oh, and Luke? Pretty nice showing for a one-person unpaid hobby aggregator, mate ;)
Sam sensibly put it in the Atom wiki, without the need for decent Unicode font support and with the opportunity to add your own results.
I’d just like to point out that Snarfer has no problems with any of those tests either.
Um, I added link markup on that comment (which your help said was legal) but it seems to have been stripped. Not sure what I did wrong, but here’s the URL for anyone interested: http://www.snarfware.com/
You missed the closing quote after the URL, which WordPress cleaned up in its rather aggressive way: ”oh, so he can’t get his quotes right, can’t he? NO LINKS FOR YOU, TWO WEEKS!”
Doh. Sorry, I’m an idiot. It’s been a long day (4 AM here).
Btw, I truly appreciate you making the effort to highlight this issue (Robert too). Hopefully the increased awareness will result in better development from the aggregator authors and a better experience for end users.
Liferea, as I’ve mentioned before, also passes every one of these test cases.
I actually had a live CD in my hand, thinking I should test Liferea just so I could have another green-across-the-board, but then I remembered that Linux wouldn’t recognize my crappy Winmodem, and I wasn’t in the mood to wrestle with cables and Windows Connection Sharing (or to move out of the woods so I could have a decent connection).
I’ve also seen cases where feeds had escaped HTML in the CDATA section where it should not have been escaped.
And in RSS, the presence of that sort of thing is cause to have a function that applies some desperate heuristic to try to guess whether Joe User wants to see rendered markup, or escaped markup.
In Atom, doing so is equivalent to gathering up everyone who spent years enduring the endless arguments on the mailing list, and urinating on them.
Either aggregators render Atom according to what it claims to be, not what they guess the author might have possibly wanted maybe, or we’ve wasted the last couple of years in the most unpleasant way possible. Anyone who wants that to be the case had better be prepared to share my disappointment and unhappiness. Over, and over, and over, and over.
”In Atom, doing so is equivalent to gathering up everyone who spent years enduring the endless arguments on the mailing list, and urinating on them.”
Resolved: Closed (By design)
Odd that usually I’m the first one to thwap with the back side of Hanlon’s Razor, but this time it didn’t even occur to me what it might look like from the other end of the stream.
Way to put out the flames.
Namaste y’all!
Phil, with every post that you make about aggregators not handling Atom feeds properly, I feel more guilty about not submitting a patch for Gregarius/Magpie. But then on the other hand I see that you also feel guilty. I hope you succumb before I do.
Heh. It’s so nice to have new friends who don’t know you yet.
The fact that I took more than a year to change `return NS_ERROR_FAILURE` to `return NS_OK` really should give you an idea about how long you have to wait if you’re going to out-wait me ;)
Worth mentioning that none of them have a vertical column of checkmarks. There is no happy path!
Ayuh. Once you get Firefox fixed, HTML-entity works as long as you escape greater-than to make Yahoo! happy (and ignore Google Reader until it even pretends to listen just a little bit), but nobody’s going to buy fixing Live Bookmark titles as a 1.5.0.x fix, so we’re looking at next July with luck, to have one working path in just this set, unless everybody else moves more quickly.
So basically a ”rah” to us Sharpreader boosters. At least for this.
It seems FeedDemon is doing all right with this… At least the posts from your feed look just right (although it dropped the thingie on the latest ongoing item, Drop the <!DOCTYPE>, you might want to put that in your testcases).
That should be the same situation as Text NCR (modulo the (better not be a) difference of decimal vs. hex numeric character reference), unless it fails to realize that no `type` attribute means `type="text"`. I did have another testcase for that, but nobody was missing it, so I figured it just confused the issue.
Could you do me a favor, and subscribe to that testcase in FeedDemon, if you don’t want to subscribe to the whole OPML file of testcases?
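A small sketch of both halves of that claim: a decimal and a hex numeric character reference come out of an XML parser as the same character, and a title with no `type` attribute is to be treated as `type="text"` per the Atom 1.0 spec. How the ongoing feed actually encodes that title is not known here, so the two strings below are assumptions, and `title_type_and_text` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

# Assumed encodings of the same title: explicit type with a decimal NCR,
# and no type attribute at all with a hex NCR.
DECIMAL = '<title xmlns="%s" type="text">Drop the &#60;!DOCTYPE></title>' % ATOM
HEXADECIMAL = '<title xmlns="%s">Drop the &#x3C;!DOCTYPE></title>' % ATOM


def title_type_and_text(title_xml):
    """Return (effective type, text) for a standalone Atom title element."""
    title = ET.fromstring(title_xml)
    # A missing type attribute means type="text".
    return title.get("type", "text"), title.text


print(title_type_and_text(DECIMAL))
print(title_type_and_text(HEXADECIMAL))
```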
Did you actually run the set of tests in FeedDemon? When I tested with version 1.5 it failed everything except the 3 HTML titles.
I already knew that bad specifications were bad for interoperability. Good specifications apparently don’t help much. Time for an official testsuite?
That, and hurray for the company I work for! Together with Sharpreader we can beat the world, or so.
All your (I presume) checkmarks show up as ? to me (Firefox 1.5, Windows XP, Hebrew/Israel locale). In Bloglines you don’t even see the red/green font (can’t you use a cell background to make it clearer? just the question mark and 2 sides of the inner cell border are colored).
Ah, just saw the earlier post (I read you in Bloglines, which means reverse order). Sam’s table is also filled with question marks (big and bold red and green ones). What exactly made you think these smilies and frownies (or whatever the glyph is) would show up in da fox?
Dotan: Same here… I see nuthin’ but question marks. And since I’m color blind, I can barely detect a difference on that front either.
How nice to describe Opera as ”the competition for non-sucky Windows browser” :)
It’s a bit worrying that you find bugs in Firefox and Thunderbird especially worrying because another browser/mailer gets it right. Wasn’t the whole Mozilla project based on the idea of doing things ”the right way”, not leave it at ”good enough”?
How’s this for another backhanded compliment: I actually paid for Opera, during some unpatched Firefox security bug period when I wanted to have a spare browser, and didn’t feel at all bad about it when you went free not long afterward. If I could build my own, and submit patches, I’d be tempted to switch. (Though, if you took patches from me it would only be a matter of time before it would be something I wouldn’t want to use any more.)
The Mozilla situation is, of course, more complicated than any one slogan can capture. There are certainly parts of Gecko and Spidermonkey where it’s ”Do it The Right Way (or, just don’t bother doing it at all)” but out on the frontend where things get fuzzier it’s quite often more ”Do it good enough for who it’s for,” and pre-Atom what to do with RSS didn’t really have a Right Way, just whatever way the person writing the code wanted to see. (And, really, there are lots of pragmatic wrong things in the parser and layout, too: the days when we would break every topsite to prove a point are pretty much gone, if they ever really existed.)
Of course you can submit patches! Opera is hiring, after all, so CVS access is possible. You can choose between positions in Norway, Sweden and China.
If you apply for a job, mention my name, because I get a bonus if you’re hired :)
Phil: For what it’s worth (very, very little), JournURL’s aggregator passes all tests. Of course, it only does so because of your earlier post, complaining about aggregators that weren’t doing the right thing, but better late than never…
Well, no, you’re ”better early than late”: I only had to holler at you once, without needing to show you tables and charts and pictures with circles and arrows and a paragraph on the back of each one.
Almost forgot: results.
Opera (like IE) shows small boxes [] with a colored foreground.
OK, I figured it out, thanks to this site (linked to from a Wikipedia discussion). It would be more accurate to write this:
If you’re on Windows, use Firefox and make sure you have unicode fonts installed (they come with MS Office 2000/XP, but might not be part of the default install) to see this table correctly (instead of this bit:
).
I suspect it probably won’t work if you don’t have a font that’s over a meg in size.
Alright; FeedDemon (I’m using 1.6.0.12, it’s an RC) does the HTML cases correctly, but fails on the text and XHTML cases. I’ve posted a bug thingie on the feedback forum.
Phil, I was pleasantly surprised to note that Safari 2.02 displayed every testcase correctly, particularly as Atom 1.0 support was only added after 2.01.
Yep, they all work fine for me too.
Universal Feed Parser pre-4.0-68-cvs passes all of these tests.
Yeah, Thunderbird suffers on the CDATA. Have bugs been filed?
Bug filed. Assigned to me.
[...] Who knows a title from a hole in the ground? [...]
Thanks for the test cases, Phil. I’ve fixed FeedDemon so that it handles all cases correctly.
Thank you.
I am using Firefox 1.5 on XP and only see question marks. Do I need to do anything special?
Thanks!
Yeah, apparently so. One of the fonts mentioned here. Or Windows’ East Asian language support. Or OpenOffice.
Next time? ASCII art, baby, nothing but ASCII art. ”X” and ”0” were good enough for Pops, they’re good enough for me.
Funnily enough, using images would have solved both your styling quandary as well as the font support problems. For the poor gerbils you could have used the Unicode characters as the `alt` text; such subtle self-parody.
That did it! Thanks, Phil.
Just look at those streamlined check marks.
I just checked in a fix for all the tests into the RSS Bandit CVS tree.
Nice that I could take your mind off other things, though apparently not for very long!
Was there any reason why you didn’t include a test of XHTML CDATA? I can guarantee you there will be aggregators that have passed the other CDATA tests that will fail on XHTML CDATA. Or are Atom processors not required to handle that case?
I’m not quite sure. It’s certainly a Grade-A difficult test: how many people actually know the difference in expected treatment between `application/xhtml+xml` CDATA and `text/html` CDATA (not that I’ve ever bought the reasoning behind not treating CDATA sections in HTML as though they were CDATA sections in SGML, but, meh)?
So, I thought I’d save that for a separate set of “when we said ‘treat this inline XML as XHTML’? betcha didn’t think about this meaning of that, did you?” tests.
I wouldn’t have thought there was any difference in this context. Both CDATA sections should be processed at the XML level. An HTML renderer shouldn’t ever have to see them.
Now if you were to escape a CDATA section inside type=”html”, I think that might be asking a bit much of aggregators. Even the HTML spec recommends against doing that.
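The "processed at the XML level" point can be checked directly. This is only a sketch with an assumed XHTML CDATA testcase (the post does not include one), and `div_text` is a hypothetical helper: by the time anything downstream sees the content, the CDATA form and the entity form are the same text node.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"


def div_text(inner):
    """Parse an XHTML wrapper div and return its text content."""
    div = ET.fromstring('<div xmlns="%s">%s</div>' % (XHTML, inner))
    return "".join(div.itertext())


entity_form = div_text("&lt;title>")
cdata_form = div_text("<![CDATA[<title>]]>")

# The XML parser resolves both before any HTML renderer gets involved.
print(entity_form, cdata_form, entity_form == cdata_form)
```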
Hmm. I think I had a bug in my brain, confusing the metaphoric “`type="xhtml"` means you just copy the content out of the source and into the source of what you’ll serve to the browser” with the reality of parsing it and reconstituting it. Okay, one more when I get a chance.
Testcase, shmestcase.
I’ve missed 3 weeks of posts at weblog.philringnalda.com because, since late November, NetNewswire 2.0.1 has been unable to parse your (perfectly valid AFAICT) Atom 1.0 feed.
I am mightily pissed about the state of Atom support, and it has little to do with the odd spooged `<title>` element.
Um. You know, I did wonder why nobody was telling me about NNW test results. And about where you were, though I saw that you were traveling (almost into my backyard) and fighting the Apache and whatnot. I knew I was appearing silent to users of certain other aggregators, ones that have… “personal issues” with Atom, but I never thought for a second that NNW wouldn’t like me.
FWIW, NetNewsWire says, apropos of your feeds (in the “Dump Subscription Properties” tool):
Dunno what brand of coca-derivative it’s smoking.
I just figured you were on blog-vacation again.
Anyway, I’ve been looking at MacOSX Aggregator alternatives (NewsFire displays your feeds OK) but, so far, my best hope is for NNW 2.0.2.
That doesn’t sound very right. Which of my various and sundry ”I hope I redirected them all” URLs is it using?
I’ll admit that blog-vacation is always a good guess with me, but without looking I’d bet that I’ve posted more frequently since switching to WordPress than I have, well, ever.
http://weblog.philringnalda.com/feed/
http://weblog.philringnalda.com/comments/feed/
It’s not as if the feed(s) are not at those URLs. They are. But NNW 2.0.1, in its palsied state, is unable to recognize them as Atom 1.0 feeds.
As recently as Nov 28, it recognized them just fine.
There is a bug in NetNewsWire 2.0.1 with Atom feeds that have some text after the closing feed tag. (For instance — and in this particular instance — WP-Cache adds some comments at the end of the document.) This bug is fixed here in the lab and the fix will be in the next release of NetNewsWire.
(I’m also using these test cases as I’m working on the next version. So — thanks!)
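For what it is worth, comments after the closing `feed` tag are legal XML, which is easy to confirm with a conforming parser. The sketch below uses Python's standard library; the footer comment is a made-up stand-in for whatever a caching plugin appends, and `feed_title` and `is_well_formed` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"

FEED = '<feed xmlns="%s"><title>Example</title></feed>' % ATOM
# Hypothetical footer, standing in for whatever a caching plugin appends.
WITH_COMMENT = FEED + "\n<!-- cached page, hypothetical footer -->\n"
# Non-comment text after the root element, for contrast.
WITH_JUNK = FEED + "\nstray text"


def feed_title(document):
    """Parse a whole document and return the feed-level title."""
    root = ET.fromstring(document)
    return root.findtext("{%s}title" % ATOM)


def is_well_formed(document):
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(feed_title(WITH_COMMENT))     # trailing comments are allowed
print(is_well_formed(WITH_JUNK))    # trailing character data is not
```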
Results for JetBrains Omea Reader 2.0 (build 671.6):
html cdata: ✘
html entity: ✘
html NCR: ✘
text entity: ✔
text in CDATA: ✔
text in NCR: ✔
xhtml entity: ✘
xhtml NCR: ✘
NetNewsWire has a 3-paned interface.
In the ”headlines” pane, 3 of your test cases display incorrectly: text/entity, xhtml/entity and xhtml/ncr. The rest display correctly.
In the ”entry” pane, all the test cases display the title correctly.
P.S.: Thanks for ”fixing” your Entry Feed. I will await, eagerly, the return of your comment feed.
’kay, should be ”fixed” — I just commented out the adding of the comments, since I don’t really feel any need to constantly debug something that pretty much seems to just work.
Another good, complete, Unicode font for Windows is the updated Arial supplied with MS Word 2002 (and later). You need to install it separately, as described by this KB article.
Installing this (or any other comprehensive unicode font) fixes Firefox, but not IE. And here is an attempt to explain why.
A small test of RSS readers
Phil Ringnalda has published a small test of RSS readers. Sharpreader, which I have been using for a long time, did not let me down this time either, passing every test with flying colors. Well, I am always pleased when my choice of software turns out to be…
[...] I will continue to be in awe of Phil and posts like these., [...]
Bloglines (and probably even a few other aggregators) fail miserably with < and > elsewhere.
See this example to experience what I have come to call the “Bloglines experience”: My posting on `<canvas>` renders the canvas element literally, instead of displaying the correct TEXT.
That’s the reward for using JavaScript where it’s not supposed to be used. Perhaps I should prepend every entry with <marquee> when Bloglines accesses the URL?
[ Oh, and Phil: Could you get WP to recognize the fact that I'm not a spammer if I have turned off sending of the referer header? I'm seeing every time I try to submit a comment. ]
Sure, I’m willing to let you not send referrers. Just give me something that will work as well: so far this morning, I’ve gotten 3 comments, and 90 foiled attempts at directly posting spam to `wp-comments-post.php` (not counting your false positives). What do you have in mind as an alternative? I thought about allowing Opera through, but one of the spammers is using a UA rotator that includes Opera. Maybe only allow Opera 9.xx on Linux? That shouldn’t be in spammers’ rotation lists for a while.
Dang!
Where’s that ”forced-comment-preview” when you need it? Heck, where’s that ”comment-preview”?
More seriously, back, years ago, when I last worried about comment-spambots, they were smart enough to fake the referer header.
Now, some can’t even do HTTP correctly. But, surely one can’t rely on that.
(Sorry for forgetting to sign the last comment.)
Results for JetBrains Omea Reader 2.1 (build 914.3):
html cdata: ✘
html entity: ✘
html NCR: ✘
text entity: ✔
text in CDATA: ✔
text in NCR: ✔
xhtml entity: ✘
xhtml NCR: ✘
Looks like not much changed in their rendering engine