Or, maybe more strict
I’ve been idly thinking about starting a campaign to get RSS/Atom aggregator authors (and validator authors, as well) to be a little less strict and dogmatic about what feed content is uniformly evil, and must be stripped out in all cases. We have a roughly shared (and mostly unexamined) set of standards, mostly based on Mark Pilgrim’s groundbreaking post, saying that you should never allow, among other things, any Javascript, any CSS styles, or any <object> or <embed> elements.
That seemed a bit too draconian to me. For instance, Javascript is evil if you build your interface out of Javascript (script injected into Bloglines would be truly awful), or if you display multiple entries in a single HTML instance. But for a standard three-pane aggregator like Sharpreader, it’s hard to imagine what Javascript could do that would be any different from any other web page (except possibly spawn popups that would be blocked by an add-on toolbar in IE, but not in an embedded web browser control). Sure, an entry could destroy itself with script, but so what? Just move on to the next entry.
In the same vein, although the Great Platypus Attack showed that allowing CSS into an aggregator that shows multiple entries in a single view is something you really don’t want to do, in a three-pane, one-entry-at-a-time aggregator all it should be able to do is destroy itself, without harming anything else.
Finally, <object>s got the boot for fear that they might be shown in a situation where the browser was using a more lax security zone, like the Intranet zone or the (semi-mythical) Local machine zone, where unsigned unsafe objects are merrily loaded without any prompting. Quite frankly, I haven’t been able to figure out the truth of that: I think I was reading that when you embed the IE browser control, you can actually either choose your security zone, or create your own custom zone that’s far more restrictive than anything the user would otherwise get. Or maybe not. I’d love to hear from an actual Windows programmer, who has actually done it (especially one who has then written up just what it takes).
But, the inevitable but: the more you look, the more evil there is in the world.
Say you build an online three-pane aggregator: you’re only displaying one entry at a time, so you decide to allow CSS (hoping nobody will target you to attack the rest of your interface), but since you use Javascript and store the user login in a cookie, you have to remove all Javascript. You strip every <script> element, and every attribute that starts with “on” on every single element, and think you’re free of Javascript. Then I come along, and drop <p style="background-image:url(javascript:alert('owned'))"> in a feed. Or, rather than an alert, I just use a simple and invisible Cross Site Scripting exploit to steal your users’ login cookies, and alter their subscriptions while I’m at it.
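To make that failure concrete, here is a rough sketch of the kind of blacklist filter described above, written in Python on top of the standard library's html.parser. The class and function names are made up for illustration, and this is not any real aggregator's code; it does exactly the two things the paragraph lists (drop <script> elements, drop attributes starting with "on") and nothing more.

```python
from html import escape
from html.parser import HTMLParser


class NaiveBlacklistSanitizer(HTMLParser):
    """Hypothetical blacklist filter: removes <script> and on* attributes only."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.script_depth = 0  # greater than 0 while inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_depth += 1
            return
        if self.script_depth:
            return
        parts = []
        for name, value in attrs:
            if name.startswith("on"):
                continue  # the only attribute rule this filter knows
            parts.append(' %s="%s"' % (name, escape(value or "", quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(parts)))

    def handle_endtag(self, tag):
        if tag == "script":
            self.script_depth = max(0, self.script_depth - 1)
            return
        if not self.script_depth:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.script_depth:
            self.out.append(escape(data, quote=False))


def naive_sanitize(markup):
    parser = NaiveBlacklistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    entry = (
        "<p onclick=\"evil()\" "
        "style=\"background-image:url(javascript:alert('owned'))\">hi</p>"
        "<script>steal()</script>"
    )
    # The onclick and the <script> are gone; the style attribute is not.
    print(naive_sanitize(entry))
```

The filter did everything it was asked to do, and the javascript: URL inside the style attribute still reaches the browser untouched.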
Or, say you have a Windows program embedding the IE browser control. You’ve carefully managed your security zone, so objects are no more dangerous than they are in general, you only display one entry at a time so CSS is no danger, and you’ve built your own popup blocker so you don’t have any reason to strip Javascript. Then I come along again (maybe you should just refuse to subscribe to any of my feeds?), and drop in a simple little paragraph: <p style="height: expression(alert('gotcha'))">. To maximize the pain, I’d probably put a link in the paragraph, with text along the lines of “The explanation of just how I gotcha is here.” In IE, that simple little style declaration will pop up an alert saying “gotcha” (twice, for some reason), and that’s it, assuming you immediately close the browser or the program that’s embedding the browser. Otherwise, make the mistake of clicking anything in the browser window (thus the social engineering of the link), and the browser apparently thinks you might have done something to require a reflow of the page, and a recalculation of the expression for the element height. At that point, you have an eternal alert: you can’t do anything else with the browser, because alerts are modal, they have to be dismissed to do anything, and dismissing one causes another to pop up in its place. You can ctrl-alt-delete the program away, turn off the computer, throw it away, or spend the rest of your life clicking OK.
Now I’m starting to think that Mark didn’t go far enough. You would think that after talking about a brutal and simple IE exploit like that, I’d be doing my usual Firefox victory dance, but I’m not: a bunch of very smart people thought a whole lot about how to safely consume HTML and CSS and Javascript, and still let that slip by. If you’re just throwing something together after work and on every third weekend, there’s just no way you’re going to know about everything you need to strip out, which means that you need to approach it from the other (more generally secure) direction, and only allow through the relatively few elements and attributes that you are certain can’t be subverted in any way. Even if you are in the most solid position, single entries in a native app, not a browser, with your embedded control locked down, there are still going to be exploits that haven’t been discovered yet, or that you never imagined. Sigh. We should have gone with RSS 3 or ESF or something else that’s designed from the ground up to be utterly safe and featureless.
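For what the other direction might look like, here is a minimal sketch in Python, again using the standard library's html.parser. Everything in it is an assumption for illustration: the names are hypothetical, the allow lists are deliberately tiny and not a vetted set, and href values are kept only when they are absolute http or https URLs. It is not any particular feed parser's implementation.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical, deliberately short allow lists; a real set needs careful review.
ALLOWED_TAGS = {"p", "a", "em", "strong", "blockquote", "ul", "ol", "li", "code", "pre"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = {"http", "https"}
DROP_WITH_CONTENT = {"script", "style"}  # remove these along with their text


def safe_url(value):
    """True only for absolute URLs whose scheme is on the allow list."""
    try:
        return urlsplit(value).scheme.lower() in ALLOWED_SCHEMES
    except ValueError:
        return False


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown element: drop the tag, keep its text
        parts = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # style, on*, and anything else unlisted
            if name == "href" and not safe_url(value):
                continue
            parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(parts)))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def whitelist_sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Neither of the two attacks above is mentioned anywhere in this code; they fail simply because style is not on the list. The sketch makes no attempt to rebalance unclosed or stray tags, so treat it as an outline of the idea rather than something to ship.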
My feed parser has a whitelist of elements and attributes that it accepts; everything else gets stripped. This is on by default. About once a month, someone complains that it’s on by default, and I give them horror stories and they go away whimpering.