How not to do an atom:summary

One of the things I like best about Atom so far is that it clearly separates a post summary, the thing that I choose to put in my RSS description element, from the full post content, which I choose to put in content:encoded. There are reasons to have both, and there are people who want each, both within RSS-as-a-reading-format and in the wider view of syndication. I want to be able to make it clear what I’m delivering to those in each camp.

However, Blogger’s peeing in my atom:summary already.

If you tell Blogger that you want an Atom feed with “Short Descriptions”, it will publish a feed with

<summary type="application/xhtml+xml" xml:base="http://foobazbar.blogspot.com" xml:lang="en-US" xml:space="preserve">

that is, XHTML where whitespace is significant. In that summary, it will put the first paragraph of your post or approximately 255 characters, whichever is shorter, with the (X)HTML stripped.

With the (X)HTML stripped.

If you start a post with

<ol>
 <li>Item one</li>
 <li>Item two</li>
 <li>Item three</li>
</ol>

then the atom:summary in your Atom feed will contain

<div xmlns="http://www.w3.org/1999/xhtml">
 Item one
 Item two
 Item three
</div>

The idea of tagless “XHTML” with significant whitespace outside an element where whitespace is significant to XHTML (pre, script, style, and textarea, if my memory is right) is so utterly wrong that even the CSSed display of the feed (itself utterly wrong for something which is of type application/atom+xml) has no idea how to deal with such a beast.
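For anyone who wants to see the effect in isolation, here is a minimal Python sketch of naive tag stripping applied to the list above. How Blogger actually strips markup is not known to me; the `TagStripper` class and `strip_tags` function are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: keeps text nodes and discards every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags, including bare whitespace.
        self.chunks.append(data)


def strip_tags(markup):
    """Return the markup with all tags removed and whitespace untouched."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


post = "<ol>\n <li>Item one</li>\n <li>Item two</li>\n <li>Item three</li>\n</ol>"

stripped = strip_tags(post)
# The list structure now survives only as line breaks and indentation.
print(repr(stripped))  # '\n Item one\n Item two\n Item three\n'

# A consumer that collapses whitespace, as XHTML normally does inside a div,
# sees a single run of words with no hint that it used to be a list.
collapsed = " ".join(stripped.split())
print(collapsed)  # Item one Item two Item three
```

The only trace of the original list is whitespace, which is presumably why the summary has to be flagged with xml:space="preserve" to mean anything at all.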

If you want your summary to be application/xhtml+xml, then make it XHTML, whether you have to take the whole first block-level element, or chop it off in the middle of tags and then run that clobbered fragment through Tidy to make it something approximately valid. If you don’t want to put XHTML in it, make it text/plain, where a consumer will know that it needs to put in its own tags to hope to retain some semblance of the original sense (though you will still turn things which start with a blockquote into hopeless gibberish). But don’t tell me that plain text is application/xhtml+xml, while delivering an application as plain text. That’s not the Atom I was promised.
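The two honest options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the post body is already well-formed XHTML and starts with an element, the function names `xhtml_summary` and `text_summary` are hypothetical, and the summary elements carry only a type attribute rather than everything a real feed would include. The Tidy route for fragments chopped mid-tag is not covered.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "http://www.w3.org/1999/xhtml"


def xhtml_summary(post_xhtml):
    """Option 1: keep the whole first top-level element, markup intact."""
    # Wrap the post so a fragment with several top-level elements parses.
    root = ET.fromstring("<div>" + post_xhtml + "</div>")
    first = root[0]      # assumes the post starts with an element
    first.tail = None    # drop any text that followed the first element
    body = ET.tostring(first, encoding="unicode")
    return (
        '<summary type="application/xhtml+xml">'
        '<div xmlns="' + XHTML_NS + '">' + body + "</div>"
        "</summary>"
    )


def text_summary(post_xhtml, limit=255):
    """Option 2: strip the markup, and say so by labeling it text/plain."""
    root = ET.fromstring("<div>" + post_xhtml + "</div>")
    # Collapse whitespace runs, since nothing is left to give them meaning.
    text = " ".join("".join(root.itertext()).split())
    return '<summary type="text/plain">' + escape(text[:limit]) + "</summary>"


post = "<ol>\n <li>Item one</li>\n <li>Item two</li>\n <li>Item three</li>\n</ol>"

print(xhtml_summary(post))
print(text_summary(post))
# <summary type="text/plain">Item one Item two Item three</summary>
```

Either way, the declared type matches what is actually inside the element, so a consumer can decide for itself how to present it.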

15 Comments

Comment by Asbjørn Ulsberg #
2004-03-09 04:38:37

Blogger is obviously doing the wrong thing here. Either they should set the ‘type’ to ‘text/plain’, or they should not strip the XHTML. There are no guidelines in Atom for these kinds of things yet, but there’s no doubt that Blogger isn’t doing the right thing here.

 
Comment by Adam #
2004-03-09 16:07:49

My god, someone’s misimplemented part of Atom while it’s in beta! The format is doomed! Good thing RSS prevents this problem by not having any such thing as an error in a feed.

Comment by Phil Ringnalda #
2004-03-09 16:20:09

Hard to imagine why you wouldn’t have wanted to leave a last name or an identifying URL with that senseless and moronic comment. Surely by now everyone who knows you knows you’re an idiot?

It’s not “in beta”, it’s a draft/pre-draft depending on how you look at it, and Blogger’s producing or capable of producing a few million feeds according to however they interpret that draft, and anyone writing client support for that draft will look at Blogger feeds to decide what to expect from Atom feeds, and anything they do will wind up as an entrenched part of “how we do it because we have to no matter what the spec says”, and we’ll wind up right where we are, with various chunks of the format that we can’t use, or can’t use right, because we have so much history of having used them wrong.

Comment by Asbjørn Ulsberg #
2004-03-10 02:06:52

Yup. Ignoring bad implementations, even if they’re based on a technical draft, will just repeat the sad history of HTML and RSS. And if we repeat those histories, the point of developing Atom is diminished quite significantly. So let’s not.

 
 
 
Comment by Pat #
2004-03-10 08:29:52

So…did anybody report this as a bug? Just curious…

Comment by Phil Ringnalda #
2004-03-10 09:28:05