How not to do an atom:summary

One of the things I like best about Atom so far is that it clearly separates a post summary, the thing that I choose to put in my RSS description element, from the full post content, which I choose to put in content:encoded. There are reasons to have both, there are people who want each, both within RSS-as-a-reading-format and in the wider view of syndication, and I want to be able to make it clear what I’m delivering to those in each camp.

However, Blogger’s peeing in my atom:summary already.

If you tell Blogger that you want an Atom feed with “Short Descriptions” it will publish you a feed with

<summary type="application/xhtml+xml" xml:base="" xml:lang="en-US" xml:space="preserve">

that is, XHTML where whitespace is significant. In that summary, it will put the first paragraph of your post or approximately its first 255 characters, whichever is shorter, with the (X)HTML stripped.

With the (X)HTML stripped.

If you start a post with

 <li>Item one</li>
 <li>Item two</li>
 <li>Item three</li>

then the atom:summary in your Atom feed will contain

<div xmlns="">
 Item one
 Item two
 Item three

The idea of tagless “XHTML” with significant whitespace outside an element where whitespace is significant to XHTML (pre, script, style, and textarea, if my memory is right) is so utterly wrong that I see that even the (utterly wrong, for something which is of type application/atom+xml) CSSed display of it has no idea how to deal with such a beast.

If you want your summary to be application/xhtml+xml, then make it XHTML, whether you have to take the whole first block-level element, or chop it off in the middle of tags and then run that clobbered fragment through Tidy to make it something approximately valid. If you don’t want to put XHTML in it, make it text/plain, where a consumer will know that it needs to put in its own tags to hope to retain some semblance of the original sense (though you still will turn things which start with a blockquote into hopeless gibberish). But don’t tell me that plain text is application/xhtml+xml, while delivering an application as plain text. That’s not the Atom I was promised.
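As a rough illustration of the text/plain option, here is a minimal Python sketch, not anything Blogger actually runs. The function names, the 255-character default, and the bare un-namespaced summary element are all assumptions made for the example; a real feed would put the element in whatever namespace the current Atom draft specifies. It strips the tags, collapses the whitespace that stripping leaves behind, and labels the result as what it is.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class _TextExtractor(HTMLParser):
    """Collects character data and drops every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def plain_text_summary(html, limit=255):
    """Strip markup, collapse whitespace, and cut at a word boundary.

    `limit` is a hypothetical default echoing the roughly 255 characters
    mentioned above.
    """
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse every run of whitespace to one space so nothing in the
    # output depends on whitespace being preserved.
    text = " ".join("".join(parser.chunks).split())
    if len(text) <= limit:
        return text
    # Drop the trailing partial word, if there is a space to cut at.
    return text[:limit].rsplit(" ", 1)[0]


def summary_element(html, limit=255):
    """Build a summary element that admits it contains plain text.

    The element is left un-namespaced here purely for brevity.
    """
    elem = ET.Element("summary", {"type": "text/plain"})
    elem.text = plain_text_summary(html, limit)
    return elem


post = "<ul>\n <li>Item one</li>\n <li>Item two</li>\n <li>Item three</li>\n</ul>"
print(ET.tostring(summary_element(post), encoding="unicode"))
# prints: <summary type="text/plain">Item one Item two Item three</summary>
```

The list structure is still lost, which is the price of text/plain, but at least the type attribute and the content agree with each other, and no consumer has to guess what the leftover line breaks were supposed to mean.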


Comment by Asbjørn Ulsberg #
2004-03-09 04:38:37

Blogger is obviously doing the wrong thing here. Either they should set the ’type’ to ’text/plain’, or they should not strip the XHTML. There are no guidelines in Atom for these kinds of things yet, but there’s no doubt that Blogger isn’t doing the right thing here.

Comment by Adam #
2004-03-09 16:07:49

My god, someone’s misimplemented part of Atom while it’s in beta! The format is doomed! Good thing RSS prevents this problem by not having any such thing as an error in a feed.

Comment by Phil Ringnalda #
2004-03-09 16:20:09

Hard to imagine why you wouldn’t have wanted to leave a last name or an identifying URL with that senseless and moronic comment. Surely by now everyone who knows you knows you’re an idiot?

It’s not ”in beta”, it’s a draft/pre-draft depending on how you look at it, and Blogger’s producing or capable of producing a few million feeds in however they interpret that draft, and anyone writing client support for that draft will look at Blogger feeds to decide what to expect from Atom feeds, and anything they do will wind up as an entrenched part of ”how we do it because we have to no matter what the spec says”, and we’ll wind up right where we are, with various chunks of the format that we can’t use, or can’t use right, because we have so much history of having used them wrong.

Comment by Asbjørn Ulsberg #
2004-03-10 02:06:52

Yup. Ignoring bad implementations, even if they’re based on a technical draft, will just repeat the sad history of HTML and RSS. And if we repeat those histories, the point of developing Atom has diminished quite significantly. So let’s not.

Comment by Pat #
2004-03-10 08:29:52

So…did anybody report this as a bug? Just curious…

Comment by Phil Ringnalda #
2004-03-10 09:28:05

Well, my theory was that I did, right up there ^. Since I’m not using Blogger as a producer, just as a consumer, it didn’t feel quite right to pick a random test blog, set up an Atom feed for it, and then confuse their help system with an abstract philosophy of feed parsing bug report, but I certainly wouldn’t mind if actual Blogger users wanted to say ”my atom:summary is weird, could you please fix it?”

Comment by Sam Ruby #
2004-03-13 13:54:40

I’m not certain I would classify this as a bug report. In any case, there arguably are two bugs… one is that Blogger is producing feeds that you feel aren’t as usable as they could be, and the other is that nothing in the Atom specifications provides any guidance that would discourage summaries such as these.

This is clearly a subject you have an opinion on, and you certainly are an eloquent writer. Why not propose some wording to be inserted into the spec which will provide guidance to feed producers?

Comment by Phil Ringnalda #
2004-03-13 15:49:26

Why not? Because bug reports, formal or informal, are easy: ”you’re doing this, it’s bad this way, I want this instead”, done. Spec writing is hard. Are there situations other than Blogger’s, where xml:space="preserve" is good and right? What would happen if instead those people put the summary in a <pre>? Does <pre> have security issues? Is it possible to prohibit use of ”preserve” in an XML application? How is it supported on various XML parsers, anyway? Are there likely interpretations of type="text/plain" that would cause someone with a tag-stripped summary to need to avoid it? What is a reasonable wrapper for actual application/xhtml+xml? Is it really better to use a body element that will have to be stripped, rather than assuming the consumer will want a div? Is XHTML actually allowed absolutely everywhere, as it seems? Where was that decision made, and was it a good decision?

You tell me what I think, and I’ll write it.

Comment by Phil Ringnalda #
2004-03-13 20:39:01

After several hours in the More on mode= thread, I now know even less than when I started. I’m not sure whether consensus was reached on what the modes mean, or that they shouldn’t exist, or that they should have defaults that may be over-ridden. I’m not sure whether it’s really a bizarre way for a document to tell an aggregator’s parser what to give to the aggregator’s application, or a way to hint to the application about what path an element ought to take through the parser. Maybe a little time with Pinguxtreme will clear things up.

Comment by Sam Ruby #
2004-03-14 03:39:09

Consensus was not reached. Amusingly, Tim convinced me that the mode attribute was not necessary while I convinced him that the mode attribute was. We definitely need to revisit this.

Comment by Steve Jenson #
2004-05-20 12:09:43

One reason you should have filed a bug report is that I didn’t see this until 7 weeks after you posted it. I could have fixed this 7 weeks ago.

Regardless, I think that text/plain and removing xml:space from summaries is the right way to go. Thanks for keeping me on my toes.


Comment by Phil Ringnalda #
2004-05-20 20:46:53

Bloody hell. I was going to go back and create a test blog and file bugs on it, but trying to figure out the sense of mode and type, and this and that, and what was I saying again? I filed one on something else, the other day, and didn’t even have a mental twinge that I was supposed to be remembering something else.

Sorry about that.

Comment by steve jenson #
2004-05-24 14:47:13

Cool, no problem. It will be taken care of.

Trackback by Musings #
2004-03-09 06:51:25

<link rel="pgpkeys">, Sean Carroll and Atom

<link rel="pgpkey"> is catching on. Here’s how to make it better. Sean Carroll has a blog. Too bad I can’t syndicate his feed.

Trackback by phil ringnalda dot com #
2004-07-01 22:28:38

Ah, sweet irony

A little gentle fisking of’s not-well-formed Atom feed, and a question: who’s actually in charge of Meerkat these days?
