Thinking about RSS

Now that Steve’s RSS feed is usable (I’m still not clear on whether putting the posts in a CDATA section is legal-but-unworkable or illegal), it’s time to chew on the details.

He’s using the anchor in the main page (e.g. http://saladwithsteve.com/#75295585) rather than the permalink in the archives for the <link> element. At first, that seems like a really good idea to me: I’m usually reading RSS because I want to stay right on top of a site, so it’s a safe bet that a post is still on the main page, and then I’m on the main page and sure that I’m seeing all the recent posts, rather than following the link for a Saturday post and then having to go to the main page to see Sunday posts. However, I don’t think that translates to general release very well. For one thing, in general there’s no guarantee that a post in the RSS feed is still on the main page (even if the feed is generated just from the main page, someone can be reading their week-old version). Worse, as I remember it, when you post from the aggregator in Radio, you automatically link to the <link> element, so their links will be rotting rather quickly.

Two minor quibbles that I’m sure Steve has already spotted from the validator: <channel rdf:about and <image rdf:about, not rdf:resource.
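The check the validator is making here can be sketched in a few lines of Python. This is only an illustration: the sample feed and its example.com URLs are made up, and `missing_about` is a hypothetical helper name, not part of any validator. It looks at the top-level `<channel>` and `<image>` elements of an RSS 1.0 document and reports any that lack `rdf:about`.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"

# Hypothetical minimal RSS 1.0 feed; every URL here is a placeholder.
FEED = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/index.rdf">
    <title>Example</title>
    <link>http://example.com/</link>
    <description>Example feed</description>
    <image rdf:resource="http://example.com/logo.png"/>
    <items><rdf:Seq/></items>
  </channel>
  <image rdf:about="http://example.com/logo.png">
    <title>Example</title>
    <url>http://example.com/logo.png</url>
    <link>http://example.com/</link>
  </image>
</rdf:RDF>
"""


def missing_about(feed_xml):
    """Return the names of top-level channel/image elements lacking rdf:about."""
    root = ET.fromstring(feed_xml)
    problems = []
    for name in ("channel", "image"):
        # findall with a plain tag only matches direct children of rdf:RDF,
        # so the <image rdf:resource> pointer inside <channel> is not checked.
        for elem in root.findall(f"{{{RSS}}}{name}"):
            if elem.get(f"{{{RDF}}}about") is None:
                problems.append(name)
    return problems


print(missing_about(FEED))
```

Note that `rdf:resource` still has a place in the feed: the `<image>` inside `<channel>` uses it to point at the top-level `<image>`, which is the one that carries `rdf:about`.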

<edit class=”philsanidiot” />

8 Comments

Comment by Hossein #
2002-04-12 23:37:26

You’re right; you have to use &lt; and &gt; (according to the spec). But it looks like they have already modified the feed to comply with this requirement. They should probably limit the description field to 500 characters though.

 
Comment by Phil Ringnalda #
2002-04-13 07:55:13

Or maybe did from the start: I have this nasty feeling that I might have forgotten to view source. Can’t remember now.

Whether there should be markup encoded in the description at all in a 1.0 feed is another matter. Certainly there’s nothing wrong with having &lt;b&gt; in there, since that’s all #PCDATA, but all the readers I have treat that as an HTML bold tag, which strikes me as the wrong interpretation. To me, RSS 1.0 is a real fork, to an “RDF Site Summary” – just a teaser with no markup (except in mod_content, which nothing seems to read), while RSS 0.92 is “Really Simple Syndication” – a way of creating an XML feed of the entire content. Absent a reader that can deal with mod_content or use all the Dublin Core metadata, to me that makes the choice between 1.0 and 0.92 a real choice: if you write long, mostly unlinked entries like Mike, you probably want a 1.0 feed with just the first 500 (or fewer) characters and no markup, to tease people into coming to read the whole entry. If you do a classic “link and a snarky comment” weblog, you probably want an 0.92 feed with the markup included, since there’s no reason for people to have to load your whole page just to follow one link. Unless, of course, you see your posts as just the starting topic for your comments, where the really good content lives, like I do. Hard choice between the two for me, but unless I could stick the comment link into the feed, I’d probably go with 1.0 without markup, just to force you to come see my comments. (I remember some Radio user wanting Jonathon to take out his comments, because by only reading Jonathon in the Radio aggregator, never seeing the comments, he was missing lots of the good content, so he wanted to force people to make comments in their own weblogs where he would see them in RSS. Seems to miss the point, unless you are looking at only getting comments from a closed circle of people whose feeds you already subscribe to.)

 
Comment by Hossein #
2002-04-13 18:03:39

For some reason, I always thought that description elements of RSS 0.92 feeds were required to be below 500 characters, but after reading it again, it looks like you’re right. 0.92 does not place a limit on the lengths of elements. 1.0 suggests a limit of 500 characters, but this is optional.

But I think that converting <b> to the HTML equivalent is acceptable behavior of RSS feed readers. It’s very likely that I could be wrong, but I think of it this way: < represents < in XML (because PCDATA is parsable), so when your parser sees a < it should parse it as <. If you want to represent < then you can use &lt;. This method allows feeds to represent < and > where needed, without accidentally corrupting the feed by having </description> somewhere in a description element. One solution is to strip the tags from the feed altogether. That’s acceptable, but if your RSS reader is fancy enough to be able to parse the tags, then why not let it?

 
Comment by Hossein #
2002-04-13 18:06:04

Oops. Your preview converted all of my escape characters back into tags :)

Here’s what I meant to say:

For some reason, I always thought that description elements of RSS 0.92 feeds were required to be below 500 characters, but after reading it again, it looks like you’re right. 0.92 does not place a limit on the lengths of elements. 1.0 suggests a limit of 500 characters, but this is optional.

But I think that converting &lt;b&gt; to the HTML equivalent is acceptable behavior of RSS feed readers. It’s very likely that I could be wrong, but I think of it this way: &lt; represents < in XML (because PCDATA is parsable), so when your parser sees a &lt; it should parse it as <. If you want to represent &lt; then you can use &amp;lt;. This method allows feeds to represent < and > where needed, without accidentally corrupting the feed by having </description> somewhere in a description element. One solution is to strip the tags from the feed altogether. That’s acceptable, but if your RSS reader is fancy enough to be able to parse the tags, then why not let it?
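The two levels of escaping described above can be seen with any XML parser. The following is a small sketch using Python's standard library; the `<description>` fragments are made-up samples, not taken from a real feed.

```python
import html
import xml.etree.ElementTree as ET

# One level of escaping: the XML parser turns &lt;b&gt; into the text "<b>".
once = ET.fromstring("<description>&lt;b&gt;bold&lt;/b&gt; text</description>")

# Two levels: &amp;lt; survives XML parsing as the literal text "&lt;".
twice = ET.fromstring("<description>1 &amp;lt; 2</description>")

print(once.text)   # <b>bold</b> text
print(twice.text)  # 1 &lt; 2
print(len(once))   # 0 child elements: the markup arrived as character data

# A reader that treats the description as HTML decodes one more level,
# which is where the doubly escaped "<" finally becomes a literal "<".
print(html.unescape(twice.text))  # 1 < 2
```

The XML parser itself never sees a bold element in either case; whether the decoded `<b>` is then rendered as bold or shown as literal text is a decision made by the reader, which is the point under debate.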

 
Comment by Phil Ringnalda #
2002-04-13 18:28:02

Isn’t that nasty preview behavior? I’m used to it, so it doesn’t bite me very often, but I still should fix it.

I’m mostly arguing a philosophical point, but still I think it’s a valid point: 1.0 and 0.92 are designed to do different things. It’s a good fork, and I don’t think Blogger should glue them back together by doing a 1.0 feed with 0.92 content. Yes, it’s perfectly reasonable for a reader that finds a bold tag in a 1.0 feed to display the content in bold, but it’s perfectly acceptable to write one that displays <b> instead, so I don’t think it’s the right thing to do to include markup in a 1.0 feed. If you want entity-encoded HTML, and the full, more than 500 character post, then you want an 0.92 feed. If you want a summary of 500 characters or less, without markup, then you want a 1.0 feed. If you want full Dublin Core data, a summary, plus the full content with markup, then you want a 1.0 feed with mod_content, and you better be prepared to write a reader to use it, because there doesn’t seem to be one at the moment.
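The "summary of 500 characters or less, without markup" could be produced along these lines. This is a rough sketch, not how Blogger or any other tool does it: `plain_summary` and `_TextOnly` are hypothetical names, and the cut is made at a character boundary, so it may land mid-word.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the character data of an HTML fragment, dropping tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_summary(html_text, limit=500):
    """Strip markup, collapse whitespace, and keep at most `limit` characters."""
    parser = _TextOnly()
    parser.feed(html_text)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    if len(text) <= limit:
        return text
    return text[:limit].rstrip()


post = '<p>A <a href="http://example.com/">link</a> and a <b>snarky</b> comment.</p>'
print(plain_summary(post))
```

The same post body, left untouched and entity-encoded, would be what goes into the 0.92 feed, so both feeds can be generated from one source.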

My solution? Generate both a 0.92 feed and a 1.0 with mod_content feed, either as a one-or-the-other choice, or both at once. The 1.0 feed will actually be better for some styles of blogging, plus it will be there, ready and waiting for something that can use it fully, while the 0.92 feed will work the way most people currently expect RSS to work.

 
Comment by steve #
2002-04-13 18:51:24

CDATA sections are valid in areas where PCDATA is required. They are workable, but people are too lazy to put them into their readers. I’m not sure why that is. It seems the obviously better solution than entity-encoding markup and then decoding it back.
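From the parser's side the two forms are interchangeable, which a short Python sketch can show. The sample fragments are made up, and `cdata_wrap` is a hypothetical helper; it uses the usual trick of splitting any `]]>` in the content across two CDATA sections, since that sequence cannot appear inside one.

```python
import xml.etree.ElementTree as ET


def cdata_wrap(text):
    """Wrap text in CDATA, splitting any "]]>" so the section stays well-formed."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


escaped = ET.fromstring("<description>&lt;b&gt;bold&lt;/b&gt;</description>")
cdata = ET.fromstring("<description>" + cdata_wrap("<b>bold</b>") + "</description>")

# Both spellings hand the application exactly the same character data.
print(escaped.text == cdata.text)  # True

# Content containing "]]>" still round-trips through the wrapper.
tricky = "a ]]> b <i>x</i>"
print(ET.fromstring("<d>" + cdata_wrap(tricky) + "</d>").text == tricky)  # True
```

A reader built on a conforming XML parser gets CDATA support for free; trouble is only likely in readers that pull the description out with string matching instead of a parser.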

A few things:

The 0.92 feed does limit to 500 characters and strip out the HTML.

Stripping out the HTML might be user configurable. I think it’s a good idea.

I don’t want to generate both types of feeds since they contain the same content. I’ll keep using a 1.0 feed, and I’ll even write my own syndication format which is just s-expressions, all content, no markup. I’ll feel better about this whole syndication mess then. Nobody will be able to read my feeds and I’ll be fine with that. ;-)

 
Comment by Phil Ringnalda #
2002-04-13 19:30:39

Mess? Hell, it hasn’t even started getting messy yet.

If you are limiting to 500 characters and stripping HTML, then that’s an 0.91 feed, not 0.92. 0.91 was Netscape’s “summary for our portal” format, which completely sucked for weblog syndication. 0.92 is Dave Winer’s “better for weblogs” version, with no limits on item length or number of items, and positive approval of entity-encoded HTML.

The biggest reason I’m arguing this is that I expect any Blogger feature to have to last unchanged for a couple of years. If it doesn’t, that’s great, but past experience (<P>) shows that it may. If you offer the choice of 0.92 with the full item and markup, or 1.0 with a truncated plain-text description and the full item with markup in <content:items>, then you’ve got today and 2004 covered.

 
Comment by Hossein #
2002-04-13 23:15:46

Ok Phil, I understand now. Because RSS 1.0 is designed for summaries, the description should be kept short and free of formatting. Sorry you had to explain it twice!

I think that Userland should probably rename their upcoming RSS 0.93 to something else because 1) they are eventually going to run out of version numbers and 2) 1.0 and 0.92+ have such different purposes that they shouldn’t share the RSS name. This is where I was originally confused; I knew that 0.92 was developed independently by Dave Winer, but I didn’t read the 1.0 spec closely enough to realize how different the two actually were.

 