First, the glowing vision: I take all the RSS 1.0 feeds from my circle of weblogs, plus RSS feeds of their comments, plus FOAF files that tell me that this nickname and this name go with this homepage and this email address, and I parse them all, throw them in a database of RDF triples, and then use it to answer interesting questions: “What did Shelley have to say today, in her weblog and in her comments and in other people’s comments?” “Who talked about FOAF and RSS today, and whose weblogs did they leave comments on?”
Then, the crushing reality: although there’s nothing to say that an RSS feed has to change, RSS-as-it-is-used is about change. It’s used to say “here are the last (some number) of things I wrote,” possibly including the whole thing, possibly just an excerpt or a description of it. That’s what it’s good at, and why it’s popular: checking an RSS feed for new content is a lot easier than visiting a webpage and scanning for new stuff, especially if your RSS reader remembers what you’ve seen before. If it isn’t changing, and pointing out what’s new, then it’s just another site map or list of keywords for a search engine, and while that may be useful, it’s not something that helps or interests me.
RDF-as-it-is, however, seems to be all about static descriptions. At least with all the tools I’ve tried or looked at, when you parse an RDF document, you give it a unique name, and parse out all the triples and store them with the name of the document. If you have a “friends.rdf” file, you parse it and say that the triples came from “friends”. If later on you make two new friends, and one old friend turns into an enemy, you either delete all the triples from “friends” and then reparse an edited version, or you delete the enemy and parse a new file “friends092002” that only includes the new friends. If it was important to you to know that John was a friend in August but not in September, or that Mary’s phone number was one thing for three months but then changed back to what it was before, you might parse your whole friends file once a month, giving it a new name for that month, so that when you want current data you query only “friends092002”, but when you want all of Mary’s phone numbers you query all of the document names (the docKeys). But you sure as hell wouldn’t parse it once an hour, storing 8760 duplicates of Mary’s name, 8760 duplicates of her address, 8760 duplicates of her city, 2250 duplicates of her summer phone number, and 6510 duplicates of her regular phone number for every year.
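To make that per-document bookkeeping concrete, here is a minimal sketch in Python. It assumes nothing about any real RDF toolkit: the `DocKeyedStore` class, the example.com property URIs, and the phone numbers are all made up for illustration.

```python
from collections import defaultdict


class DocKeyedStore:
    """Toy triple store that files every triple under the document it came from."""

    def __init__(self):
        # doc_key -> set of (subject, predicate, object) triples
        self.docs = defaultdict(set)

    def parse(self, doc_key, triples):
        """Replace whatever was stored under doc_key with a fresh parse."""
        self.docs[doc_key] = set(triples)

    def query(self, subject, predicate, doc_keys=None):
        """Return the distinct objects for (subject, predicate) in the chosen documents."""
        keys = list(self.docs) if doc_keys is None else doc_keys
        found = set()
        for key in keys:
            for s, p, o in self.docs.get(key, ()):
                if s == subject and p == predicate:
                    found.add(o)
        return found

    def stored_count(self):
        """Count stored triples, duplicates across documents included."""
        return sum(len(triples) for triples in self.docs.values())


# Hypothetical identifiers, used only for this example.
MARY = "mailto:mary@example.com"
NAME = "http://example.com/terms/name"
PHONE = "http://example.com/terms/phone"

store = DocKeyedStore()
store.parse("friends082002", [(MARY, NAME, "Mary"), (MARY, PHONE, "555-0100")])
store.parse("friends092002", [(MARY, NAME, "Mary"), (MARY, PHONE, "555-0199")])

current = store.query(MARY, PHONE, ["friends092002"])  # this month's snapshot only
history = store.query(MARY, PHONE)                     # every snapshot ever parsed
```

Two monthly snapshots already store Mary’s name twice (`store.stored_count()` is 4 for three distinct facts), which is the duplication that gets out of hand at one parse per hour.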
I naively thought that I would just check to see whether or not you were telling me something that I already knew, and if you were I would ignore it, unless you told me something different about the same thing, in which case I would change it. You tell me that the title of http://weblog.burningbird.net is Burningbird 8760 times a year, and every time I say “yup, got that.” If you then tell me that the title of http://weblog.burningbird.net is Singed Feathers, I say “okay, I changed it.”
That kind of thinking works just fine when you are parsing RSS as XML, and you know exactly where you are and what elements mean, and when you see /channel/title/ there’s one and only one. Parsing RSS as RDF, things are different. You know that <http://weblog.burningbird.net> <http://purl.org/rss/1.0/title> is “Burningbird”, but right at that moment, you don’t know what <http://weblog.burningbird.net> is, just that its title is Burningbird. You can find out what it is, by stopping in the middle of parsing to say “what is the <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> of <http://weblog.burningbird.net>?”, but if you do that you’ve just invented a really slow and awkward way of parsing XML. If you have to know what a statement means when you are parsing and storing it, rather than only when you are using it as part of a query, then RDF probably isn’t what you want to use. However, if you don’t know what a statement is saying, you don’t have any way of knowing whether it’s a change, or a repeated element. There was a flurry of words on RSS-DEV about whether or not elements could be repeated and which and how, but I don’t think it ever came to anything. In the real world, and especially once you step outside RSS, elements will be repeated: people will have more than one homepage, weblog posts will have more than one subject, and I’ll end up only remembering the last one I saw.
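The naive “ignore it or change it” rule, and the way it breaks on repeated properties, can be sketched in a few lines of Python. This is an illustration only: `naive_update` is a hypothetical helper, the person URI and the two homepage URLs are invented, and a plain dict stands in for the database.

```python
def naive_update(known, subject, predicate, obj):
    """Keep exactly one value per (subject, predicate) and report what happened."""
    key = (subject, predicate)
    if key not in known:
        known[key] = obj
        return "new"
    if known[key] == obj:
        return "yup, got that"
    known[key] = obj  # overwrite: assumes the property can only have one value
    return "okay, I changed it"


RSS_TITLE = "http://purl.org/rss/1.0/title"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
BLOG = "http://weblog.burningbird.net"
PERSON = "mailto:someone@example.com"  # hypothetical person identifier

known = {}
results = [
    naive_update(known, BLOG, RSS_TITLE, "Burningbird"),
    naive_update(known, BLOG, RSS_TITLE, "Burningbird"),
    naive_update(known, BLOG, RSS_TITLE, "Singed Feathers"),
    # Two homepages for one person: a repeat, not a change.
    naive_update(known, PERSON, FOAF_HOMEPAGE, "http://example.com/"),
    naive_update(known, PERSON, FOAF_HOMEPAGE, "http://example.org/"),
]
```

The title updates behave as hoped, but the second homepage is treated as a correction to the first, so `known` ends up holding only `http://example.org/`. Without knowing what the predicate means, the code cannot tell the two cases apart.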
The advantages of RDF come from statements that stand alone. One of the foundations of RDF is that “anybody can say anything,” and though it doesn’t get mentioned as often, they can say it any damn way they please. There’s nothing magical about <dc:creator> as a way of expressing who wrote a weblog item, other than the understanding that if you use something else fewer people will know what you are talking about. I’m perfectly free to make up my own namespace, and then say that for an item or a channel <xsfs:thunkItUp>Phil Ringnalda</xsfs:thunkItUp>. If I then write up a schema for xsfs, saying that:
<rdf:Property rdf:about="http://example.com/xsfs/thunkItUp" rdfs:label="Thunker" rdfs:comment="The one who thunk of it.">
  <rdfs:subPropertyOf rdf:resource="http://purl.org/dc/elements/1.1/creator" />
  <rdfs:isDefinedBy rdf:resource="http://example.com/xsfs/" />
</rdf:Property>
then not only am I not being bad, just making stuff up, I’ve done the community a service, by extending the vocabulary. That was the whole reason that RSS 1.0 was created, with namespaces and RDF, so that it could be easily extended by anyone. Now we aren’t restricted to just saying <dc:creator>, we can say <xsfs:thunkItUp> and <xsfs:writItDown>, and a schema-aware toolkit will say “Oh, you want to know things that Phil is the creator of? Well, here’s some stuff that he thunk up, and that’s just a more specific form of creating.” Needless to say, there’s no schema-aware RDF toolkit for PHP, which leaves me having to know everything about every element that you might use. If I want to know “who” things, I have to know that “who” might come in <dc:creator>, or <dc:contributor>, or <foaf:Person>, and the name of a <foaf:Person> might be a <foaf:name> or maybe the combination of a <foaf:firstName> and a <foaf:surname>, or maybe they don’t admit their name, just a <foaf:nick>. The answer to a “who” question ends up involving twenty queries to the database, and a bunch of fancy footwork with the resulting arrays.
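For a sense of what “schema-aware” would mean in practice, here is a rough Python sketch that expands a query on `dc:creator` to cover anything declared as its subproperty. It is not modeled on any existing toolkit: triples are plain tuples, the function names and item URLs are invented, and the `writItDown` subproperty declaration is an assumption, since only `thunkItUp` has a schema entry above.

```python
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"
SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
XSFS = "http://example.com/xsfs/"


def with_subproperties(prop, triples):
    """Return prop plus every property declared, directly or indirectly, as its subproperty."""
    found = {prop}
    frontier = [prop]
    while frontier:
        parent = frontier.pop()
        for s, p, o in triples:
            if p == SUBPROPERTY_OF and o == parent and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def things_created_by(name, triples):
    """Subjects whose creator, or any more specific kind of creator, is name."""
    props = with_subproperties(DC_CREATOR, triples)
    return {s for s, p, o in triples if p in props and o == name}


triples = [
    # Schema statements (the writItDown declaration is assumed for this example).
    (XSFS + "thunkItUp", SUBPROPERTY_OF, DC_CREATOR),
    (XSFS + "writItDown", SUBPROPERTY_OF, DC_CREATOR),
    # Data statements about hypothetical items.
    ("http://example.com/item/1", DC_CREATOR, "Phil Ringnalda"),
    ("http://example.com/item/2", XSFS + "thunkItUp", "Phil Ringnalda"),
    ("http://example.com/item/3", XSFS + "writItDown", "Someone Else"),
]

created = things_created_by("Phil Ringnalda", triples)
```

Here `created` contains items 1 and 2: one query, with the schema doing the work of knowing that `thunkItUp` counts as creating. Without that expansion step, every property that might mean “who” has to be listed by hand, which is where the twenty queries come from.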
There is an option that would make it all work: if I say “to be in my tool, you may only use these elements in these places with this nesting to express these concepts, and you must use this, this, and this, your FOAF file must include a <foaf:name>, and whenever you comment you must use either your <foaf:name> or one of your <foaf:nick>s.” Which would be exactly the same thing as RSS 0.9x/2.0, only with me as dictator and a much more complicated syntax.
So, the answer to the question that started this whole project, “what is the RDF in RSS 1.0 good for?” is two things: it’s good for someone who has an infinitely large database that can be queried infinitely fast by a schema-aware program, or it’s good for writing a schema-aware aggregator that can try to figure out what it should do with new elements that it hasn’t seen before. That’s actually an interesting project with some potential for success, but at this point I’m sick of the whole thing, so I’ll leave that project for someone else.
Some good things did come out of this, though: I learned a whole lot about RDF, and thought more than I had about whether or not RSS is where I want to put my RDF efforts, and best of all it may have helped inspire some people to make comment RSS feeds available, so I don’t have to remember to keep going back to look for replies. Thanks for all your support, and I’m sorry I couldn’t manage to do anything really cool with it.