The language thing and the date thing

The other day, I was all ready to say that Movable Type’s default template should switch to using pubDate with an RFC822+ date rather than dc:date in the RSS 2.0 template. In fact, I even wrote a Pascal string challenge entry about it (though I can’t shut up, so I would have either had an extended entry, or a long, long comment, too). But, talking and thinking in Rogers’ comments, now I’m not so sure.

The nonpolitical reason generally given for using pubDate rather than dc:date is to support programs that don’t know to expect a dc:date in RSS 2.0. That’s a tiny bit disingenuous, since item-level pubDate only dates back to the introduction of RSS 2.0 in the fall of 2002, while dc:date has been in use in RSS 1.0 since December 2000. But, if you postulate some program that was revised after 2.0, by someone who never imagined anyone would use dc:date in 2.0, with code that parses completely separately based on the root element, I guess it’s possible. Or, there’s the usual, sigh, political parsing.

Anyway, I was thinking about suggesting that they switch. The usual arguments that come up when the two dates are discussed are, quite frankly, not very deeply thought out. Programmers like to discuss which date format, RFC822 or W3CDTF, is easier to sort, or easier to parse. So what? You have to be able to parse them both, because you’ll see them both, so you convert them both into whatever date format you are using internally, and then you’re done. There is absolutely no situation where the topic is “when I completely control the world of syndication and can mandate a single date format in every single document” so discussions of parsing are complete nonsense. For extra credit, programmers like to try to mandate that dates be in UTC, rather than in whatever timezone suits the person producing them. If you can parse a date, you can parse a date with a timezone, and convert it to UTC or whatever else you want. The only person who has any reason to prefer any timezone over any other is the person producing the date, who might recognize that his date is wrong if it’s in his own timezone, and almost certainly won’t if it’s in UTC (except for those lucky few who spend half the year in UTC).
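To make the "parse them both, convert, and you're done" point concrete, here is a minimal sketch in Python, not anything Movable Type or any particular aggregator actually does. The function name parse_feed_date is hypothetical, it only handles the complete date-and-time form of W3CDTF, and treating a date with no usable zone as UTC is my assumption.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text: str) -> datetime:
    """Parse an RFC 822-style or W3CDTF date string and return it in UTC."""
    text = text.strip()
    if text[:4].isdigit() and text[4:5] == "-":
        # Starts with "YYYY-", so treat it as W3CDTF,
        # e.g. 2004-05-06T03:37:05-08:00.
        # Older Python versions reject a trailing "Z", so spell it as an offset.
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        parsed = datetime.fromisoformat(text)
    else:
        # Anything else goes to the standard library's RFC 822/2822 parser.
        parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # Assumption: a date without a usable zone is taken to be UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)
```

Once both formats land in the same internal type, which one the feed used, and which timezone the producer preferred, stops mattering to the consumer.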

But then, the discussion in Rogers’ comments came up, and got me looking at Brad Choate’s “Non-Funky MT RSS 2 Template” for the first time in months. What I noticed was that Brad was correctly including the language="en" attribute on the date-related MT tags, so that the weekday and month name would be in English, as RFC822 requires, rather than in the language set for the weblog. That’s what it takes to produce a valid RFC822 date in Movable Type, but it’s going to sit right below a <language>en-us</language> element in the template, where English is hardcoded because Movable Type doesn’t actually know what language code to use for a weblog (and the question of precisely which code to use, from Netscape’s list, RFC1766 or RFC3066, ISO-639-1:1988 or ISO-3166, is best left to people with very good medication).

So, say you are publishing in a language other than English, and you notice that your RSS claims to be in English. You go to the template, see en-us and change it, then see language="en" and change that as well. Now you have an invalid RFC822-ish date like “nel, 06 mai 2004 03:37:05 -0800”, which no aggregator can read.
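As a rough illustration of what happens on the reading side, this sketch feeds a valid date and the localized one to the RFC 822-style parser in Python's standard library. The helper name is_readable_rfc822 is hypothetical, and this shows only one parser's behavior; other aggregators will have their own ways of failing.

```python
from email.utils import parsedate_tz

valid = "Thu, 06 May 2004 03:37:05 -0800"
localized = "nel, 06 mai 2004 03:37:05 -0800"  # weekday and month not in English


def is_readable_rfc822(text: str) -> bool:
    """Return True if the standard library parser can make sense of the date."""
    # parsedate_tz returns None instead of raising when it cannot parse.
    return parsedate_tz(text) is not None


for candidate in (valid, localized):
    print(candidate, "->", "ok" if is_readable_rfc822(candidate) else "unreadable")
```

The parser skips the weekday without looking at it, so the part that actually sinks the localized date here is the month name.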

If we have to have pubDate, can we please have <$MTEntryRfc822Date$> and <$MTRfc822Date$> tags in the core, so we won’t go from an undetermined number of aggregators that won’t read dc:date to an indeterminate number of feeds with invalid pubDates? Or, we could just stick with dc:date, which not only doesn’t give anyone the opportunity to think they should use their own language for textual parts, it also doesn’t give anyone the chance to think that because I get to use PST and PDT for my timezone, that they can use CET or BST. That’s the real advantage of W3CDTF: it’s not that it’s easier to parse or easier to sort or easier to do anything else that you have to do both ways, it’s just that it has fewer moving parts in one particular language. Get past the hump of realizing that the “T” is supposed to be a literal letter in the middle of the datetime, and you can hardly screw it up.
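For comparison, here is a sketch of producing both formats from the same moment in Python, using a fixed -08:00 offset as a stand-in for my timezone. The function names rfc822_date and w3cdtf_date are made up for the example and are not Movable Type tags; the point is only that the RFC822 side needs English names and a numeric offset supplied for it, while the W3CDTF side is nothing but numbers and a literal "T".

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Assumption: a fixed -08:00 offset, standing in for the weblog's timezone.
pacific_standard = timezone(timedelta(hours=-8))
published = datetime(2004, 5, 6, 3, 37, 5, tzinfo=pacific_standard)


def rfc822_date(moment: datetime) -> str:
    """Format for pubDate: English day and month names, numeric offset."""
    # format_datetime uses its own English name tables, not the locale.
    return format_datetime(moment)


def w3cdtf_date(moment: datetime) -> str:
    """Format for dc:date: digits, a literal "T", and a numeric offset."""
    return moment.isoformat(timespec="seconds")


print(rfc822_date(published))  # Thu, 06 May 2004 03:37:05 -0800
print(w3cdtf_date(published))  # 2004-05-06T03:37:05-08:00
```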

7 Comments

Comment by Mark
2004-05-06 05:58:13

If you want to see an example of how badly people can mangle RFC822 dates, check out http://nanaseven.onblog.com/rss/rss.jsp?blog_uid=17387. The pubDates are in Hangul (the Korean alphabet).

Anyone who says that RFC822-style strings are easier to parse *or* easier to generate is being hopelessly English-centric.

Comment by Roger Benningfield
2004-05-06 08:19:39

Mark: Or they’re simply reporting the reality of their tools. I can feed an 822 date to ColdFusion’s ParseDateTime() function and get back a native date… but it’ll puke on W3CDTF.

Macromedia will hopefully address the issue in CF7, but for now, I have to jump through hoops. Personally, I consider the whole thing a bug, since CF’s WDDX support *can* handle W3CDTF… but Macromedia apparently disagrees, or at least doesn’t see it as a high priority thing.

Comment by Phil Ringnalda
2004-05-06 08:43:03