Your bread hurts me: Atom text constructs revisited

Uche Ogbuji’s XML.com column this month, Handling Atom Text and Content Constructs, is mostly a nice introduction to dealing with the Atom elements that hold the text you actually see (titles, summaries, content, and a few other things), with one absolutely horrifying mistake. After showing examples of plain text titles, he says:

[…] and you should not even have tunnelled markup through encoding. Atom does not strictly prohibit the form in listing 3, but it does violate the spirit of the specification.

Listing 3: Bogus (unsignalled) encoded markup in plain text construct

<title>One &lt;strong&gt;bold&lt;/strong&gt; foot forward</title>

Nooooooooo! No, no, no!

That is an absolutely, positively, perfectly valid Atom title, which should be rendered as One <strong>bold</strong> foot forward. It isn’t prohibited, it doesn’t violate the spirit of the spec, it is the spirit of the spec. The only thing that violates the spirit (and the letter) of the spec is the idea of deciding, based on the sins of the past, that any instance of &lt; means that it is markup, no matter what the author said that it is.

Doing that is like deciding that any sentence which contains the word “pain” is in English, whether or not the author is a French breadmaker writing about baking, with every other word having no meaning in English, in a sentence marked up with <p xml:lang="fr">.

An Atom <title> or <title type="text"> is text: there is no escaping, no tunnelling, no markup, it is text. To treat it as anything else, to not escape it before handing it to an HTML renderer, to strip things which would be unsafe or undesirable if it were HTML (which it is not), is a bug. No negotiation, no wiggle room, no examples of producer error that make you need to do it: either you display that example as One <strong>bold</strong> foot forward or you are not just wrong, you are destroying the very reason Atom exists.
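For anyone who wants to see what that means in a consumer, here is a minimal sketch in Python, using only the standard library. The helper name title_to_html is hypothetical, and the sketch handles only plain text titles; it is one way to do it, not the way any particular aggregator does it. The XML parser undoes the XML-level escaping, which gives you the author's text, and then you escape that text again for HTML so the angle brackets show up as characters rather than as markup.

```python
import html
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def title_to_html(entry_xml: str) -> str:
    """Return an HTML-safe rendering of a plain text atom:title.

    Hypothetical helper for illustration; only type="text" (the default)
    is handled here.
    """
    entry = ET.fromstring(entry_xml)
    title = entry.find(f"{{{ATOM_NS}}}title")
    if title is None:
        raise ValueError("entry has no atom:title")

    # A missing type attribute means type="text".
    if title.get("type", "text") != "text":
        raise NotImplementedError("this sketch only covers text titles")

    # The parser has already turned &lt; and &gt; back into < and >,
    # so this is the literal text the author wrote.
    text = title.text or ""

    # It is text, not markup: escape it for HTML, do not strip or sanitize.
    return html.escape(text, quote=False)


entry = (
    '<entry xmlns="http://www.w3.org/2005/Atom">'
    "<title>One &lt;strong&gt;bold&lt;/strong&gt; foot forward</title>"
    "</entry>"
)
rendered = title_to_html(entry)
```

Dropped into an HTML page, the value of rendered displays as the literal characters One <strong>bold</strong> foot forward, with nothing in bold.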

9 Comments

Comment by James Holderness #
2005-12-08 21:19:41

Just a small correction to your correction. I’m assuming you should at least have unescaped the entities and converted them to angle brackets since that is part of the XML escaping. It’s easiest to think of in terms of a CDATA section where, in text mode, you would display exactly the text that is included. You wouldn’t suddenly start escaping certain characters before displaying the content as plain text.

Comment by Phil Ringnalda #
2005-12-08 21:52:42

Arrrrgh!

Thank you. I was so carefully looking for any place where I under-escaped that I didn’t even see where I was over-escaping. I do that when I’m correcting people’s grammar, too: try as I might to not do the foolish thing, and make a grammatical error of my own in my correction, I always do.

 
 
Comment by Lachlan Hunt #
2005-12-09 04:44:34

That doesn’t make sense. Why would those entity references not be converted to the appropriate characters? They are the predefined XML entity references, and unless I misunderstand XML, they should get converted by the parser. I expect that title to get rendered as:

One <strong>bold</strong> foot forward

 
Comment by Uche #
2005-12-09 06:28:51

Aristotle P. pointed this post out to me. In our exchange in the comments on the article, you admit that you’ve gone over the top in stating your position. Boy, did I not have any idea how far over the top you’d gone. This Weblog posting of yours is one of the worst examples of a straw man argument I’ve ever seen, and I do not appreciate your affixing my name to your straw man. No one said that an escaped less-than in text must be interpreted as markup, and I can only conclude that you hopelessly misread my article before claiming such rot. There is no reasonable interpretation of my article that sounds anything like your quite galling caricature. Luckily, people can go to the comments on the article and read our exchange on the matter.

Comment by Robert Sayre #
2005-12-09 06:53:26

Uche, you have to realize Phil is a little sensitive about these things. I would be too, if I had to Atomize a gigantic bug tracker full of bugs about markup. :)

 
 
Comment by Uche #
2005-12-09 07:26:37

Robert, fair enough. The infuriating thing is that I think that Phil and I are on the same side of the essential issue, and that he’s using such an unnecessary characterization to in effect make a Talmudic argument that will just confuse others to no end. We both agree that Atom processors should never second-guess the meaning of the type attribute, and he just confuses things by claiming that I do not agree with this. Obviously in my example I was *meaning* markup, and thus my Atom was wrong because it did not match my meaning. If I had said that “100 < 1000” was wrong, or that it violated the spirit of the spec, Phil would certainly have cause to complain, but although I admit I could have stuck one more clarifying sentence in there, I think that it should be clear to almost any reader what I meant. Phil’s game of “gotcha” has been very unproductive, confusing users rather than reinforcing the message we’d both like to get across.

 
Comment by Uche #
2005-12-09 08:11:24

On the off chance that someone does try to make the interpretation Phil is dreading, I’ve asked the editor to add a clarifying sentence just before listing 3. I still think that Phil’s is an unlikely interpretation of the original, but we share the same goal of not giving anyone any excuse to wave off the type attribute, and I’d rather clarify at the source than risk confusion.

Comment by Phil Ringnalda #
2005-12-09 08:29:08

Thank you.

 
Comment by James Holderness #
2005-12-09 09:39:58

I’d just like to add that I came to the exact same conclusion as Phil when reading your article and would have commented myself if it weren’t for the fact that I was too lazy to register on xml.com. I’m glad to see it was just a misinterpretation on my part and that everybody is in agreement about the correct usage.

 
 