Rolling your own CDATA

Lisa (and to a lesser extent Brad) had a bit of trouble with their various and sundry RSS feeds today. The new version of encode_xml in MT 2.6 wraps things in a CDATA section rather than using entity encoding, and some readers (including, but not limited to, Radio) don’t handle an RSS element that contains some CDATA, then some entity-escaped PCDATA, then some more CDATA, then maybe a bit more PCDATA.
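To make the problem concrete, here is a small Python sketch of the kind of mixed content involved. It is not MT code and not taken from any particular reader; the sample element is made up, and it assumes one MT tag came out wrapped in CDATA while the next came out entity-encoded. A real XML parser hands back a single string either way, which is the step the unhappy readers seem to be getting wrong.

```python
import xml.etree.ElementTree as ET

# Hypothetical description element: CDATA from one MT tag, then
# entity-escaped PCDATA from the next, separated by a newline.
MIXED = (
    "<description><![CDATA[<p>Body with <b>HTML</b></p>]]>\n"
    "Plain extended text, it&apos;s short</description>"
)


def description_text(xml: str) -> str:
    """Return the character data of a lone <description> element."""
    # The parser merges the CDATA run and the PCDATA run into one string.
    return ET.fromstring(xml).text or ""


if __name__ == "__main__":
    print(description_text(MIXED))
```

The CDATA part keeps its markup untouched and the entity in the second part is decoded, so the caller never needs to know which escaping style produced which piece.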

If you also have an RSS template that puts more than one MT tag into an RSS element, you’ll need to make the same fix they did: take the encode_xml="1" filter out of the MT tags and add your own CDATA section in the template. Even if you are them, you still need a plugin to make your feed truly safe, so don’t stop reading yet.

If you had a template with something like

<description><$MTEntryBody encode_xml="1"$>
<$MTEntryMore encode_xml="1"$></description>

then your first step is to add your own CDATA escaping in the template, and stop MT from trying, with

<description><![CDATA[<$MTEntryBody$>
<$MTEntryMore$>]]></description>

which will work just fine for the most part* until the moment that you are casually discussing the use of CDATA escaping and forget to entity encode the > when you say “and ends with ]]>” (or, much more likely, someone who thinks they are funny sticks an unencoded one into your comments, and thus into your comment feed). At that point your CDATA section ends, and everything after it is invalid XML, right up to the now-invalid extra CDATA end tag.

To avoid that, you need a global filter that escapes just that one sequence, converting ]]> to ]]&gt;, which is why I wrote my most trivial plugin yet, available here. It adds a global filter, cdata_escape, so that if you have <$MTCommentBody cdata_escape="1"$>, any ]]> in CommentBody will be escaped. Since that’s too tiny a function for even me to release on its own, it also adds a filter named smartest_encode_xml** which does exactly what encode_xml used to do: apostrophes, quotes, less-than and greater-than are converted to their XML entity equivalents, and nothing more. Use it when you know damned well that nothing else needs to be escaped, and you don’t want to bother with a call to encode_xml with its check for HTML before it escapes one apostrophe.
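For anyone who wants to see the two filters outside of MT, here is a rough Python sketch of the same ideas. It is not the plugin’s code: cdata_escape mirrors the filter described above, while minimal_encode_xml, wrap_description and is_well_formed are hypothetical helper names made up for this illustration. The encoder follows the list of four characters given in the post and leaves ampersands alone, which is an assumption about what the real filter does.

```python
import xml.etree.ElementTree as ET


def cdata_escape(text: str) -> str:
    """Defuse the CDATA terminator by turning ']]>' into ']]&gt;'."""
    return text.replace("]]>", "]]&gt;")


def minimal_encode_xml(text: str) -> str:
    """Hypothetical stand-in for smartest_encode_xml.

    Escapes only the four characters named in the post. Ampersands are
    left alone (an assumption), so this is only safe on input that has
    no bare '&' in it.
    """
    table = {"'": "&apos;", '"': "&quot;", "<": "&lt;", ">": "&gt;"}
    return text.translate(str.maketrans(table))


def wrap_description(body: str) -> str:
    """Do by hand what the template does: wrap the body in one CDATA section."""
    return "<description><![CDATA[" + body + "]]></description>"


def is_well_formed(xml: str) -> bool:
    """Report whether a real XML parser accepts the string."""
    try:
        ET.fromstring(xml)
        return True
    except ET.ParseError:
        return False


# A comment body that mentions the CDATA end marker without encoding it.
COMMENT = "and ends with ]]> as you know"

if __name__ == "__main__":
    print(is_well_formed(wrap_description(COMMENT)))
    print(is_well_formed(wrap_description(cdata_escape(COMMENT))))
    print(minimal_encode_xml("it's <b>"))
```

The unescaped comment closes the CDATA section early and leaves a stray ]]> behind, which the parser rejects; after cdata_escape the same text sits inside the section as ordinary characters.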

* Putting encoded HTML in description is a bad, bad thing that you shouldn’t do. You should put it in <content:encoded> instead: see ten thousand discussions all over the web. Also, if you really must put encoded HTML in the description element, using CDATA rather than entity encoding will break any reader that uses regular expressions rather than a real XML parser, unless it’s very new and thus knows to expect the possibility of CDATA escaping. But too bad for them; I know you’re going to just do whatever you want anyway. Bad you.
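As a rough illustration of the regular expression problem, here is a Python sketch of a deliberately naive reader. The pattern and the function name are made up for this example and are not taken from any actual aggregator; the point is only that code written with entity encoding in mind passes the CDATA markers straight through as text.

```python
import html
import re

# Naive pattern: everything between the description tags, newlines included.
DESC_RE = re.compile(r"<description>(.*?)</description>", re.DOTALL)


def naive_description(xml: str) -> str:
    """Regex-style extraction: grab the raw text, then decode entities."""
    match = DESC_RE.search(xml)
    return html.unescape(match.group(1)) if match else ""


ENTITY_ENCODED = "<description>&lt;p&gt;Hi&lt;/p&gt;</description>"
CDATA_WRAPPED = "<description><![CDATA[<p>Hi</p>]]></description>"

if __name__ == "__main__":
    print(naive_description(ENTITY_ENCODED))
    print(naive_description(CDATA_WRAPPED))
```

The entity-encoded item comes out as the expected HTML, while the CDATA-wrapped one still carries its <![CDATA[ and ]]> wrapper, which such a reader would then show to the user or choke on.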

** A jab, meant in the kindest way and the best of humor, at Timothy Appnel’s mt-xml-smart plugin which became the current MT encode_xml code.

9 Comments

Comment by Lisa #
2003-02-15 22:18:14

Thanks for the plugin. I have the encoded HTML in the content:encoded tags as well, but Newzcrawler seems to pick it up from the description tags. Thanks for all of your help!

 
Comment by Phil Ringnalda #
2003-02-16 00:13:00

Heh. I’ll grant you that mt-entrybodymore is a slicker solution to that problem, but what about her comment feed? Are you going to do mt-commentbody-p-b-i-a-commentauthorurl-commentauthorname-b-commentdatetime too?

 
Comment by Phil Ringnalda #
2003-02-16 01:08:17

OK, I see that (roughly speaking), you are! Good on you.

Comment by Alexei Kosut #
2003-03-03 10:37:03

It seems like a neat idea, but I couldn’t get <MTEntryIfExtended> to work right inside an <MTXMLEncode> container. So I wrote an even more general tag, MTBlock. Only one line of Perl, but it lets me do general lexical scoping in my Movable Type templates, which I think is cool.

Now I can write <MTBlock encode_xml="1"><MTEntryBody><MTEntryMore></MTBlock> and what you’d expect to happen does.

I can also use it for more general things like applying filters to static text, and it’s really handy for writing out HTML in static MT text. e.g., <MTBlock encode_html="1"><a><b><c></MTBlock> shows up as <a><b><c>. That saves a lot of having to write &lt; and &gt; by hand.

I would write it up and put it up for more “official” download, but for one line of code (and 14 lines of comments), it doesn’t seem quite worth it.

 
 
Comment by Simon Fell #
2003-02-20 20:18:56

Phil, which version of Aggie did you test this on? I just tried with RC5 and it handled mixed PCDATA/CDATA sections fine.

 
Comment by Phil Ringnalda #
2003-02-20 20:45:19

I’m not quite sure: I seem to have her comment feed in both RC4 and RC5, so I would have thought that I tested both, but I really can’t remember anymore. It seems like I would have made a point of saying that RC4 didn’t like it but RC5 did if that was what I saw, since after all this time of saying “Aggie doesn’t do x” only to hear that it does in CVS, I was awfully glad to get a new release. Or, there may have been something else about the feed that Aggie also objected to – I thought about building myself a clean testing feed, but then I didn’t bother.

 
Trackback by tima thinking outloud. #
2003-02-15 23:55:07

New Plugin: mt-entrybodymore.

A MovableType plugin that outputs both the entry body and “more” fields with one tag because you may find a need for it.

 
Trackback by markpasc.org #
2003-02-16 15:16:47

MT2.6

Ripped from the changelog.

 
Trackback by Glimpse of a Grrl #
2004-01-30 16:02:54

Fixing RSS with MT2.6

Update: Phil Ringnalda posted a plugin that takes care of encoding CDATA tags (and some other stuff) to prevent accidental RSS feed breakage. Update 2: Tim Appnel wrote 2 plugins to fix my issues too. The first one is mt-entrybodymore…

 