phil ringnalda : TrackBack and validation summary


While considering adding “Related:” links to my various posts on TrackBack and XHTML validation, I discovered that I don’t seem to have ever actually posted the current semi-official best-practice, lesser-of-evils solution. So, here’s an executive summary of the possible solutions, leading up to the current situation:

The problem: TrackBack inserts a section of RDF into your page, to identify the url for people to ping for each post. The W3C validator doesn’t like finding a section of RDF in your XHTML, becoming so incensed that eventually it can’t even say what it objects to, falling back on a string of Error: element "rdf messages.
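For anyone who hasn't looked at the page source lately, this is roughly what the offending block contains. The sketch below, modeled on the example RDF in the TrackBack specification, builds such a block in PHP. The function name trackback_rdf() and all of the example.com URLs are hypothetical, and MT's own <$MTEntryTrackbackData$> output may differ in detail.

```php
<?php
// Hypothetical helper (not MT's code): build an RDF block shaped like the one
// TrackBack embeds for each post. Values are escaped so the block stays well-formed.
function trackback_rdf(string $permalink, string $pingUrl, string $title): string
{
    $flags = ENT_QUOTES | ENT_XML1;
    $about = htmlspecialchars($permalink, $flags, 'UTF-8');
    $ping  = htmlspecialchars($pingUrl, $flags, 'UTF-8');
    $name  = htmlspecialchars($title, $flags, 'UTF-8');

    return <<<RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description
    rdf:about="$about"
    dc:identifier="$about"
    dc:title="$name"
    trackback:ping="$ping" />
</rdf:RDF>
RDF;
}

// Example with made-up example.com URLs.
echo trackback_rdf(
    'http://www.example.com/archives/000123.html',
    'http://www.example.com/mt/mt-tb.cgi/123',
    'TrackBack & validation'
), "\n";
```

Every rdf:, dc: and trackback: name in that output is something an XHTML DTD has never heard of, which is where the validator's complaints come from.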

The ideal solution: since XHTML is an application of XML, it would be nice to just do what you would do in any other XML file, namely add namespace declarations for rdf and dc, and then have the validator validate only the tags in the XHTML namespace. Sadly, the validator doesn’t work that way.
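A rough PHP sketch of the first thing that wished-for validator would have to do, assuming the page is well-formed XML and PHP's DOM extension is available: sort the document's elements by namespace, so that only the XHTML ones go on to the DTD check. The name foreign_elements() is hypothetical, and this is not how the W3C validator works.

```php
<?php
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

// Hypothetical helper: return the qualified names of all elements that are
// NOT in the XHTML namespace, i.e. what a namespace-aware validator would skip.
function foreign_elements(string $xml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) { // assumes the page is well-formed XML
        throw new InvalidArgumentException('Document is not well-formed XML.');
    }
    $foreign = [];
    foreach ($doc->getElementsByTagName('*') as $element) {
        if ($element->namespaceURI !== XHTML_NS) {
            $foreign[] = $element->tagName;
        }
    }
    return $foreign;
}

// A made-up page that declares the rdf and dc namespaces alongside XHTML.
$page = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
<head><title>Post</title></head>
<body>
<p>Entry text.</p>
<rdf:RDF><rdf:Description rdf:about="http://www.example.com/post.html" dc:title="Post" /></rdf:RDF>
</body>
</html>
XML;

print_r(foreign_elements($page)); // lists rdf:RDF and rdf:Description
```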

A possible solution: XHTML Modularization, intended mostly to allow developers of cellphone browsers and the like to support a subset of XHTML, also allows you to write a DTD which extends XHTML by adding new elements. Sadly, the validator doesn’t actually read DTDs and validate based on them.

A funky solution: I wrote a Movable Type plugin which hides the RDF from the validator by inserting it in the page with a series of Javascript document.writes. While that method does work, in that it makes the validator happy while still letting the Movable Type bookmarklet find the TrackBack urls to ping, it was actually based on a misunderstanding of how the bookmarklet finds the RDF: I was assuming that the bookmarklet used Javascript to parse the document source, when in fact it just passes the url to a Perl script on your server, which gets the source and uses a regular expression to look for something that looks a bit like RDF. The current MT regexp does find the RDF hidden in document.writes, but there’s no reason to assume that all future third-party implementations will, so it’s probably not a very good solution.
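A rough reconstruction of the idea, not the plugin's actual code. The hypothetical rdf_as_document_writes() writes the RDF out one line at a time through document.write, escaping "<" as \x3C so no markup appears outside the JavaScript strings; this assumes the page is served as text/html. The equally hypothetical find_ping_urls() stands in for a server-side scanner. It uses a deliberately naive pattern, not MT's actual regular expression, and it still finds the ping URL inside the script, but nothing obliges a third-party scanner to be that forgiving.

```php
<?php
// Hypothetical helper: hide an RDF block from markup validators by writing it
// out from JavaScript, one document.write() call per line.
function rdf_as_document_writes(string $rdf): string
{
    $calls = [];
    foreach (preg_split('/\R/', trim($rdf)) as $line) {
        // Escape backslashes and single quotes for a single-quoted JS string,
        // and write "<" as \x3C so the validator never sees a tag.
        $js = strtr($line, ['\\' => '\\\\', "'" => "\\'", '<' => '\\x3C']);
        $calls[] = "document.write('$js');";
    }
    return "<script type=\"text/javascript\">\n" . implode("\n", $calls) . "\n</script>";
}

// Hypothetical scanner: grab every trackback:ping="..." value in the source,
// wherever it appears. A deliberately naive pattern, not MT's own.
function find_ping_urls(string $source): array
{
    preg_match_all('/trackback:ping="([^"]+)"/', $source, $matches);
    return $matches[1];
}

// Made-up RDF for a post at example.com.
$rdf = <<<RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/">
<rdf:Description rdf:about="http://www.example.com/post.html"
    trackback:ping="http://www.example.com/mt/mt-tb.cgi/123" />
</rdf:RDF>
RDF;

$script = rdf_as_document_writes($rdf);
echo $script, "\n";
print_r(find_ping_urls($script));
```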

A better, but troublesome solution: as an alternative to combining XHTML and RDF in a single file, you can associate the RDF in a separate file with your XHTML, by including a <link rel="trackback-rdf" type="application/rdf+xml" href="foo-tb.rdf" /> tag in the <head> section of the page. The MT bookmarklet could then pass the url from the link tag if it finds one, or pass the url for the page if it doesn’t, to support third-party implementations for things like Blogger that can’t generate arbitrary external files. This solution would be dead simple to implement for individual entry archives, easy enough for other types of archives that might be the target of permalinks, and nearly impossible for main pages. Though it’s hard for non-coders to grasp, MT has absolutely no idea what posts will appear on your main page until after it actually creates it. Creating a single external RDF file would require that you create a separate template that does the same sort of limiting you do on your main page (as simple as <MTEntries lastn="10"> or as complicated as five separate <MTEntries category="Foo AND Bar" lastn="n"> tags, and don’t forget to change it when you change your main template). As an alternative, you could generate a separate RDF file for each individual entry, with a <link> tag for each, but in order to put the <link> tags in the <head> of the page, you would still need to know which entries will appear before the page is generated, which would require completely rewriting the way MT parses templates.
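The discovery rule proposed above is easy to express. This sketch assumes PHP's DOM extension and an absolute page URL, and uses the hypothetical names trackback_rdf_source() and resolve_href(). It looks for a <link rel="trackback-rdf"> in the fetched page and falls back to the page's own URL when there isn't one. Relative hrefs such as foo-tb.rdf are resolved in a simplified way (no handling of "../" or a <base> element).

```php
<?php
// Hypothetical helper: decide which URL a TrackBack client should fetch RDF from.
// Prefer <link rel="trackback-rdf" href="...">; otherwise use the page itself.
function trackback_rdf_source(string $html, string $pageUrl): string
{
    if (trim($html) === '') {
        return $pageUrl;
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate tag soup quietly
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('link') as $link) {
        // rel may hold several space-separated tokens.
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $href = trim($link->getAttribute('href'));
        if ($href !== '' && in_array('trackback-rdf', $rels, true)) {
            return resolve_href($href, $pageUrl);
        }
    }
    return $pageUrl;
}

// Simplified resolution: absolute URLs pass through, "/path" is joined to the
// page's scheme and host, anything else to the page's directory.
function resolve_href(string $href, string $pageUrl): string
{
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }
    $parts = parse_url($pageUrl); // assumes an absolute http(s) page URL
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    if ($href[0] === '/') {
        return $origin . $href;
    }
    $path = $parts['path'] ?? '/';
    return $origin . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

// Usage with a made-up individual entry archive page.
$page = '<html><head><title>Post</title>'
      . '<link rel="trackback-rdf" type="application/rdf+xml" href="foo-tb.rdf" />'
      . '</head><body><p>Entry</p></body></html>';
echo trackback_rdf_source($page, 'http://www.example.com/archives/000123.html'), "\n";
// prints http://www.example.com/archives/foo-tb.rdf
```

The fallback branch is what keeps tools like Blogger in the game: a page with no link tag simply gets scanned itself, as today.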

A workable-for-now solution: although it completely negates any value of using RDF, simply enclosing the RDF in HTML comments (<!-- and -->) will hide it from the validator while still allowing the current MT implementation to find what it needs. The TrackBack data is encoded as RDF-in-XML so that, in some utopian Semantic Web future, a program can just use an XML parser to discover what your page has to say about itself, but XML parsers may (and do) discard comments without parsing their contents, so commenting out the TrackBack data makes it invisible to them. As an alternative, you can hide it in a CDATA section (<![CDATA[ rdf goes here ]]>), which keeps it parseable, if awkwardly (parse the file, grab the CDATA sections and save any that are RDF, put them together, and then parse that), but this requires either a hack or a plugin, since the current code for <$MTEntryTrackbackData$> inserts a newline after the last line of the RDF, and the CDATA end tag can’t be on a new line by itself. So today’s best practice is just to use HTML comments, even though that means it could have been written to get exactly the same effect from just using , since MT isn’t using the RDF as XML, and the other data is ignored or, in the case of the dc:identifier, only correct if you use individual entry archives without any anchor in your permalinks.
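The difference between the two hiding places is easy to see with an XML parser. In this sketch (PHP's DOM extension; ping_urls_in(), cdata_sections() and ping_urls_via_xml() are hypothetical names, and the page is assumed to be well-formed XHTML), RDF inside a comment simply isn't part of the element tree, while RDF inside a CDATA section can be recovered by the awkward route described above: collect the CDATA sections that look like RDF, join them under a throwaway root element, and parse again.

```php
<?php
const RDF_NS = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
const TB_NS  = 'http://madskills.com/public/xml/rss/module/trackback/';

// Collect trackback:ping values from rdf:Description elements in a parsed tree.
function ping_urls_in(DOMDocument $doc): array
{
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('rdf', RDF_NS);
    $xpath->registerNamespace('tb', TB_NS);
    $urls = [];
    foreach ($xpath->query('//rdf:Description/@tb:ping') as $attr) {
        $urls[] = $attr->value;
    }
    return $urls;
}

// Recursively gather the text of every CDATA section under $node.
function cdata_sections(DOMNode $node): array
{
    $found = [];
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_CDATA_SECTION_NODE) {
            $found[] = $child->data;
        } elseif ($child->hasChildNodes()) {
            $found = array_merge($found, cdata_sections($child));
        }
    }
    return $found;
}

// Hypothetical helper: find the ping URLs an XML parser can actually reach.
function ping_urls_via_xml(string $xhtml): array
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xhtml)) { // assumes well-formed XHTML
        return [];
    }
    $urls = ping_urls_in($doc); // RDF present as real elements

    // CDATA route: keep sections that look like RDF, join them, reparse.
    $rdf = array_filter(cdata_sections($doc), function ($text) {
        return strpos($text, '<rdf:RDF') !== false;
    });
    if ($rdf) {
        $second = new DOMDocument();
        if ($second->loadXML('<wrapper>' . implode('', $rdf) . '</wrapper>')) {
            $urls = array_merge($urls, ping_urls_in($second));
        }
    }
    return $urls;
}

// Made-up RDF and page shell for the comparison.
$rdf = '<rdf:RDF xmlns:rdf="' . RDF_NS . '" xmlns:trackback="' . TB_NS . '">'
     . '<rdf:Description rdf:about="http://www.example.com/post.html"'
     . ' trackback:ping="http://www.example.com/mt/mt-tb.cgi/123" /></rdf:RDF>';
$shell = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Post</title></head>'
       . '<body><div>%s</div></body></html>';

print_r(ping_urls_via_xml(sprintf($shell, "<!-- $rdf -->")));     // no URLs found
print_r(ping_urls_via_xml(sprintf($shell, "<![CDATA[ $rdf ]]>"))); // URL recovered
```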

An iconoclastic solution: according to the RDF Working Group, the best solution to embedding RDF in XHTML is to just do it, validation be hanged. However, having the validator show forty or fifty RDF-related errors wipes out the primary benefit of validation: if you start with valid XHTML, then when something goes wrong, you can use the validator as a quick check of how you screwed up. For example, I use the validator mostly to check for unescaped &s in pasted-in urls, because having & rather than &amp; in a url will break an RSS 0.9x feed that includes your XHTML. I don’t have any problem with having my server deliver pages whose XHTML is invalid only because they include the unknown tag <rdf:RDF>, because I know that nobody is foolish enough to write a browser that does anything other than ignore unknown tags, so my only goal for validation is to have the validator tell me about errors I don’t know about. So I wrote a quick PHP script that reads my main page (with the PHP in it already interpreted) and uses a regular expression to remove the RDF, and now my link to the validator checks that page rather than the actual main page. Got PHP?

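<?php
// Fetch the page over HTTP so the PHP inside it has already been run.
$html = file_get_contents("http://www.philringnalda.com/index.php");
if ($html === false) {
    exit("Could not fetch the main page.");
}
// Remove each TrackBack RDF block: "s" lets . match newlines, and the lazy
// .*? stops at the nearest closing tag so text between blocks survives.
$html = preg_replace('#<rdf:RDF.*?</rdf:RDF>#s', '', $html);
echo $html;
?>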

And a link to http://validator.w3.org/check?uri=http://www.philringnalda.com/index-no-tb.php rather than http://validator.w3.org/check/referer (and the cognitive shift to thinking of validation as a means, not an end), and you’re set.

<update>Got PHP and someone as sharp as Brad Choate around? Replace <$MTEntryTrackbackData$> with:

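<?php
// Leave out the TrackBack RDF only when the W3C or WDG validator is visiting.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
if (strpos($ua, 'W3C_Validator') === false
    && strpos($ua, 'WDG_Validator') === false) { ?>
<MTEntryTrackbackData>
<?php } ?>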

in every MT template where you want TrackBack RDF, and then when either the W3C validator or the very nice Web Design Group validator comes calling, it gets an RDF-free version of the page. Thanks, Brad!</update>

This entry was posted on Saturday, August 24th, 2002 at 2:58 pm and is filed under trackback.

19 Comments

Comment by Brad Choate

2002-08-24 22:49:30

Got PHP? How about this (in your MT template, naturally):

<?php if (!strstr($_SERVER['HTTP_USER_AGENT'], 'W3C_Validator')) { ?>
<MTEntryTrackbackData>
<?php } ?>

Same idea as your ‘index-no-tb.php’, but with only one file.