TrackBack and validation summary

While considering adding “Related:” links to my various posts on TrackBack and XHTML validation, I discovered that I don’t seem to have ever actually posted the current semi-official best practice / lesser of evils solution. So, here’s an executive summary of the possible solutions, leading up to the current situation:

The problem: TrackBack inserts a section of RDF into your page, to identify the url for people to ping for each post. The W3C validator doesn’t like finding a section of RDF in your XHTML, becoming so incensed that eventually it can’t even say what it objects to, falling back on a string of Error: element "rdf messages.

The ideal solution: since XHTML is an application of XML, it would be nice to just do what you would do in any other XML file, add namespace declarations for rdf and dc, and then have the validator only validate tags in the xhtml namespace. Sadly, the validator doesn’t work that way.

A possible solution: XHTML Modularization, intended mostly to allow developers of cellphone browsers and the like to support a subset of XHTML, also allows you to write a DTD which extends XHTML by adding new elements. Sadly, the validator doesn’t actually read DTDs and validate based on them.

A funky solution: I wrote a Movable Type plugin which hides the RDF from the validator by inserting it in the page with a series of JavaScript document.writes. While that method does work, in that it makes the validator happy while still letting the Movable Type bookmarklet find the TrackBack urls to ping, it was actually based on a misunderstanding of how the bookmarklet finds the RDF: I was assuming that the bookmarklet used JavaScript to parse the document source, when in fact it just passes the url to a Perl script on your server, which gets the source and uses a regular expression to look for something that looks a bit like RDF. The current MT regexp does find the RDF hidden in document.writes, but there’s no reason to assume that all future third-party implementations will, so it’s probably not a very good solution.
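To make the "regular expression that looks for something a bit like RDF" idea concrete, here is a minimal sketch in Python. It is not the Perl that MT actually runs: the two patterns, the function name find_ping_urls, the sample page, and the example.com urls are all assumptions made up for illustration. Only the trackback:ping attribute name is taken from the TrackBack RDF format.

```python
import re

# Hypothetical pattern: grab each <rdf:RDF>...</rdf:RDF> block, wherever it sits.
RDF_BLOCK = re.compile(r"<rdf:RDF.*?</rdf:RDF>", re.DOTALL)
# Inside a block, pull out the value of the trackback:ping attribute.
PING_ATTR = re.compile(r'trackback:ping="([^"]+)"')


def find_ping_urls(page_source):
    """Return every TrackBack ping url found by plain text matching."""
    urls = []
    for block in RDF_BLOCK.findall(page_source):
        urls.extend(PING_ATTR.findall(block))
    return urls


# Made-up page: one RDF block embedded directly, one emitted via document.write.
SAMPLE_PAGE = """
<div class="post">First post</div>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description trackback:ping="http://example.com/mt-tb.cgi/1" />
</rdf:RDF>
<script type="text/javascript">
document.write('<rdf:RDF>');
document.write('<rdf:Description trackback:ping="http://example.com/mt-tb.cgi/2" />');
document.write('</rdf:RDF>');
</script>
"""

if __name__ == "__main__":
    print(find_ping_urls(SAMPLE_PAGE))
```

Because the match is purely textual, it neither knows nor cares that the second block lives inside a script. That is why the document.write trick happens to work with this kind of matcher, and also why a stricter third-party implementation could miss it.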

A better, but troublesome solution: as an alternative to combining XHTML and RDF in a single file, you can associate the RDF in a separate file with your XHTML, by including a <link rel="trackback-rdf" type="application/rdf+xml" href="foo-tb.rdf" /> tag in the <head> section of the page. The MT bookmarklet could then pass the url from the link tag if it finds one, or pass the url for the page if it doesn’t, to support third-party implementations for things like Blogger that can’t generate arbitrary external files. This solution would be dead simple to implement for individual entry archives, easy enough for other types of archives that might be the target of permalinks, and nearly impossible for main pages. Though it’s hard for non-coders to grasp, MT has absolutely no idea what posts will appear on your main page until after it actually creates it. Creating a single external RDF file would require that you create a separate template that does the same sort of limiting you do on your main page (as simple as <MTEntries lastn="10"> or as complicated as five separate <MTEntries category="Foo AND Bar" lastn="n"> tags, and don’t forget to change it when you change your main template). As an alternative, you could generate a separate RDF file for each individual entry, with a <link> tag for each, but in order to put the <link> tags in the <head> of the page, you would still need to know which entries will appear before the page is generated, which would require completely rewriting the way MT parses templates.
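The fallback logic described above (use the url from the link tag if there is one, otherwise use the page's own url) could look something like this Python sketch. Nothing here is MT code: the class and function names and the example.com urls are hypothetical, and it only uses Python's standard html.parser and urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TrackbackLinkFinder(HTMLParser):
    """Collect href values from <link rel="trackback-rdf"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        attr_map = dict(attrs)
        if tag == "link" and attr_map.get("rel") == "trackback-rdf":
            href = attr_map.get("href")
            if href:
                self.hrefs.append(href)


def rdf_source_for(page_url, page_source):
    """Return the url to fetch RDF from: the linked file, else the page itself."""
    finder = TrackbackLinkFinder()
    finder.feed(page_source)
    if finder.hrefs:
        # Resolve a relative href like "foo-tb.rdf" against the page url.
        return urljoin(page_url, finder.hrefs[0])
    return page_url


if __name__ == "__main__":
    page = ('<html><head><link rel="trackback-rdf" '
            'type="application/rdf+xml" href="foo-tb.rdf" /></head></html>')
    print(rdf_source_for("http://example.com/archives/000123.html", page))
```

The sketch only takes the first matching link, which suits an individual entry archive. A main page with one link per entry would need to match each link to its entry, which is exactly the part that is hard to generate.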

A workable-for-now solution: although it completely negates any value of using RDF, simply enclosing the RDF in HTML comments with <!-- <$MTEntryTrackbackData$> --> will hide it from the validator while still allowing the current MT implementation to find what it needs from the RDF. The TrackBack data is encoded as RDF-in-XML so that in some utopian Semantic Web future, a program can just use an XML parser to easily discover what your page has to say about itself, but XML parsers may (and do) remove any comments before they start to parse, so commenting out the TrackBack data makes it invisible to parsers. As an alternative, you can hide it with a CDATA section (<![CDATA[ rdf goes here ]]>), which makes it parseable (in a very awkward way: parse the file, grabbing the CDATA sections and saving any that are RDF, put them together, and then reparse that), but requires either a hack or a plugin, since the current code for <$MTEntryTrackbackData$> inserts a newline after the last line of the RDF, and the CDATA end tag can’t be on a new line by itself. So, today’s best practice is just to use HTML comments, even though that means that it could have been written to get exactly the same effect from just using <!-- The TrackBack url for <$MTEntryTitle$> is <$MTEntryTrackbackLink$> -->, since MT isn’t using the RDF as XML, and the other data is ignored, or in the case of the dc:identifier, only correct if you use individual entry archives without any anchor in your permalinks.
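The difference between the comment and CDATA approaches is easy to see with a real XML parser. The following is a rough Python sketch using the standard xml.etree.ElementTree module, which discards comments by default. The sample markup, the function names, and the example.com url are invented for illustration, and the namespace URIs are the ones TrackBack RDF is generally shown with; treat them as assumptions rather than gospel.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TB_NS = "http://madskills.com/public/xml/rss/module/trackback/"

# A cut-down, made-up TrackBack block with just the ping url.
RDF = (
    '<rdf:RDF xmlns:rdf="%s" xmlns:trackback="%s">'
    '<rdf:Description trackback:ping="http://example.com/mt-tb.cgi/1" />'
    "</rdf:RDF>" % (RDF_NS, TB_NS)
)

COMMENTED = "<div><!-- %s --></div>" % RDF
IN_CDATA = "<div><![CDATA[ %s ]]></div>" % RDF


def pings_from_tree(root):
    """Find trackback:ping attributes on real RDF elements in a parsed tree."""
    found = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        ping = desc.get("{%s}ping" % TB_NS)
        if ping:
            found.append(ping)
    return found


def pings_from_cdata(root):
    """The awkward route: collect element text that looks like RDF, then reparse it."""
    found = []
    for element in root.iter():
        text = (element.text or "").strip()
        if text.startswith("<rdf:RDF"):
            found.extend(pings_from_tree(ET.fromstring(text)))
    return found


if __name__ == "__main__":
    for label, source in (("comment", COMMENTED), ("CDATA", IN_CDATA)):
        root = ET.fromstring(source)
        print(label, pings_from_tree(root), pings_from_cdata(root))
```

With the commented version the parser hands back an empty div, so neither function finds anything. With the CDATA version the RDF survives as plain text, and a second parse of that text recovers the ping url.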

An iconoclastic solution: according to the RDF Working Group, the best solution to embedding RDF in XHTML is to just do it, validation be hanged. However, having the validator show forty or fifty RDF-related errors wipes out the primary benefit of validation: if you start with valid XHTML, then when something goes wrong, you can use the validator as a quick check of how you screwed up. For example, I use the validator mostly to check for unescaped &s in pasted-in urls, because having & rather than &amp; in a url will break an RSS 0.9x feed that includes your XHTML. I don’t have any problem with having my server deliver pages that are XHTML invalidated by including the unknown tag <rdf:RDF>, because I know that nobody is foolish enough to write a browser that does anything other than ignore unknown tags, so my only goal for validation is to have the validator tell me about errors I don’t know about. So, I just wrote a quick PHP script that reads my main page (with the PHP in it interpreted), and uses a regular expression to remove the RDF, and now my link to the validator checks that page rather than the actual main page. Got PHP?

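<?php
// index-no-tb.php: fetch the rendered main page and strip the TrackBack RDF,
// so the validator sees only the XHTML. Needs allow_url_fopen enabled.
// file_get_contents() reads the whole response; a single fread() on a
// network stream can return after the first chunk.
$html = file_get_contents("http://www.philringnalda.com/index.php");
if ($html === false) {
    die("Could not fetch the main page.");
}
// Non-greedy match, with /s so "." also matches newlines.
$html = preg_replace("/<rdf:RDF.*?\/rdf:RDF>/s", "", $html);
echo $html;
?>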

And a link to http://validator.w3.org/check?uri=http://www.philringnalda.com/index-no-tb.php rather than http://validator.w3.org/check/referer (and the cognitive shift to thinking of validation as a means, not an end), and you’re set.
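The two details that matter in that regular expression are the non-greedy .*? and the /s modifier. Here is a small Python sketch of the same substitution, with a made-up page and hypothetical function names, showing what goes wrong if the match is greedy on a page with more than one RDF block:

```python
import re


def strip_rdf(html):
    """Remove each RDF block separately (non-greedy, '.' matches newlines)."""
    return re.sub(r"<rdf:RDF.*?/rdf:RDF>", "", html, flags=re.DOTALL)


def strip_rdf_greedy(html):
    """Same pattern but greedy: also eats everything between the blocks."""
    return re.sub(r"<rdf:RDF.*/rdf:RDF>", "", html, flags=re.DOTALL)


# Made-up main page with two entries, each followed by its RDF block.
PAGE = (
    "<p>one</p>\n"
    "<rdf:RDF>\n<rdf:Description />\n</rdf:RDF>\n"
    "<p>two</p>\n"
    "<rdf:RDF>\n</rdf:RDF>\n"
    "<p>three</p>"
)

if __name__ == "__main__":
    print(strip_rdf(PAGE))
    print(strip_rdf_greedy(PAGE))
```

The non-greedy version leaves all three paragraphs in place, while the greedy one silently drops the second entry, so the validator would be checking a page missing most of its content.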

<update>Got PHP and someone as sharp as Brad Choate around? Replace <!-- <$MTEntryTrackbackData$> --> with:

<?php
if (!strstr($_SERVER['HTTP_USER_AGENT'], 'W3C_Validator')
    && !strstr($_SERVER['HTTP_USER_AGENT'], 'WDG_Validator')) { ?>
<MTEntryTrackbackData>
<?php } ?>

in every MT template where you want TrackBack RDF, and then when either the W3C validator or the very nice Web Design Group validator comes calling, they get an RDF-free version of the page. Thanks, Brad!</update>

19 Comments

Comment by Brad Choate #
2002-08-24 22:49:30

Got PHP? How about this (in your MT template, naturally):

<?php if (!strstr($_SERVER['HTTP_USER_AGENT'], 'W3C_Validator')) { ?>
<MTEntryTrackbackData>
<?php } ?>

Same idea as your ’index-no-tb.php’, but with only one file.

 
Comment by PapaScott #
2002-08-24 22:59:39

I was going to say just make a second MT template without the Trackback tags, but I like Brad’s idea better.

 
Comment by Phil Ringnalda #
2002-08-24 23:58:22

Ah, lovely. I thought about user-agent sniffing, but only in terms of using mod_rewrite to direct validators to the separate file. This way suits me perfectly.

Don’t have PHP, and thinking about PapaScott’s separate template idea? Don’t. I started down that path, too, until I realized that testing template changes would involve having to keep both templates in sync: make a change in your main template, see if it works like you expect, make the same change in your validation template, see if it’s valid, back and forth… far better to stick with the comments.

 
Comment by PapaScott #
2002-08-25 02:02:17

Of course, the separate template doesn’t work well if you are constantly fiddling with the index template. It works fine if your template is set, and you just want to check that your entries are OK.

 
Comment by Phil Ulrich #
2002-08-25 05:58:19

Phil, since this is slightly on topic: EntryEditLink now escapes the &’s to &amp;’s. Hope this helps.

 
Comment by michel v #
2002-08-25 15:29:38

I’d rather not hide anything from the validator. After all, it would be like hiding bad markup in javascript document.write statements: cheating ;)

 
Comment by djwudi #
2002-09-03 22:56:18

Hrm…for one reason or another, this seems to refuse to work for me. First I tried typing the code in by hand, then just cut-n-pasting Brad’s code into my site, but no matter what, the W3C validator is still sticking its tongue out at me.

Guess for now I’ll stick with the comment kludge.

 
Comment by Quadsk8 #
2002-09-04 05:24:13

There is an open “(” missing in the snippet above; it should look like:
if ( strstr() && strstr() ) {}
Check that every open paren has a matching close…

 
Comment by Phil Ringnalda #
2002-09-04 08:32:37

I don’t see the missing open paren:

if ( !strstr() && !strstr() ) {

I just copied that from above and deleted the stuff inside the !strstr()s.

 
Comment by msd #
2002-11-15 04:56:04

Yes, it’s there…

 
Comment by Anonymous #
2004-02-16 11:56:58

Phil, is it possible to get your plugin now?..

Comment by Phil Ringnalda #
2004-02-16 12:07:37

If that’s an actual question, not just comment spam, could you be a little more specific about what plugin, and what link where isn’t giving it to you?

 
 
Trackback by PapaScott #
2002-08-24 23:37:11

An Evil Idea

In Phil Ringnalda’s discussion forum, Brad Choate posted a particularly evil idea for hiding trackback data from the W3C validator. So now my index page looks like it validates, even

 
Trackback by tidak ada #
2002-08-25 15:27:14

work, damn it, work

Sorry Phil.
I’m shamelessly trying to TrackBack ping your entry about TrackBack and XHTML validation. Because it raises good points and because I want TrackBack to work finally.

 
Trackback by The Long Letter #
2002-09-03 23:11:16

Validation fixed (kinda)

One of the stumbling blocks I discovered today about enabling Trackback is that it breaks the W3C Validator – even though I make sure to use valid XHTML 1.0 in my pages, the validator chokes on the RDF code needed for Trackback to work correctly. Lucki…

 
Trackback by GeraBlog #
2003-07-31 06:24:21

I’m valid!

I’ve just finished fixing up the Movable Type (MT) templates, and now the pages of this blog are valid according to the XHTML 1.0 Strict DTD. My initial goal was to get to XHTML 1.1 validation. First…

 
Trackback by mashby.com #
2003-08-17 06:13:16

Climbing Mt. Validation

The other day I was working on a few tweaks on mashby.com and I remembered that it had been quite awhile since I validated my code. For some people as long as it looks good in a browser, that’s good…

 
Trackback by truerwords #
2005-03-07 16:46:08

Creative Commons, Trackback, HTML Comments, and Embedded RDF

How many years do I have to work with HTML before I stop discovering important technical points of which I should have been aware all along? Today’s rather embarrassing example regards the format of HTML comments. This completely took me by surprise. …

 
Trackback by truerwords #
2005-03-18 07:18:57

Phil Responds (sorta) re: HTML Comments and Linking Technologies

I thought Phil was ignoring me. I wrote to him on the 7th, after posting about HTML comments and embedded RDF, to ask what he thought of my suggestion regarding invisible links pointing to autodiscovery documents. He wasn’t ignoring me, he just couldn’t

 