Appendix C, why hast thou forsaken me?

XHTML 1.0’s Appendix C is the bible for tag soup pretend XHTML (like what I produce: stop bristling, it’s true): it tells you how to produce XHTML which you can serve as text/html to browsers which don’t understand application/xhtml+xml, or, much more commonly, how to pretend that you are using XHTML for some hypothetical future benefit when you are actually just producing HTML with some extra slashes.

However, when I started looking at how far WordPress had strayed from being actual XHTML that could be served as application/xhtml+xml, the very third thing I found looks rather like a deal-breaker: WordPress 2.0 is going to have an editing interface to let you choose custom colors for that blue gradient header background you’ve seen a billion times behind the title of “Foo, Just another WordPress weblog.” The values for the colors are then passed in the query string of a background: url() CSS declaration, and since there’s an upper and a lower, that’s ?upper=33eeee&lower=4180b6.

Unfortunately, the contents of style and script in XHTML are #PCDATA, parsed character data where &amp;amp; is converted to &amp;, while in HTML they are CDATA, character data where character references are not converted to characters. Appendix C’s take on that is

Use external style sheets if your style sheet uses < or & or ]]> or --.
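To make the difference concrete, here is a minimal sketch in Python (not anything WordPress ships) that feeds an inline style element to an XML parser, once with the raw ampersand and once with it escaped. The file name header-img.php and the h1 selector are assumptions made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline style block; the URL path and selector are illustrative only.
RAW = (
    '<style type="text/css">'
    "h1 { background: url(header-img.php?upper=33eeee&lower=4180b6); }"
    "</style>"
)
# The same block with the ampersand written as a character reference.
ESCAPED = RAW.replace("&", "&amp;")


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


raw_ok = is_well_formed(RAW)          # the bare "&lower" is not a valid reference
escaped_ok = is_well_formed(ESCAPED)  # "&amp;" is parsed back into "&"
css_seen_by_xml = ET.fromstring(ESCAPED).text
```

An XML parser hands the style sheet a real ampersand only in the escaped case. A text/html parser treats the same escaped block as character data and passes the literal &amp;amp; through to the CSS, which is why neither spelling works in both worlds.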

Hixie’s famous Sending XHTML as text/html Considered Harmful offers a less final, but also less palatable alternative:

<style type="text/css"><!--/*--><![CDATA[/*><!--*/
        ...
/*]]>*/--></style>

which I’ve heard doesn’t work correctly in some older versions of Opera, though I don’t know if that’s actually true. Even if it works perfectly in every browser, bleah. That’s awful, a cunning plan at best. Is it also the only alternative to recasting the entire operation so that the URL is hidden away in an external CSS file? That’s not a happy choice, since you can’t count on being able to write to the wp-content directory, but not having the URL with the colors in the query string means having an external CSS file that not only creates an image, but also does a database query. I’m starting to like my hide the whole thing from people doing application/xhtml+xml idea more all the time.
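For what the wrapper buys you on the XML side, here is a rough sketch in Python: it wraps a style sheet containing a raw ampersand in the comment-and-CDATA sequence quoted above and checks that an XML parser accepts it. The URL and selector are assumptions for illustration, and this says nothing about how any particular browser behaves.

```python
import xml.etree.ElementTree as ET

# Hypothetical style sheet with a raw ampersand in the query string.
CSS = "h1 { background: url(header-img.php?upper=33eeee&lower=4180b6); }"

# The wrapper for <style> quoted above, with the CSS in place of the "...".
WRAPPED = (
    '<style type="text/css"><!--/*--><![CDATA[/*><!--*/\n'
    + CSS
    + "\n/*]]>*/--></style>"
)

# To an XML parser, "<!--/*-->" is a comment and everything up to "]]>" is a
# CDATA section, so the ampersand inside it needs no escaping.
style = ET.fromstring(WRAPPED)
css_seen_by_xml = style.text
```

The text an XML parser hands to the CSS engine still carries the leftover comment markers around the rules, but the rules themselves arrive with the ampersand intact.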

17 Comments

Comment by Justin #
2005-11-28 20:58:51

It seems like the most obvious solution is to use a single GET variable and then parse two variables out of its value to avoid using the ampersand at all; e.g., ?range=33eeee-4180b6. One explode('-', $_GET['range']) in the PHP and you’re done.
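A minimal sketch of that idea, written in Python rather than the PHP WordPress uses, so treat it as an illustration of the logic only. The function names and the six-hex-digit check are assumptions, not anything from WordPress.

```python
import re
from urllib.parse import parse_qs, urlencode

# Assumed format: exactly six hexadecimal digits per color.
HEX_COLOR = re.compile(r"[0-9a-fA-F]{6}")


def build_query(upper, lower):
    """Pack both colors into one parameter so the query string needs no '&'."""
    return urlencode({"range": f"{upper}-{lower}"})


def parse_range(query):
    """Split the single 'range' value back into (upper, lower)."""
    value = parse_qs(query).get("range", [""])[0]
    parts = value.split("-")
    if len(parts) != 2 or not all(HEX_COLOR.fullmatch(p) for p in parts):
        raise ValueError(f"expected two hex colors joined by '-', got {value!r}")
    return parts[0], parts[1]


query = build_query("33eeee", "4180b6")
colors = parse_range(query)
```

Because the generated query string contains no ampersand, it can be dropped into an inline style block unchanged whether the page is parsed as HTML or as XML.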

Comment by Phil Ringnalda #
2005-11-28 21:10:45

Too obvious for me to see it, even: whenever I think of stuffing things together in URLs, all I can see is Firefox’s horrible mistake of using the pipe character to separate multiple home pages, despite it being a legal character in URLs, so I forget that when you control both the generation and the meaning, you can use anything you like.

I wonder if abandoning people who’ve already set colors in betas and release candidates is allowed, particularly since the advanced editing interface currently misparses the existing values and really needs to be fixed anyway.

 
 
Comment by Aristotle Pagaltzis #
2005-11-28 22:02:04

What you should do is reconfigure PHP to follow an 8-year-old W3C recommendation by setting arg_separator.input to &; and arg_separator.output to ; – and badgering the PHPeople to make these the defaults already, for crying out loud. All the server-side libraries and web frameworks in the Perl world have supported this forever.

Then you can just say ?upper=33eeee;lower=4180b6 and get on with your life as a tagsoup chef.
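As a rough model of what those two settings mean, here is a sketch in Python (not PHP, and not PHP's actual parser): the input side accepts either character as a field separator, and the output side joins fields with a semicolon. The function names are made up for the example.

```python
import re
from urllib.parse import quote_plus, unquote_plus


def parse_query(query, separators="&;"):
    """Split a query string on any of the given separator characters."""
    pairs = {}
    for field in re.split("[" + re.escape(separators) + "]", query):
        if not field:
            continue  # skip empty fields such as a trailing separator
        name, _, value = field.partition("=")
        pairs[unquote_plus(name)] = unquote_plus(value)
    return pairs


def build_query(params, separator=";"):
    """Join name=value fields with the given separator."""
    return separator.join(
        f"{quote_plus(name)}={quote_plus(value)}" for name, value in params.items()
    )


from_semicolons = parse_query("upper=33eeee;lower=4180b6")
from_ampersands = parse_query("upper=33eeee&lower=4180b6")
rebuilt = build_query({"upper": "33eeee", "lower": "4180b6"})
```

Accepting both separators on input keeps existing ampersand links working, while emitting only semicolons means generated URLs need no escaping in markup.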

Comment by Phil Ringnalda #
2005-11-29 20:11:30

Thanks for reminding me about that: if nothing else, it makes for a useful hack approach, for people with at least access to .htaccess, and could even make a reasonable core approach: if arg_separator.input includes ; use it, and when people trying to serve application/xhtml+xml without it get tripped up, tell them it’s required.

However, since Matt’s apparently ready to be ready to release, now, there probably isn’t enough time to wait for a PHP release-and-adoption cycle: for even moderately widespread hosting support, I think that’s around 18 months.

Comment by Aristotle Pagaltzis #
2005-11-30 00:37:58

Oh, I didn’t expect that the WordPress release would be held up for that. It just annoys me that PHP still suffers from this problem, for a plainly selfish reason: I can’t avoid ampersands in links to PHP-driven sites, so I’m forced to gunk up links with amp entities. Plus, on a more noble level, it would make it easier for users to do the simplest thing that can possibly work without automatically producing invalid markup.

But that requires PHP users to bitch about it. If WordPress can lend its weight as a prestigious project to the cause, so much the better.

(Excuse the cranky tone in this and the previous comment. I’m not sure just why this minor issue gets my goat so much, but it does.)

Comment by Phil Ringnalda #
2005-11-30 01:30:18

If I didn’t welcome crankiness about esoterica, it would be awfully quiet here. Certainly I wouldn’t be able to say much of anything. You, me, some fresh cranky fish, giving the frustrating world hell is most of why we’re here.

Comment by Aristotle Pagaltzis #
2005-11-30 10:57:48

You are a man after my own heart. :-)

 
 
 
 
 
Comment by Rowan Lewis #
2005-11-28 23:09:35

Just another reason to hate WordPress… honestly, why do people use it?

/me looks at own blog… wait a minute…

It’s the best of the worst.

Comment by Phil Ringnalda #
2005-11-29 20:20:18

More a matter of “meets most people’s needs most of the time,” which is a reasonable place to aim: most WP users will want to pretend to be using XHTML, while very nearly none need or want to actually use XHTML. Judging by a couple of other simple well-formedness errors, nobody who is willing to file bugs or create patches (or push them through, by whatever back-channel means things actually get committed, which I haven’t gotten around to puzzling out yet) has been watching, which simultaneously says that someone needs to pick up that job, and that nobody needs to be chided for not having done it: if there was sufficient demand, someone would already be filing bugs, and eventually verifying in an application/xhtml+xml installation would be a normal pre-commit step. It’s good enough, and I want it to be better than good enough.

Comment by Jacques Distler #
2005-11-30 00:14:26

most WP users will want to pretend to be using XHTML

Most WP users don’t give a rat’s ass what kind of markup it spits out. Nor should they.

while very nearly none need or want to actually use XHTML.

Certainly, not enough to put in the requisite effort.

Maybe if someone came up with a cool application of inline-SVG, a few web-design types would “re”discover the value of using XHTML.

Comment by Phil Ringnalda #
2005-11-30 01:13:06

Eh, you’re probably right, actually. However: based on obsessively hanging around places where webloggers talk about weblogs and weblogging since March 2001, I can quite safely say that virtually all vocal WordPress users want to produce XHTML and serve it as text/html.

I don’t have any idea how large the contingent of WP users who don’t know or care about teh HTMALs is: they aren’t my people. My people are the ones who post in forums, and send emails to lists, and file bugs, and bore the boring people who think blogging about blogging is boring, and they’ve all been brainwashed by years of “XHTML is the future” into thinking that it really is, and that the future is something that will happen soon, not in ten years or more, and they want their extra slashes in their tag soup.

Anne and I would be delighted if WP would produce HTML 4.01, and I would be exactly as delighted if it would produce valid and well-formed XHTML, since it’s only the sitting on the fence being neither that I don’t like, but I’d be astonished if pretty much anyone else capable of forming and expressing an opinion wouldn’t prefer the current state to either correct alternative (barring a truly correct implementation of XHTML, where the only way to affect output is through XML DOM methods, with posts, comments, and templates all required to either correct input or refuse it if it would affect well-formedness, which would make for a very interesting dilemma for people who claim to believe in XHTML without actually wanting to be forced to not make errors).

Comment by Jacques Distler #
2005-11-30 06:24:02

Anne and I would be delighted if WP would produce HTML 4.01,

Note that unencoded ampersands are not valid HTML4 either. Your URL fragment ?upper=33eeee&lower=4180b6 is, according to SGML rules, interpreted as containing the (unknown) entity &lower.

(Actually, in your particular context, the URL occurs in a spot where, in HTML, it is treated as CDATA, no? Unencoded ampersands elsewhere in the interface, however … )

…but I’d be astonished if pretty much anyone else capable of forming and expressing an opinion wouldn’t prefer the current state to either correct alternative

“Prefer” is probably too strong. “Indifferent” is more accurate, I’d say.

(barring a truly correct implementation of XHTML, where the only way to affect output is through XML DOM methods, with posts, comments, and templates all required to either correct input or refuse it if it would affect well-formedness, which would make for a very interesting dilemma for people who claim to believe in XHTML without actually wanting to be forced to not make errors).

Or even a half-way correct implementation. For instance, a WordPress plugin which used the W3C Validator Web Service to validate comments (or posts) before accepting them: how popular do you think that would be in the crowd you’re talking about?

Lip service to “Web Standards” aside, I suspect the answer is: “not very.”

Comment by Aristotle Pagaltzis #
2005-11-30 10:52:42

I think the right approach to valid comments is to put them through something like TagSoup. You could either be lenient and do this at display time, or be strict and force a preview when a comment required clean-up.

 
 
 
 
 
 
Comment by Kafkaesquí #
2005-11-29 00:32:32

Of course, one could just not use the default theme and so avoid playing around with color gradients in the banner.

Hammer, meet Phillips head screw.

Comment by Phil Ringnalda #
2005-11-29 20:02:01

New WordPress user who blogs about math, needs to use MathML, and thus needs to serve application/xhtml+xml, meet you’re screwed: there is a page in your new weblog software which looks like it would make your weblog look less like everyone else’s, but most of the buttons don’t work, and the one thing that does, “Advanced,” will require you to learn about wp-content/cache rather quickly, because it will give you a fatal error for every page of your site, weblog and admin, which won’t go away until you learn how to kill the cache file that contains an unescaped ampersand, even if you change the saved values with WordPress’s about:config.

 
 
Comment by Lachlan Hunt #
2005-11-29 07:26:02

The easiest and cleanest way to escape stylesheets within XHTML is to use /*<![CDATA[*/ and /*]]>*/. It works in all browsers and it’s definitely not as messy as the one Hixie suggested.
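A small sketch of that wrapper as a helper, in Python for illustration only. The function name wrap_css and the sample CSS are assumptions, and the guard reflects the one sequence a CDATA section cannot contain.

```python
import xml.etree.ElementTree as ET


def wrap_css(css):
    """Wrap CSS in a CDATA section whose markers sit inside CSS comments."""
    if "]]>" in css:
        # "]]>" would end the CDATA section early, so refuse it outright.
        raise ValueError("style sheet must not contain ']]>'")
    return (
        '<style type="text/css">/*<![CDATA[*/\n'
        + css
        + "\n/*]]>*/</style>"
    )


# Hypothetical style sheet with a raw ampersand in the query string.
css = "h1 { background: url(header-img.php?upper=33eeee&lower=4180b6); }"
markup = wrap_css(css)
css_seen_by_xml = ET.fromstring(markup).text
```

An XML parser strips the CDATA markers and leaves an empty CSS comment on either side of the rules. A parser that knows style holds character data sees the markers as text inside CSS comments.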

Comment by Phil Ringnalda #
2005-11-29 19:53:50

Hixie’s comments are a little less messy now that I copied and pasted the correct set, for <style> rather than <script>. However, your set works in all browsers where “all” is the set which includes browsers which know about the <style> element: otherwise, they don’t know that it supports C-style comments, and either treat the contents as a CDATA block (possibly one signalling the end of <head> and opening the <body>), or if they also don’t know from CDATA, treat the whole thing as a horrible mess of PCDATA.

That’s a deal I’m willing to take personally; I haven’t hidden my JavaScript and CSS in SGML comments for years, old browsers be damned. But that doesn’t make it one that WordPress will, or necessarily should, take on other people’s behalf.

 
 