If there’s one thing that the RSS Draconian Wars taught us, it’s that you don’t want to be involved in any discussion of XML and error handling.
No, that’s not it. It’s, err, that any recovery from error by a parser, as distinct from an application which employs a parser, is evil.
PHP, meet your evil patch. Even better, if I understand Daniel Veillard’s comment correctly, PHP’s only going to be doing it because it’s already there in the widely (and rightly) respected libxml2 library.
Me? I don’t care. No matter how badly people misunderstand it, what the XML spec says is that a conforming XML parser must halt and catch fire when it hits a fatal error. It doesn’t say that an application which handed the purported XML to the parser can’t massage it and try to run it through again (in fact, it makes a strong effort to make clear the difference between a parser and an application that employs it). Does it matter to me whether the massaging code lives in the same library as the parser? ’Fraid not. In fact, who’s more likely to get the massaging right, libxml2, or me? You try to parse purported XML as XML, and if it doesn’t work, you decide on a case-by-case basis whether it’s better to catch fire, or just set the bozo bit and get what you can. Kudos to Daniel for doing it once, in one place, as well as he can, rather than having it done poorly in tens of thousands of places.
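For the curious, here is roughly what "try strict, then massage and set the bozo bit" looks like on the application side. It's a minimal sketch in Python using the standard library's `xml.etree.ElementTree`, not what libxml2 or the PHP patch actually does. The function name `parse_with_bozo`, the `allow_recovery` switch, and the single massaging rule (escaping bare ampersands) are all my own assumptions, picked only to illustrate the shape of the thing.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical massaging rule: an "&" that does not begin an entity or
# character reference gets escaped. Real recovery code would need far more.
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def parse_with_bozo(text, allow_recovery=True):
    """Return (root, bozo). bozo is True when the input had to be massaged."""
    try:
        # First attempt: the parser stays strict and fails on any fatal error.
        return ET.fromstring(text), False
    except ET.ParseError:
        if not allow_recovery:
            raise  # the caller chose to catch fire

    # Second attempt: the application, not the parser, massages the text
    # and hands it back to the same strict parser.
    massaged = _BARE_AMP.sub("&amp;", text)
    # If this still fails, the ParseError propagates to the caller.
    return ET.fromstring(massaged), True
```

The parser itself never recovers from anything here. The application decides, call by call, whether a second attempt is worth it, and the bozo flag tells whoever consumes the result that what they got was not well-formed XML to begin with.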