Apparently it’s not very clear how to specify the charset that should be used to encode the query for Mozilla/Firefox search plugins (the Sherlock .src files that power the search box next to the address bar). The documentation on the Mycroft page isn’t actually right (though, as usual in programming documentation that allows user comments, it’s corrected in a comment). Not only are a great many of the plugins on Mycroft wrong, so are the default plugins that ship with Firefox and the Application Suite (though at least the Google plugin is wrong in a survivable way).
If you don’t specify a queryCharset (or don’t correctly specify a queryEncoding as an integer that maps to a charset), the search is encoded with the user’s default charset. Since some of the default search pages expect ISO-8859-1 and some expect UTF-8, a search for non-ASCII characters will fail on some of them no matter what your default charset is.
Full story in the extended entry, to protect the innocent, and those who will never write a search plugin.
It’s not too surprising that so many plugins get it wrong: Apple seems to have written queryEncoding completely out of their history of Sherlock, so now there’s only a page in the intl project and that comment in the Mycroft docs to explain it.
The charset that will be used to encode the query is determined like this (links will obviously bit-rot pretty quickly):
- The original query string is encoded as UTF-8 (encodeURIComponent())
- nsInternetSearchService.cpp first looks for a queryCharset attribute on the search element in the engine .src file
- If there’s no queryCharset, it then looks for a queryEncoding attribute, an integer that maps to a charset.
- The original query string gets jerked around through some unescaping and reescaping, and pops out the other end encoded in whichever charset was selected above.
So, no matter what, it’s going to find a charset to try to convert with. The conversion may fail if you use a queryCharset that won’t work with textToSubURI->ConvertAndEscape(), or the charset may be either the user’s default charset or ISO-8859-1 if you don’t specify anything, but it’s going to find something, and if you don’t specify the right thing, it’s going to be a problem.
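The lookup order described above can be sketched roughly like this. It’s a Python illustration, not the code in nsInternetSearchService.cpp: the function name pick_query_charset, the QUERY_ENCODING_TO_CHARSET table (which holds only the two codes mentioned in this post) and the exact fallback order at the end are all assumptions made for the example.

```python
# Hypothetical sketch of the charset lookup; not Mozilla's actual code.
# Only the two queryEncoding codes mentioned in this post are listed.
QUERY_ENCODING_TO_CHARSET = {
    2336: "EUC-JP",
    2561: "Shift_JIS",
}


def pick_query_charset(search_attrs, user_default_charset=None):
    """Return the charset a plugin's query would be encoded with."""
    # 1. An explicit queryCharset attribute wins.
    charset = search_attrs.get("queryCharset")
    if charset:
        return charset

    # 2. Otherwise try to map a legacy integer queryEncoding.
    encoding = search_attrs.get("queryEncoding")
    if encoding is not None:
        try:
            mapped = QUERY_ENCODING_TO_CHARSET.get(int(encoding))
        except ValueError:
            mapped = None  # not an integer, so it maps to nothing
        if mapped:
            return mapped

    # 3. Nothing usable was specified: fall back to the user's default,
    #    or ISO-8859-1 if there is none (assumed order).
    return user_default_charset or "ISO-8859-1"


print(pick_query_charset({"queryCharset": "UTF-8"}))
print(pick_query_charset({"queryEncoding": "2561"}))
print(pick_query_charset({}, user_default_charset="UTF-8"))
```

The point of the sketch is the last line of the function: a plugin that specifies nothing still gets a charset, just not one its author chose.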
With the default default charset, ISO-8859-1, search Yahoo! for übel from Firefox’s toolbar, and you’ll search for %FCbel, despite telling Yahoo! you were using UTF-8 with &ei=UTF-8, and since the %FC is meaningless as UTF-8, Yahoo! will just search for bel.
Or, with your default charset changed to UTF-8, search Ask Jeeves in Seamonkey (you ship an Ask Jeeves plugin by default?!) for übel and you’ll search for %C3%BCbel which, since Ask Jeeves uses ISO-8859-1, it will interpret as being a search for Ã¼bel.
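Both failures are easy to reproduce outside the browser. Here’s a minimal Python sketch using urllib.parse from the standard library; the variable names are mine, and dropping the invalid byte with errors="ignore" is only one plausible way to end up with “bel”, not a claim about what Yahoo! does internally.

```python
from urllib.parse import quote, unquote

query = "übel"

# What the toolbar sends, depending on the charset that gets picked.
latin1_escaped = quote(query, encoding="iso-8859-1")
utf8_escaped = quote(query, encoding="utf-8")
print(latin1_escaped)  # %FCbel
print(utf8_escaped)    # %C3%BCbel

# A site expecting UTF-8 receives the ISO-8859-1 bytes: 0xFC is not valid
# UTF-8, and if the bad byte is simply dropped, only "bel" is left.
seen_by_utf8_site = unquote(latin1_escaped, encoding="utf-8", errors="ignore")
print(seen_by_utf8_site)  # bel

# A site expecting ISO-8859-1 receives the UTF-8 bytes: each byte is read
# as its own character, which gives the familiar two-character mess.
seen_by_latin1_site = unquote(utf8_escaped, encoding="iso-8859-1")
print(seen_by_latin1_site)  # Ã¼bel
```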
The solution is simple enough, the same one the validator insists you use for your own HTML: always specify a charset. Don’t use queryEncoding since it’s a strange legacy thing for a limited set of charsets, and people copying you won’t know to change your “2336” for EUC-JP to “2561” for Shift-JIS. Just use queryCharset. If you are writing a plugin for something that will take more than one encoding (like Google or Yahoo!), make sure you also include an input for the variable that says which encoding you used, and use UTF-8 whenever you have a choice, since transcoding Greek to ISO-8859-1 just results in “??????”.
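To see why UTF-8 is the safer choice, here’s a small Python sketch. The Greek sample word is my own, the parameter names p and ei are the Yahoo! ones used in this post, and the rest is just the standard library.

```python
from urllib.parse import parse_qs, urlencode

greek = "ελληνικά"

# ISO-8859-1 has no Greek letters, so every character becomes "?".
as_latin1 = greek.encode("iso-8859-1", errors="replace")
print(as_latin1)  # b'????????'

# Encoding the query as UTF-8 and saying so in the request keeps it intact.
query_string = urlencode({"p": greek, "ei": "UTF-8"}, encoding="utf-8")
print(query_string)

# A site that honours ei=UTF-8 can decode the query back to the original.
round_trip = parse_qs(query_string, encoding="utf-8")["p"][0]
print(round_trip == greek)  # True
```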
For example, a working version of the Yahoo! plugin would include:
<SEARCH
   version = "7.1"
   name="Yahoo"
   description="Yahoo Search"
   method="GET"
   action="http://search.yahoo.com/search"
   queryCharset="UTF-8"
>
<input name="p" user>
<input name="ei" value="UTF-8">
To be done:
- Either morph bug 270120 into a “fix all intl issues with all default Firefox plugins” bug or file a new one.
- File or find a Seamonkey bug for their, er, interesting set of default plugins
- Consider bugging someone about actually putting something up in /products/firefox/plugins/ where the defaults say to look for automatic updates.
- Consider running through all the Aviary l10n plugins looking for trouble, since oddly enough all those intl people didn’t actually test their i18n very well. Decide that it’s SEP.
Sweet, there’s close to twenty tabs in two windows I can close now!