phil ringnalda : Searching where I’m not hated

Searching where I’m not hated

As (Google employee) Nelson Minar notes, sometimes changing URLs by returning a 301 Moved Permanently with the new URL works out fine with search engines: if you search Google for [for fear they’ll lose visibility in search engines], his post at his new URL is the first result.

And as (Google employee) Matt Cutts notes, sometimes it doesn’t work out quite so well. Back at the end of October, when I switched to WordPress, I moved my weblog to the subdomain “weblog.” and changed to WordPress’s default URL scheme, with the words from the post title separated by hyphens (because Underscores are bad (for search engines)) and without the “.php” extension (because, among other things, it makes search results for [foo php] horribly cluttered with things only published using PHP, not about PHP), and redirected all my old URLs to the new ones. Now when you search for a set of words I’ve used in a post, like [when Gmail first added Atom feeds], you will not find my post.
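For anyone who hasn't done this sort of move, here's a rough sketch of the two pieces involved: turning a post title into a hyphenated slug, and answering a request for an old URL with a 301 pointing at the new one. It's an illustration under assumptions, not what WordPress actually runs: `slugify`, `redirect`, the old path, the host `weblog.example.com`, and the flat `/slug` layout are all hypothetical, and WordPress's own title sanitizing handles many more cases.

```python
import re


def slugify(title: str) -> str:
    """Approximate a hyphenated slug: drop apostrophes, lowercase, join words with hyphens."""
    cleaned = re.sub(r"['’]", "", title.lower())
    words = re.findall(r"[a-z0-9]+", cleaned)
    return "-".join(words)


def redirect(old_path: str, titles_by_old_path: dict[str, str], new_host: str):
    """Return (status, headers): a 301 to the new URL, or a 404 for an unknown old path."""
    title = titles_by_old_path.get(old_path)
    if title is None:
        return 404, {}
    # Hypothetical new layout: http://<new_host>/<slug>, with no ".php" extension.
    return 301, {"Location": f"http://{new_host}/{slugify(title)}"}


if __name__ == "__main__":
    old_posts = {"/archives/001.php": "Underscores are bad"}  # made-up old URL
    print(redirect("/archives/001.php", old_posts, "weblog.example.com"))
```

The point of the 301 (rather than a 302) is that it tells the client the move is permanent, which is what you want a search engine to believe.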

If you want to find me in your Google search results, you need to do one of two things: remember that the parameter to include “duplicate results” is &filter=0 and add that to your query URL, or work your way to the end of the results (in this case, that would be around result 747, despite their claim to have 412,000 results) and click the link to include omitted results. Then, once again work your way out to the end of the results (now at result 976, despite the complete and utter lie of “412,000 results”), and you’ll find me: the last damn thing they think anyone would ever want, with my useless front page, my worthless permalink, my crap-filled paged archive (now blocked off in robots.txt, because there’s really no value a search engine can extract from that view), and my spammy HTMLized version of the Atom autodiscovery RFC (which at least serves the purpose of increasing my spam load).
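If you'd rather not hand-edit the query URL, the first option can be sketched in a few lines. The `filter=0` parameter is the one mentioned above; the helper name `google_search_url` is made up for illustration, and I'm assuming the usual `q` parameter on `www.google.com/search`.

```python
from urllib.parse import urlencode


def google_search_url(query: str, include_duplicates: bool = False) -> str:
    """Build a Google search URL, optionally asking for the omitted 'duplicate' results."""
    params = {"q": query}
    if include_duplicates:
        # filter=0 asks Google not to collapse results it considers duplicates.
        params["filter"] = "0"
    return "http://www.google.com/search?" + urlencode(params)


if __name__ == "__main__":
    print(google_search_url("when Gmail first added Atom feeds", include_duplicates=True))
```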

The other, and faster, way to find me in Google results, searching for the words as a phrase, might hold a partial clue to why I’ve become persona non grata in Google’s view: [“when Gmail first added Atom feeds”] returns one result, from Sameer D’Costa’s aggregator. Include duplicates on that search, and you’ll also see a couple more Gregarius installs, along with my own worthless results. You won’t see my Gregarius install, though, because I know search engines hate duplicate content, so in Gregarius’s Admin/Config I set rss.config.robotsmeta to “noindex, follow” so that every page gets a <meta name="robots" content="noindex,follow" /> to tell search engines not to index my aggregator as duplicate content, though they are welcome to discover any URLs they don’t already know about. I don’t have any way of knowing whether Google first decided I was garbage, and then decided that made me a duplicate of other people aggregating me, or decided I was garbage because I was a duplicate of people duplicating me, but I know it isn’t helping: the only time you should let search engines index the output of an aggregator is when that’s the only place the feeds appear as HTML. Otherwise, it’s duplicate content, and that’s going to wind up hurting someone, whether it’s you or the source.
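To make the "noindex, follow" idea concrete, here is a minimal sketch of how a well-behaved crawler might read that meta tag and decide what it's allowed to do with the page. The class and function names are hypothetical, and real crawlers certainly do more than this; it only shows the two separate decisions (index the page? follow its links?) the tag encodes.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robots_directives(html: str) -> set[str]:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex,follow" /></head></html>'
    directives = robots_directives(page)
    print("may index:", "noindex" not in directives)
    print("may follow links:", "nofollow" not in directives)
```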

Now, maybe it’s just me, but I really don’t like getting fucked by someone who spends the whole time punching me and swearing at me and insulting me. If that’s the way Google feels about me, I’d rather just end our relationship. That takes two things. First, I went to about:config and changed keyword.URL to http://myweb2.search.yahoo.com/search?ei=UTF-8&p= so typing search terms in the address bar will take me to Yahoo! rather than doing a Google “I’m feeling lucky” search, changed browser.search.defaultenginename to Yahoo so that selecting text and choosing “Search Web for” from the context menu would search at Yahoo, and changed browser.search.order.1 to Yahoo so Google wouldn’t be on top. I should probably delete the google.src plugin file, too, but for now I’m not quite willing to burn every last bridge. (And I didn’t switch to MSN Search because they make Google look like the kindest and gentlest of lovers: the only way you’ll find me in any results there is to include weblog.philringnalda.com in your search terms. Otherwise, I simply don’t exist.)
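The keyword.URL change works because that pref is just a URL prefix: whatever you type in the address bar gets escaped and appended to it. A tiny sketch of that behavior follows. The function name is made up, and the exact escaping Firefox applies is an assumption on my part; percent-encoding is used here only to show the shape of the result.

```python
from urllib.parse import quote

# The keyword.URL value from about:config, as given above.
KEYWORD_URL = "http://myweb2.search.yahoo.com/search?ei=UTF-8&p="


def keyword_search_url(terms: str, keyword_url: str = KEYWORD_URL) -> str:
    """Append the escaped search terms to the keyword.URL prefix."""
    # Assumption: terms are percent-encoded; the browser's real escaping may differ.
    return keyword_url + quote(terms, safe="")


if __name__ == "__main__":
    print(keyword_search_url("atom autodiscovery"))
```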

Then there’s the other half of breaking up: taking away Googlebot’s and msnbot’s keys to my house. Despite the fact that Google just flat out hates me, they’ve requested 6226 pages from me already this month (msnbot has only requested 1836, but they’ve requested robots.txt 104 times, to Googlebot’s 31 — maybe they know what’s coming).

That ain’t free: to keep my host from pushing me into a dedicated server, I had to install a caching plugin, and then reconfigure it with an even longer cache time to prevent misses when they came back for more. All of that was just for search engines, which are the only things that ever look at most old weblog posts (well, along with confused and misled searchers, if you happen to be returned as something other than the very last result after duplicate results are included). But block Googlebot? That seems pretty serious, somehow, even though most of the “visitors” they deliver are just comment spammers (one of the clues that I’d become invisible in search engines was just how little comment spam I’m getting).
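For the record, taking away the keys would look something like the robots.txt below, shown here as a string so it can be checked with Python's standard `urllib.robotparser`. This is a sketch of what the block could be, not a file I've deployed; the `allowed` helper and the `weblog.example.com` URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that shuts out only Googlebot and msnbot; everyone else is unaffected.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: msnbot
Disallow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report whether the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for bot in ("Googlebot", "msnbot", "Slurp"):
        print(bot, allowed(bot, "http://weblog.example.com/"))
```

Crawlers that aren't named in the file fall through to the default, which is to allow everything, so Yahoo's crawler keeps its keys.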

So, just like every abused woman on a crappy talk show that you want to shake some sense into, I’ll give Googlebot and his buddy msnbot another month, hoping he’ll change his abusive ways. After all, maybe one of his friends is reading this, and can talk some sense into him.

This entry was posted on Sunday, January 8th, 2006 at 4:09 pm and is filed under blogging tech, carping, meta, wordpress.

23 Comments

Comment by Sameer D'Costa

2006-01-08 18:05:59

Oh, I thought I was just helping you out by sending less traffic to your blog. :)

Anyway, I shall change my robots meta in Gregarius to “noindex, nofollow”. Maybe we should change the default value in the next version… I wonder…


Comment by Phil Ringnalda

2006-01-08 20:19:51