Yahoo!bombers, start your links

Roughly what I expected to happen when I wrote about the Semantic Web example keyword search “how many cows in Texas” has happened: as of this morning, on the data center I’m hitting (the only correct way to speak of search results, flexible as they are), the number one result for the search on Google is my entry. People who object to seeing blogs in their search results aside, it’s not that bad a result: the current answer is right there, along with a link to a page where newer data will appear in the future.

More interesting is the result on Yahoo!’s newly non-Google search: number one as of this minute is my link target, number four is another blog with an entry about the same Washington Post article that inspired me, and number eleven is my blog. At first glance, those sound like good results, but I do have some questions.

Yahoo! doesn’t report stop-words, the very common words which aren’t searched, though they still count for positioning. (Search Google for how many cows in Texas and it will tell you that how and in are common and were ignored, but searching for many cows Texas gives a different result set, because it makes those three words more important to the search than they were before.) Still, the only word of those five which could even remotely be said to occur in the target page is “in”, which is part of the URL for feedback, in /cgi-bin/. Other than that, none of the words appear on the result page: it has only a different form (cattle), while the pages linked from it (the individual report pages) do include cows and Texas. As far as I know, nobody else picked up my cue and linked to my target page with similar keywords. That means that, based solely on my one link, Yahoo! is saying that’s the best result. Googlebombing at least required that you have a few friends, and has become more likely to explode in your hand than on your target. That’s why my post is Google’s number one result: if I was telling the truth, their searchers are only two clicks from the answer; if I was trying to call someone a miserable failure, I wind up calling myself one instead. From a casual examination of one particular phrase and link, it looks like so far Yahoo!bombing is going to be incredibly easy.
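For anyone who hasn’t watched a search engine report ignored words, here is a toy Python sketch of the idea. The stop-word list and the `split_query` helper are invented for illustration; neither Google nor Yahoo! publishes its actual list, and this says nothing about how the ignored words still affect positioning.

```python
# Hypothetical stop-word list; real engines keep their own, unpublished lists.
STOP_WORDS = {"how", "in", "the", "a", "of"}


def split_query(query, stop_words=STOP_WORDS):
    """Split a query into (searched_terms, ignored_terms), keeping query order."""
    searched, ignored = [], []
    for word in query.lower().split():
        if word in stop_words:
            ignored.append(word)
        else:
            searched.append(word)
    return searched, ignored


searched, ignored = split_query("how many cows in Texas")
print(searched)  # ['many', 'cows', 'texas']
print(ignored)   # ['how', 'in']
```

An engine that reports stop-words shows you something like the second list; one that doesn’t leaves you guessing which of your words were actually searched.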

Then there are the weblogs in Yahoo!’s results. Note that I said my entry is Google’s number one result, but that on Yahoo! another weblog and my weblog are the number four and eleven results. It’ll take a while, as entries scroll off front pages, but I’m quite curious to see what it means that Yahoo! is apparently more fond of weblog main pages than of archive pages. Some of Google’s troubles, particularly the empty Trackback popups that make Andrew Orlowski foam at the mouth even more than usual, are the result of its attempt to get as close to your final destination as possible. They seem to be trying their best to ensure that you never land on any sort of navigation page: if you search for violet widgets, they want to drop you on /widgets/colors/violet.html, not /widgets/colors/index.html, and if someone stops selling violet widgets, too bad: you’re not going to land on the colors page and see that they do have lavender widgets. In the case of weblogs, where what would be a main page or site map on other sites is instead a constantly changing main page, I’m quite curious to see what Yahoo! will do, whether you’ll stick in results that you no longer have a main page post about, or just churn your main page around through various results pages as you change topics and posts.

I don’t have any great example searches at hand for demonstrating in general that Yahoo! is returning /widgets/ when Google is returning /widgets/brands/acme-inc/2003/Spring/models/138423/parts/cams.html, but casual searching makes it look like Yahoo!’s more likely to return general pages and sites about your topic (which fits with something that started out as a directory): a search for rss feeds returns many of the same results on both, but on Google Dave Winer’s feed itself is in the top ten, while Yahoo! doesn’t return any actual feeds in their top 40 results, only HTML pages listing available feeds.

Then there’s the fact that Yahoo!’s number one result is their own news RSS feed page. As their search help says,

Note: Web Results include Yahoo! Directory sites, including those submitted via the Yahoo! Express program, and other sites submitted to search engines via other “pay-for-review” or “pay-for-inclusion” programs.

No matter how good Yahoo!’s search technology is, you’re never going to know who got there by being a good result for your search, and who got there by buying their way in, in any one of a number of ways that someone can pay to be included. You can just scroll right past the labelled “Sponsored results” at the top and bottom of the page, but you can’t tell whether something in the actual results is there because it fits your search, or because they bought their way into Yahoo!’s directory, or bought their way into Overture’s results. I’ll be interested to see how their rankings develop under use, but without any indication of how something crawled into the results, they’ll never become my primary search tool.

15 Comments

Comment by Phil Ringnalda #
2004-02-18 11:02:43

That went quick: between the time I started writing and the time I got around to closing the tabs from writing, my target page had disappeared from the results. In-ter-esting.

Comment by Scott Johnson #
2004-02-18 12:15:24

By the time I got around to reading this article, you were ranked 9 on Yahoo and still 1 on Google.

Comment by Phil Ringnalda #
2004-02-18 12:33:37

By the time I got around to reading your comment, my target was number one and I was number 11 on Yahoo!. From this computer, searched by typing the URL search.yahoo.com/search?p=how many cows in texas in the address bar, at 12:28 PST. Wonder if it’s just load-balancing with different results at different data centers, or if their motto is going to be “Yahoo! Search: bringing variable results to new highs!”.

That, by the way, is one of those things that intrigues us, and annoys the crap out of ordinary human beings, who will quite often make search a regular part of their navigation (I’ve seen people who check their email at Yahoo! by searching for Yahoo in the Google search box that’s on their default browser home page).

Comment by Phil Ringnalda #
2004-02-18 11:13:35

Also in the in-ter-esting files: the usual suspects are already noting that you get an “Add to MyYahoo!” link for RSS, but not for that evil Atom. Well, search for salad with steve, you get an “Add to MyYahoo!” for his old index.rdf, which redirects to his Atom feed, which gets added just fine. They also point you to my index.rdf, which has been silently returning my RSS 2 feed for almost a year, to exactly one person’s dismay. The moral? This version crap doesn’t actually matter, unless you need an element from one particular version, and know that you have a consumer who will properly use that element. Mostly, feeds is feeds.

Comment by Mark #
2004-02-18 11:33:55

My Yahoo supports Atom. http://jeremy.zawodny.com/blog/archives/001585.html

They may not support Atom autodiscovery yet though.

Comment by Phil Ringnalda #
2004-02-18 11:41:12

Even though lots of people are talking about it as autodiscovery, I don’t think they are using autodiscovery at all: my index.rdf hasn’t been autodiscoverable in forever. My first guess would be that they are working off something like Syndic8’s directory. Wonder if it was an initial load, or if I can affect it by changing what Syndic8 has to say about me?

Comment by Phil Ringnalda #
2004-02-18 12:39:31

For those Firefox users who’ve forgotten since the last time a new and usable search engine showed up: go to http://search.yahoo.com/search?p=whatever, bookmark it, putting the bookmark in the Quick Searches folder, then Manage Bookmarks, Properties for that bookmark, replace the whatever with %s, type yahoo in the Keyword field, and then yahoo dirty pictures typed in the address bar will take you to what you hope is something other than the same boring results.
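What that keyword bookmark does can be sketched in a few lines of Python. This is only an illustration of the %s substitution, not Firefox’s actual code: the `QUICK_SEARCHES` table and `expand` function are made-up names, and the exact escaping the browser applies to the search terms is an assumption.

```python
from urllib.parse import quote_plus

# Assumed keyword table: bookmark keyword -> URL template containing %s.
QUICK_SEARCHES = {"yahoo": "http://search.yahoo.com/search?p=%s"}


def expand(address_bar_text, quick_searches=QUICK_SEARCHES):
    """Expand 'keyword search terms' into a URL, or return None if no keyword matches."""
    keyword, _, terms = address_bar_text.partition(" ")
    template = quick_searches.get(keyword)
    if template is None:
        return None
    # Escape the terms, then drop them in where the bookmark has %s.
    return template.replace("%s", quote_plus(terms))


print(expand("yahoo how many cows in texas"))
# http://search.yahoo.com/search?p=how+many+cows+in+texas
```

Adding another engine is just another entry in the table, which is all the bookmark-plus-keyword dance amounts to.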

Comment by aroon #
2004-02-24 09:50:12

just fyi, for you IE + windows xp users: if you grab the TweakUI windows xp powertoy, you can do the same thing with IE

Comment by aroon #
2004-02-24 10:01:44

i’m kind of skeptical of your theory on how yahoo is checking url strings for search words [at least i think that’s what you’re saying]. when i search for ‘pure imaginary’ at yahoo, the first result is my almost year old blogger.com blog, followed by a gamer who shares my blog’s name, and finally, in third place, my current blog site: http://www.pureimaginary.com/pi.

http://www.pureimaginary.com is currently ranked 5th. by your theory [if i’m understanding correctly], i would expect pureimaginary.com to be ranked first, followed by pureimaginary.com/pi, and then who knows what.

also to note, viperstyx.net [my second domain which points to the same page] isn’t even listed on the first page of results. if, however, you search for ‘pure imaginary aroon’ it’s ranked second, below the old blogger site and above pureimaginary.com [the commonly used domain].

thoughts?

Comment by Phil Ringnalda #
2004-02-24 14:54:11

My vague memory of my point was that if you have a page for red hats, a page for colors of hats that includes red, and a page for hats that includes a link to colors and explains that red is one of them, Google’s a bit more likely to deliver the red hats page, and Yahoo’s a bit more likely to deliver the colors page, or even the hats page. True or not, I’m still not sure.

Your actual case, though, is much easier. Domain name and path keywords are worth something, but they need to be backed up in the page: if famous-celebrity-naked.com doesn’t include any mention of her, only text about weekend pills, it won’t rate for the celebrity in question.

Your Blogspot blog had the target words in the URL, and in an H1 headline. That’s very powerful for those words. Your current blog has the words in the URL, and the words with spaces between each letter (i.e., in search engine terms, not-the-words) in an anonymous, bolded div. That’s likely to make a search engine suspicious of you for those words. Replace that div (or at least the bold tag, depending on what works and what else is in the div and styled by it) with an <h1>, take out the spaces between letters, and then style it to look the same with CSS, and next time they update rankings you ought to see a turnaround (depending, of course, on whether you’ve still got way more links with the blog name in the anchor text pointing to the Blogspot URL than to the new one: that’s hard to overcome with just in-the-page factors).

Comment by Phil Ringnalda #
2004-02-25 15:17:36