phil ringnalda : What Google could do with weblogs

While I don’t know what Google will do about weblogs in search results (which puts me ahead of Andrew “I see the Googlebots walking among us” Orlowski, since I at least know what I don’t know), I do know one thing that they could do.

If you remember back to our last flurry of talking about Google and weblogs, when the purchase of Blogger was announced, lots of people were talking about Google using the changes.xml file from weblogs.com as a way to find out which weblogs they ought to index immediately. Anyone whose job depends on having Google index them well would have fallen off their chair laughing at the thought of Google letting websites choose to be instantly indexed: weblogs.com would have been completely overwhelmed as every single commercial site on the entire web began to ping it, and ping it, and ping it. Being indexed quickly by Google is valuable. Until quite recently, practically speaking, you needed to allow two months for new content to get into Google’s index. But despite the fact that a self-selected list of things to crawl wouldn’t work for Google, Blogger recently released its own changes.xml file.

Even though Andrew “I Can’t Remember How Search Engines Work, Because I Hate Blogs So Much” Orlowski doesn’t think so, weblogs are useful to search engines. Google News is useful for fresh content about the boring things that big media likes to cover, but people use Google for more than just the latest Britney Spears gossip. If someone creates a “Flog Britney Spears” Flash game, and someone who deleted their forty-times-forwarded email wants to find it, they won’t be pleased to learn that Google will add it to its index during the next monthly crawl. They probably don’t want to read three hundred weblog posts about how fun it is to flog Britney; they want to flog Britney now! Although the weblog posts themselves may not be as useful to Joe Searcher as their current ranking in Google results makes it seem, the links within them, especially in aggregate, are. If three hundred blogs link to the same page, two hundred of them with “Flash game” in the link text, then that page might be a good candidate for a jump in the “Flash game” results, at least for a while (though that sort of thing reopens the whole Googlebombing issue). At the very least, that page needs to get into the index, pronto, if it isn’t already there.
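To make the aggregate-link idea concrete, here’s a minimal Python sketch. It assumes the links have already been pulled out of freshly pinged posts as (post URL, target URL, link text) tuples, and the thresholds (`min_links`, `min_share`) are arbitrary numbers picked for illustration; it isn’t anything Google actually runs, and it does nothing about Googlebombing.

```python
from collections import defaultdict

# Hypothetical input: (linking_post_url, target_url, anchor_text) tuples,
# already extracted from recently pinged weblog posts.
Link = tuple[str, str, str]


def fresh_candidates(links: list[Link], phrase: str,
                     min_links: int = 100, min_share: float = 0.5) -> list[str]:
    """Return target URLs linked from at least `min_links` distinct posts,
    where at least `min_share` of those posts use `phrase` in the link text."""
    phrase = phrase.lower()
    sources = defaultdict(set)      # target -> distinct linking posts
    with_phrase = defaultdict(set)  # target -> linking posts that use the phrase
    for source, target, text in links:
        sources[target].add(source)
        if phrase in text.lower():
            with_phrase[target].add(source)
    candidates = []
    for target, posts in sources.items():
        share = len(with_phrase[target]) / len(posts)
        if len(posts) >= min_links and share >= min_share:
            candidates.append(target)
    return sorted(candidates)


# The Britney example: 300 posts link to the game, 200 of them say "Flash game".
# One stray post links elsewhere with the same words, which isn't enough on its own.
links = [(f"http://blog{i}.example/post", "http://game.example/flog",
          "Flash game" if i < 200 else "this is great") for i in range(300)]
links.append(("http://blog0.example/post", "http://other.example/", "Flash game"))
print(fresh_candidates(links, "flash game"))  # ['http://game.example/flog']
```

Counting distinct linking posts, rather than raw links, keeps one enthusiastic blogger from pushing a page over the threshold by linking to it twenty times.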

Google can’t just remove every weblog post from the main index without reducing the quality of their search results: some things only appear in weblogs, and for some searches weblogs are the best results. However, there’s no need to give a weblog’s front page prime placement in the results, when it’s just a temporary view of posts that permanently appear elsewhere. By treating any URL that appears in a changes.xml file as the front page of a weblog, Google could handle weblogs more intelligently, returning some other page from the same site with the same keywords instead of the front page, so they would no longer send frustrated searchers to your front page looking for a post that was there two weeks ago and has since scrolled off.
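Here’s a rough Python sketch of that front-page trick. It assumes changes.xml uses the weblogs.com-style layout, a `<weblogUpdates>` root holding `<weblog name="..." url="..." when="..."/>` entries, and it treats “part of the weblog” as simply “a URL under the front page’s path”, which is a big simplification. The helper names (`normalize`, `front_pages`, `inside`, `rerank`) are made up for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase scheme and host and drop a trailing slash, so that
    "http://Blog.example/" and "http://blog.example" compare equal."""
    parts = urlsplit(url)
    norm = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}"
    return norm + (f"?{parts.query}" if parts.query else "")


def front_pages(changes_xml: str) -> set[str]:
    """Treat every url in a changes.xml document as a weblog front page.
    Assumes weblogs.com-style <weblog name="..." url="..." when="..."/> entries."""
    root = ET.fromstring(changes_xml)
    return {normalize(w.get("url")) for w in root.iter("weblog") if w.get("url")}


def inside(url: str, front: str) -> bool:
    """True if url is some other page under the weblog rooted at front."""
    u, f = normalize(url), normalize(front)
    return u != f and (u.startswith(f + "/") or u.startswith(f + "?"))


def rerank(results: list[str], weblogs: set[str]) -> list[str]:
    """For one query's ranked results, swap each weblog front page for the
    best-ranked other page from the same weblog that also matched, if any."""
    out: list[str] = []
    for url in results:
        if normalize(url) in weblogs:
            alt = next((u for u in results if inside(u, url)), None)
            if alt is not None:
                url = alt
        if url not in out:
            out.append(url)
    return out


changes = ('<?xml version="1.0"?>\n'
           '<weblogUpdates version="1" updated="Mon, 12 May 2003 07:32:00 GMT" count="1">'
           '<weblog name="Example" url="http://blog.example/" when="12"/>'
           '</weblogUpdates>')
results = ["http://blog.example/", "http://news.example/story",
           "http://blog.example/archives/000123.html"]
print(rerank(results, front_pages(changes)))
# ['http://blog.example/archives/000123.html', 'http://news.example/story']
```

If no other page from the weblog matched the query, the front page stays put, which is the best you can do without recrawling.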

With a little cunning to determine what’s part of your weblog and what’s an unrelated part of your site, they could also damp down the extremely high rank they give to things like Movable Type comment and TrackBack popups. Those popups are so well marked up semantically that The Register’s Chief Foamer-At-The-Mouth is actually right (for all the wrong reasons) when he fingers TrackBack as a source of weblog noise in search results. (Comments too, though he doesn’t mention them: Joi doesn’t use comment popups, so one doesn’t come up first when he ego surfs for andrew orlowski googlewash.) It really has nothing to do with TrackBack the spec, and everything to do with good markup. Google loves small pages with good HTML, so if your TrackBack listing popup has your entry title as the HTML <title>, and especially if it then repeats it in an <h2> in the body, it’s going to rank high for keywords in the title. If you also bury the title of your actual entry, either by having date-based archives so there isn’t a single page with the entry title as its HTML title, or, as Joi does, by not using semantic markup to tell Google that the entry title is an important part of the entry, then your TrackBack popup may well outrank the entry itself. Knowing that something is a weblog, because it has pinged, would let Google damp down that sort of thing with a weblog-specific filter, without completely ghettoizing us on a tab that nobody would ever search. (I pass through Google twenty or thirty times a day, but I would have been hard pressed to name all five tabs, because I use Google to search everything for me, not to have me tell it how to search and where to look.)
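A weblog-specific filter along those lines might look something like this Python sketch. Both the popup script names in `POPUP_SCRIPTS` and the damping `factor` are assumptions for illustration, and it uses “this host has pinged” as a crude stand-in for the cunning needed to tell your weblog from the rest of your site.

```python
from urllib.parse import urlsplit

# Hypothetical popup script names, assumed for illustration; real templates
# and installs vary, and a real filter would need something less brittle.
POPUP_SCRIPTS = {"mt-comments.cgi", "mt-tb.cgi"}


def damp_popups(scores: dict[str, float], weblog_hosts: set[str],
                factor: float = 0.25) -> dict[str, float]:
    """Scale down the score of comment/TrackBack popup pages served from hosts
    known to be weblogs (for example, hosts that have pinged changes.xml).
    Every other page keeps its score."""
    damped = {}
    for url, score in scores.items():
        parts = urlsplit(url)
        script = parts.path.rsplit("/", 1)[-1]
        if parts.netloc.lower() in weblog_hosts and script in POPUP_SCRIPTS:
            score *= factor
        damped[url] = score
    return damped


scores = {
    "http://blog.example/cgi-bin/mt-tb.cgi?__mode=view&entry_id=42": 1.0,
    "http://blog.example/archives/000042.html": 0.6,
    "http://shop.example/cgi-bin/mt-tb.cgi?entry_id=7": 1.0,  # host never pinged
}
print(damp_popups(scores, {"blog.example"}))
```

After damping, the entry’s own archive page (0.6) outranks its TrackBack popup (0.25), while the page on the host that never pinged is left alone.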

I don’t know in detail how to use weblog links to keep Google fresh while avoiding Googlebombing, or how to keep beautifully marked up but meaningless pages from outranking more confused but richer pages, but then, that’s why I don’t work at Google. Judging by the more than a thousand posts in various threads in the WebmasterWorld Google forum about Google updates in just the last week or so, I’d say the folks at Google haven’t quite run out of ideas for how to change their index around.

Preposting update: while looking around at just how many posts there really were over there, I ran across GoogleGuy (an actual Google employee) saying, rather diplomatically, that Orlowski’s full of shit. Ev saying it was one thing, but they might not tell him everything about the search side of the business just yet. GG, other than his Googlish way of saying as little as possible, has always seemed quite authoritative. And as an aside, what’s up with Ev’s archives, with the link in the RSS feed pointing to a weekly archive page that includes his original “Orlowski is full of crap. Again.” post, while the front page and its permalink point to a monthly archive with the post snipped? Also, I noticed that the template for the weekly archive was much easier to read, and the blogroll is, um, er, very nice company to keep. I’d roll that version of the template back onto the main page, if it were me (calculating how much an Evhead link is worth in Blogshares money, thinking about a stock split).

This entry was posted on Monday, May 12th, 2003 at 12:32 am and is filed under blogging tech.

22 Comments

Comment by Anders

2003-05-12 00:45:13

A while back I was looking at the Google WebQuotes:
Anders Jacobsen’s Blog: How Google uses weblogs to enhance search results today: Google WebQuotes

When I heard about the Google announcement, I thought they were now taking WebQuotes out of the Labs and on to the “production” site, but that’s only my guess…