Google does have a problem with blogs

While chewing on Jeremy for his latest attack on Google and PageRank, it occurred to me that Google does actually have a problem with weblogs. If it doesn’t have anything useful to return for a search (whether because the searcher used really poor search technique, or because there hasn’t been anything around long enough to index), returning what seems likely to be a good weblog that might have some relevance is a good idea. Since Jeremy dinged me for personifying Google, I should probably say instead that increasing the ranking of pages that are updated frequently, have a high link to text ratio, and are highly linked, is a useful strategy when there aren’t other good result candidates. It works out the same, either way: if you don’t have clearly good results, then among the not-so-good results a weblog is a good bet, since it ought to be linking to useful things, and maybe will have “related” links to other entries on the same subject, using different keywords.

The problem is that Google can only tell “good weblog” and “bad weblog” (yeah, I know, you aren’t really bad, just underappreciated so far, but Google has a pretty simplistic view of the world). Google can’t tell “good weblog about blogging, bad weblog about TV commercials for Canadian iced-tea-flavored malt liquor products”, so since Mike’s Hard Lemonade didn’t do nearly as good a job hiring a web designer as they did hiring people to do the second head commercial, when Google went looking for things to return for mike’s hard iced tea second head, they settled for me, hoping that I would at least link to the otherwise-hidden source of information. Nope, not my department. I’m more than willing to write about and link to things that I think are interesting about weblogs and blogging, but when I start talking about pop culture and current events, I’m just blathering uselessly.

The solution that I mentioned in Jeremy’s comments, scrapping blogrolls in favor of links to the same people’s posts, including titles or useful keywords about the post in the link text, doesn’t actually work. If you have two hundred people linking to your main page, Google’s going to think it must be something good, but if you have two hundred people linking to fifty or a hundred different posts, the effect of the links ends up getting diffused to the point where you only look sort of fair, not good. Categorizing people by topic (Simon CSS PHP, Stuart Javascript Python) is not only likely to be wildly inaccurate (I’ve found myself in people’s “Web design” category, fergossake), but from what I’ve seen from people just broadly categorizing sections of their blogroll, is quite likely to produce the sort of angry flames that end up with the whole blogroll in the trash.
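To put rough numbers on that diffusion, here is a small sketch. It assumes, purely for illustration, that every inbound link counts the same and that the links are spread evenly across posts; the function name is made up, and real ranking is nothing like this simple.

```python
def links_per_page(total_links: int, pages: int) -> float:
    """Average inbound links per page, assuming an even spread (an assumption)."""
    return total_links / pages


# Two hundred people all linking to the main page.
main_page = links_per_page(200, 1)

# The same two hundred links scattered over fifty or a hundred posts.
spread_over_50 = links_per_page(200, 50)
spread_over_100 = links_per_page(200, 100)

print(main_page, spread_over_50, spread_over_100)
```

Under those assumptions each post ends up with only a handful of links, which is the "sort of fair, not good" effect described above.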

So, I see the problem now: I think you write good stuff about some subjects, and total crap about others, and you think the same about me, but neither one of us knows how to tell Google. What I don’t see is a solution.

19 Comments

Comment by milbertus #
2003-08-07 04:39:50

I still don’t like the rumored solution of pushing blogs into their own category on Google. There are times, as you said, that the best hit is a blog, and if they’re in their own category, someone could potentially miss the link that they’re looking for.

There is definitely a problem, though. A few months back, I wrote about how Nintendo is going to make a sequel to their latest Zelda game, and a search for new zelda game recently had me as the 3rd hit (behind Nintendo’s main site for Zelda). Only recently did I get demoted to 6th.

Comment by Phil Ringnalda #
2003-08-07 09:07:01

You linked to that article in a post on “new zelda game” because you thought it was interesting and important, right? Unfortunately, computerandvideogames.com’s bizarre URL structure makes them virtually unlinkable for Google’s purposes, because using “id” in a query string gives Googlebot the willies, thinking it has wandered into something with session ids that it will never escape. So Google said to itself (I’ve really got to stop that) “best I can do is let milbertus direct people on to it: I can’t look myself for fear of being turned to stone, but I trust him to have picked a good link”.
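A guess at what that kind of caution might look like, as a sketch only: the function name, the rule (skip any URL with a query parameter named exactly "id"), and the example.com URLs are all assumptions for illustration, not anything Google has published.

```python
from urllib.parse import urlsplit, parse_qs


def looks_like_session_url(url: str) -> bool:
    """Hypothetical crawler heuristic: flag URLs with a query parameter named 'id'."""
    query = urlsplit(url).query
    params = parse_qs(query, keep_blank_values=True)
    return any(name.lower() == "id" for name in params)


# Made-up URLs: one with an "id" parameter, one plain archive page.
print(looks_like_session_url("http://example.com/news/story.php?id=12345"))
print(looks_like_session_url("http://example.com/archives/000123.html"))
```

A crawler using a rule this blunt would skip perfectly good article pages along with the real session traps, which is the problem described above.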

 
 
Comment by Danny #
2003-08-07 06:21:17

I guess first you need a way of expressing this – something like Advogato perhaps, maybe someone’s already done it in FOAF.
Anyhow, forget Google, other small tools are much more likely to implement something useful along these lines first. I suspect it’s only a matter of time before Google starts taking note of explicit metadata.

Comment by Phil Ringnalda #
2003-08-07 08:27:11

Well, they already do: they take note of the explicit metadata of page titles and headings, which is a big part of the reason MT-produced weblogs with individual entry archives rank so high for so many searches. As to explicit metadata that doesn’t also appear in the page… <meta name="keywords" content="none whatsoever"> Maybe, maybe, you could use other pages’ metadata about a page, though that seems likely to just produce metadatafarms to replace linkfarms.

 
 
Comment by kodi #
2003-08-07 06:54:50

Your (abandoned) idea about abandoning blogrolls would actually exacerbate the problem, wouldn’t it? If everybody says Phil Ringnalda has a problem with Google, and neglects to mention that they’re a fan of Phil Ringnalda, suddenly you’re a better authority on Google than you are on yourself.

Comment by Phil Ringnalda #
2003-08-07 07:58:45

Well, if there are twenty links to Phil Ringnalda: Google does have a problem with blogs, and thirty links to Phil Ringnalda on blogging on Sunday, and if I do a good enough job of distributing PageRank internally, then I ought to end up as an authority on “google problem blogs” and “blogging Sunday” at those entries, and on myself in general, especially at the main page.

Probably what I want is a sort of a version of what Radio was going to be at one point in its evolution: they had run My.Userland, a server-side aggregator, and Radio was going to be My Userland On The Desktop, reading your feeds and letting you choose which items to republish in your own feed of selected news (a way of using it that a few people still do follow). When I hit a new item that’s just purely Mark, the sort of thing that’s the reason I read him, I’d like an easy in-aggregator way of seeing which three of his posts I’m currently linking in my linkroll, and switching in the new one for one of the existing ones.

 
 
Comment by Mark #
2003-08-07 07:08:50

re: “maybe someone’s already done it in FOAF.”

Some programmers, when faced with a problem, think “I know, I’ll add RDF.” Now they have two problems.

(With apologies to Jamie Zawinski)

 
Comment by anand #
2003-08-07 23:59:45

Posted as a comment in Jeremy’s blog:

Google is no longer governed solely by PageRank.

To understand how Google works you need to understand the Google crawling patterns.

There was a time when there was a single Googlebot which crawled the net once every month and adjusted the SERPs accordingly.

Now there is an additional Googlebot called the Freshbot which mines the web every three days or so.

So now Google adjusts its SERPs not only by relevancy but also by freshness of content.

Now a weblog is undoubtedly the freshest of sites and so Google tends to favour the weblog when it is looking for freshness. Your high PageRank helps you to come out first among these fresh stories.

With time, your story will fall while entire sites devoted to Arnold’s election campaign will come into the SERPs via the monthly crawl.

 
Comment by David #
2003-08-08 00:31:21

I think you’ve got it almost exactly the wrong way round :)

The problem isn’t that Google doesn’t know enough about weblogs, it’s that it doesn’t know enough about the rest of the world. If you take Jeremy’s weblog, not only can Google deduce that he’s an expert on (well, at least writes a lot about) MySQL, but it can also deduce that he doesn’t regularly write about Arnie; because a high proportion of people who read his site also link to him, it could even find out that people who read his site are interested in MySQL, but don’t talk about showbiz governors.

The millions of people who access the NYTimes via their bookmarks are totally invisible to Google, while the 100s(?) who access zawodny.com and also have a weblog are very visible.

With that in mind, there is *nothing* that bloggers can do to help Google (except for deliberately hiding info from Google, which is a bit silly), but there are a few things Google can do:

* Hook up with (e.g.) AOL and steal all the access stats about AOL users (which sites they visit). Almost certainly an invasion of privacy, but it’ll tell them who visits what, and allow them to raise the pagerank of popular sites.
* Develop a ‘linkrank’. If pagerank is based on how many incoming links a site has, then be a bit more clever about how important each of those links is. (If it’s in a list of links, low importance; if it’s in the body of text, high importance.)
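
A minimal sketch of the ‘linkrank’ idea, assuming each incoming link can be labelled as sitting either in a list of links or in body text. The weights (1 and 5) and the names are invented for illustration; nothing here reflects how Google actually scores links.

```python
# Hypothetical weights: a link in body text counts more than one in a list.
CONTEXT_WEIGHTS = {"list": 1, "body": 5}


def link_score(inbound_contexts: list[str]) -> int:
    """Sum the weight of each incoming link according to where it appears."""
    return sum(CONTEXT_WEIGHTS[context] for context in inbound_contexts)


# A site that sits in ten blogrolls, versus one cited in three posts
# and listed in two blogrolls.
blogroll_only = ["list"] * 10
cited_in_posts = ["body"] * 3 + ["list"] * 2

print(link_score(blogroll_only), link_score(cited_in_posts))
```

With these made-up weights, five links that include three in-text citations outscore ten blogroll entries.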

 
Comment by Gerald #
2003-08-12 04:18:49

Nice discussion, but is it necessary? Searching for “jeremy” shows the jz-blog at pos 1 in Google; searching for “phil” shows the pr-blog at the top. Your blogs are designed as weapons, taking advantage of Google’s algorithms that are based on PageRank and linkage. You manipulate the search results. Therefore the results are just what I expected. And as this topic has been discussed by some more blogs (using the appropriate keyword links), the result won’t change for the next days, weeks and months.

Now I switch over to Jeremy’s blog and post this comment there too ;-)

 
Comment by Joe p #
2003-08-15 11:43:43

Google is no longer governed solely by pagerank.

To understand how google works you need to understand the google crawling patterns.

Nuff Said!

Comment by Phil Ringnalda #
2003-08-15 14:36:25

Well, no, enough said is when you’ve explained the importance of in-page factors and link text, the apparent changes through the last few updates and the expectations if not the reality of full-time Freshbot and rolling updates, and then actually looked at the page in question, particularly with a link: search for incoming links (there aren’t any) and explained that Jeremy just happened to get caught by Googlebot with that entry in his “recent entries” list, and quite likely in the “related from Google” list on some entries as well, during the deep crawl for the latest update, and as a result every single one of his entries was spamming Google with a link to that entry when people started searching for Arnie4Gov info.

Had you explained all that, I might not have noticed that you were a comment spammer, since I didn’t actually notice that you were just quoting a previous comment with the moronic and illiterate “Nuff Said!” tacked on. And what the hell’s the point of comment spamming with link text like “Joe p” – why would someone selling kitchen appliances want to be highly ranked for that search?

 
 
Comment by Anonymous #
2003-09-27 20:50:03

Not sure if it got in last time but I really can’t see G having a solution to blog spam as it doesn’t generally have a definitive sig, or does it? I think I will write an article on this, after all guestbooks still work

Comment by Phil Ringnalda #
2003-09-27 21:07:42

How utterly charming! Comment spam masquerading as an intention to write an article! By the way, your URL is now deleted and blocked.

 
 
Comment by Bryant #
2005-04-10 08:20:48

Much much later: Google starts getting smarter about questions, at least if they’re posed as questions. So, while “what is the population of X?” is a query likely to return my blog somewhere in the top few results, the very top of the results section will have a real answer.

 
Trackback by Population: One #
2003-08-07 05:09:00

Sure, I know that

Jeremy Zawodny makes the observation that I’ve made a couple of times; namely, Google’s PageRank is broken. (OK, he’s said…

 
Trackback by Third Superpower #
2003-08-07 07:23:04

Battle of the Accents

Jeremy is upset that he is the Number One for Schwarzenegger for Governor on Google. I am amused that I…

 
Trackback by Jeremy Zawodny's blog #
2003-08-07 08:05:48

PageRank: still broken

A friend just pinged me on messenger to let me know that if you search Google for “Schwarzenegger for governor” you get an entry from my blog as the first result. (See screen shot on the right and click for full image.) I wonder how long that will las…

 
2003-08-08 12:58:36

Google issues

Time to throw a few more quarters in or kick Google’s PR machine …

 