Google-Blogger synergy? Not much.

Back when Google purchased Blogger, there was a lot of talk about what Blogger could do for Google, and what Google could do better with and for Blogger. One of the most frequently mentioned things was indexing: either Google would selectively index Blogger-powered blogs faster and better, or they would not be evil, and would index all blogs faster and better.

Well, doesn’t much look like it, at least for Blogger-powered blogs on Blog*Spot. Go to any random Blog*Spot address and look for the robots.txt file, which tells search engine crawlers what they can and can’t access. My FAQ blog will do. Hit refresh a few times. As of this moment, what you should see on any given refresh is one of two things, either:

User-agent: msnbot
Disallow: /

or:

User-agent: msnbot
Disallow: /
User-agent: googlebot
Disallow: /

Completely banning Google’s crawler from any page on a given Blog*Spot blog, sometimes, randomly, maybe a bit less than half the time, doesn’t exactly strike me as good synergy. Does seem like it would explain why I’ve heard people complaining that their Google-powered site search for their Blog*Spot blog isn’t working right anymore, though. If half the time Googlebot gets told to stay away, you’re not going to be very well indexed. Maybe Yahoo! has a similar site-search form?
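For anyone who wants to check what each of those two files actually says to a crawler, here is a minimal sketch. It assumes Python's standard urllib.robotparser is a fair stand-in for how a crawler reads robots.txt (Googlebot's own parser may well differ), and the names allowed, blocked_fraction and fetch are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt bodies quoted above.
VARIANT_MSN_ONLY = "User-agent: msnbot\nDisallow: /\n"
VARIANT_MSN_AND_GOOGLE = (
    "User-agent: msnbot\nDisallow: /\n"
    "User-agent: googlebot\nDisallow: /\n"
)


def allowed(robots_text, agent, path="/"):
    """Return True if `agent` may fetch `path` under this robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)


def blocked_fraction(fetch, agent, tries=20):
    """Call `fetch` repeatedly and report how often `agent` is shut out.

    `fetch` is a hypothetical callable that returns the current robots.txt
    text; each call stands in for one "hit refresh".
    """
    blocked = sum(1 for _ in range(tries) if not allowed(fetch(), agent))
    return blocked / tries


if __name__ == "__main__":
    for name, text in [("msnbot only", VARIANT_MSN_ONLY),
                       ("msnbot and googlebot", VARIANT_MSN_AND_GOOGLE)]:
        for agent in ("googlebot", "msnbot"):
            print(name, agent, "allowed" if allowed(text, agent) else "banned")
```

To try it against a live blog, pass blocked_fraction a fetch function that downloads the blog's robots.txt (for example with urllib.request.urlopen) and returns the decoded text. The fraction it reports is only as good as the number of tries.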

Oh, and the banning of msnbot? That’s just rude. Funny, but rude.


Comment by Mark A. Hershberger #
2004-02-27 18:55:19

* sites are all served from the same IP, but that IP is probably being served by multiple hosts. Looks like they aren’t all configured identically.

A pretty innocent sysadmin error.

Still, ya gotta wonder why they want(ed) to block googlebot.

As for banning msnbot:

Comment by Phil Ringnalda #
2004-02-27 19:33:06

Not sure I buy that as an explanation for banning msnbot. After all, you can’t ban it for ignoring your robots.txt when the only thing your robots.txt has to say is that msnbot is banned.

I’d be willing to believe that msnbot was hitting them harder than they could handle, except that they’re supposed to be on Google’s infinitely-scalable servers now, and surely they would have thought about how it would look to have msnbot banned from searching Google’s blogs.

Never ascribe to malice, yes, but for something that would look this bad?

Comment by Carl Garland #
2004-02-27 19:20:29

One possibility is since all the content lies within the Google realm they may just be smarter about their allocation of resources. It would be trivial to index their own blogs themselves and only when needed/updated. That would prevent the extra load on their servers and bandwidth following links from external pages.

Comment by Phil Ringnalda #
2004-02-27 19:28:37

Sure, they could do it internally. But then why ban themselves through robots.txt, some of the time, and not other times? Do they sometimes forget that they aren’t supposed to be doing it in the normal crawl? And, don’t forget the people who find that their Google-powered site search only partly works, because only some of their archive pages are in Google at all, and others have that “URL but no text” look that says Google knows about them, but wasn’t able to crawl them.

Comment by Mark #
2004-02-27 19:51:36

I smell a bug. Randomly fluctuating robots.txt files can’t possibly be deliberate.

Comment by Vishi #
2004-02-28 16:41:49

Google’s business model lies in making sense of data on the web and organizing it for the user. It lies in Google’s interest to make sure reading websites and organizing them is difficult unless you have access to complex Google algos. In short, Google hates RSS and other Semantic Web initiatives. This is the reason Google bought Blogger and is trying to break RSS interoperability and is supporting Atom.

A nice corporate theory. :)

Trackback by 4 Banalitaten #
2005-03-12 11:22:01

Yes, but I’m cool…

Scenario. Microsoft opens a blog hosting service. It takes every user’s robots.txt file and writes into it that Google’s spider is forbidden access. What would the reactions be? Change of scenario. Google buys a ser…

