Underscores are bad, mmkay?

Whether or not you should care how well Google indexes your weblog is an interesting question, but one which would take more time to cover than my current attention span allows. However, if you do want to rank as high as possible, I can tell you one thing you should not do: use underscores to separate words in the path and filename. I’ve seen GoogleGuy, the Google employee who posts at WebmasterWorld, say this before, but I always seem to lose track of the quote (so it’s a good thing I have a blog, to store things like this). According to him:

If you use an underscore ‘_’ character, then Google will combine the two words on either side into one word. So bla.com/kw1_kw2.html wouldn’t show up by itself for kw1 or kw2. You’d have to search for kw1_kw2 as a query term to bring up that page.

Translating from the “no real words” language of WebmasterWorld, that means that /underscores-are-bad-mmkay.php would match a search for underscores, or for bad, or for mmkay, or in the case of a blog that uses the title in the URL and in the page, would reinforce the importance of each of those words, while the filename this entry will get, underscores_are_bad_mmkay.php, only matches a search for underscores_are_bad_mmkay.
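To make the difference concrete, here is a minimal Python sketch of a word splitter that behaves the way the quote describes: hyphens, slashes and dots separate words, while an underscore glues its neighbours together. The function name url_keywords is made up for illustration, and this models only the described behaviour, not Google’s actual tokenizer.

```python
import re


def url_keywords(path: str) -> list[str]:
    """Split a URL path into keywords, treating '_' as part of a word.

    Assumption: this models only the behaviour described in the quote,
    not how Google really tokenizes URLs.
    """
    # Drop a trailing file extension such as ".php" or ".html".
    stem = re.sub(r"\.[A-Za-z0-9]+$", "", path)
    # Any run of characters other than letters, digits, or '_' separates words.
    return [word for word in re.split(r"[^0-9A-Za-z_]+", stem) if word]


print(url_keywords("/underscores-are-bad-mmkay.php"))
print(url_keywords("/underscores_are_bad_mmkay.php"))
```

The hyphenated path comes back as four separate keywords; the underscored one comes back as a single long keyword.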

Personally, I’ve never been very interested in search engine visitors, and I lost most of my interest in search engine results when I dropped out of first place for Phil, but if you blog in a way that makes you want to be highly rated in Google, and you are planning a major URL change anyway, there’s no sense throwing away ranking by using underscores, when hyphens are nearly as pretty, and give you separate words in Google’s eyes.

(Via the unfortunately quashed GoogleGuy Says Weblog.)

21 Comments

Comment by Jonas Galvez #
2004-04-22 22:03:18

Now, can someone explain this to me? I’ve noticed that if the words in the URL are glued together, then Google will actually do a ”*keyword*”-type search on it, making these words appear bold in the URL when displaying a search result. I haven’t seen this before, so I’m a little confused. Does making your blog file names a ”single word” increase your rank?

Here’s a screenshot:
http://jonasgalvez.com/unsorted/google_keywords.jpg

Comment by Phil Ringnalda #
2004-04-22 22:33:54

The only explanation that makes sense to me (based on five minutes of testing, to be sure) is that the highlighting code works very differently than the searching code: search for rington and you don’t get pages that include ringtone, but search for ringtone and they’ll highlight it in the URL ringtoneparty. I’d guess that once the search results code has delivered a particular page for a keyword or set of keywords, then the highlighting code will ignore punctuation, and whitespace, and lack of whitespace, and highlight the keywords wherever the right letters appear in a row.

Comment by Jonas Galvez #
2004-04-23 01:09:35

Yeah, that seems to be the most plausible explanation…

 
 
 
Comment by Matt #
2004-04-22 23:37:12

Completely agreed, though personally I think hyphens are more aesthetically pleasing than underscores.

 
Comment by Luke Hutteman #
2004-04-23 07:16:13

Hey, at least you got beaten by some well-known Phils in Google (Dr., Punxsutawney and Collins) – I, on the other hand, got outranked by a porn site, which puts a whole different spin on ”I’m Feeling Lucky”.

btw, I do think Google ranking is important, as it helps bring new visitors to your site, as opposed to us boring old-timers ;-)

Comment by Phil Ringnalda #
2004-04-23 07:35:20

Well, that’s certainly visionary and philosophical of you ;)

Comment by Luke Hutteman #
2004-04-23 21:23:09

You forgot ”highly gifted” :-) but then again, so do most of my google visitors to that particular post – the emode test seems to be much more popular than the high iq one (or just has a more googleable title I guess)

You were certainly spot on in identifying that particular post though – it gets about all its traffic from google visitors.

 
Comment by mini-d #
2004-04-27 06:29:27

I use underscores and I’ve found them OK. Also, I reinforce the use of good titles by using the same words as the file name.

Cheers.

Comment by Phil Ringnalda #
2004-04-27 07:33:42

Since you have the same words in your post title and in your filename, you don’t actually know whether or not you have a problem with underscores.

However, search for allinurl:spring ping thing and you’ll get Matt’s recent post, search for allinurl:brilliant forensic work and you will not get my recent post (though if you search for allinurl:brilliant_forensic_work you will, so you can see that it is indexed).

 
 
 
 
Comment by Raphaël Balimann #
2004-04-23 17:02:37

Hi. Thanks for the advice. I hope you’ll switch too. This is how I did it: Weblog Conf, Archiving, use <MTEntryTitle dirifyplus="pld"> instead of dirify="1" for the individual archive entry. Then go to http://mt-stuff.fanworks.net/plugin/dirifyplus.phtml and install the dirifyplus plugin. Rebuild and you’re done. I did that on my site, and it seems to work.

Do you have a clue what a regular expression in the .htaccess should look like if I wanted to replace underscores with hyphens and redirect the browser to the new files? I found something like this, but it’s unfinished, as you can see:

RewriteEngine on
RewriteBase /blog/
RewriteRule ^(.*)$ $1 [R]

btw, you could remove the .php file endings and add this: DefaultType application/x-httpd-php (it would make your links more permanent, as you could change the default type to text/html if you wanted to.)

Comment by Jason Mevius #
2004-09-22 11:55:20

I’ve tried something like this as well, but am falling short. I would like to redirect something like this:

/tester/archives/2004/09/10/mt_31_installation

to:

/tester/archives/2004/09/10/mt-31-installation

I’m trying to combine my switch to dynamically generated pages with my switch to hyphenated filenames. Any ideas?
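One possible way around writing the rewrite regex at all, offered as a rough sketch rather than a tested recipe: if you can list your old paths, a short Python script can print one mod_alias Redirect line per entry for pasting into .htaccess. The helper names hyphenate and redirect_lines are made up for illustration, the second path in the list is hypothetical, and whether a path-only redirect target is accepted depends on the Apache version (an absolute URL can be used instead).

```python
def hyphenate(path: str) -> str:
    """Replace underscores with hyphens in the final path segment only."""
    head, sep, tail = path.rpartition("/")
    return head + sep + tail.replace("_", "-")


def redirect_lines(old_paths):
    """Return one 'Redirect 301 old new' line for each path that changes."""
    lines = []
    for old in old_paths:
        new = hyphenate(old)
        if new != old:  # skip paths that contain no underscore in the filename
            lines.append(f"Redirect 301 {old} {new}")
    return lines


old_paths = [
    "/tester/archives/2004/09/10/mt_31_installation",
    "/tester/archives/2004/09/10/index",  # hypothetical path with nothing to change
]
print("\n".join(redirect_lines(old_paths)))
```

Only the filename is rewritten, so a directory that happens to contain an underscore keeps its name. The obvious drawback is that the list has to be regenerated if old entries are ever added.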

 
 
Comment by Keith #
2004-04-24 01:35:06

Do you know how Google views periods? Sometimes I use periods in my URLs similar to how people use hyphens or underscores (I got the idea from CNN – here’s an example currently on their front page). Here’s an example currently on my front page, and here’s a Google search in which that and another entry with a similar URL appear.

The highlighter only works on a word if it isn’t followed by a period, as ”interview” is in that Google search, though as you pointed out the highlighter may work differently than the search code. Do you know what the search code does in this case?

Comment by Phil Ringnalda #
2004-04-26 12:58:27

Don’t know. My first guess would be that it doesn’t treat periods as word separators, but we would need a test case where the filename ”words” didn’t appear in the body at all to really tell: weblogs are too optimized, with the filename keywords also in the post title, and often also in the body text. If /canine.comestibles.html showed up even though you only used dog food in the body, then you would really know.

 
 
Comment by Jonathon Delacour #
2004-04-24 03:58:09

So, here’s what I don’t understand — given that the intent of using underscores to separate words in a filename should be blindingly obvious, why does Google choose to ignore them?

 
Comment by Mark Carey #
2004-04-24 08:14:45

Jonathon, you bring up a very good point. GoogleGuy has finally conclusively said that underscores don’t count as word separators, but why? People have been using them that way for a long time (the Yahoo! and DMOZ directories, for example). I would be willing to bet that underscores are more commonly used than hyphens in URLs, so you have to wonder why. Perhaps it is somehow related to the fact that you can’t use underscores in domain names…

 
Comment by Adam #
2004-05-19 00:28:58

I think the reason why is that underscores are not ’user-friendly.’ Every person — even a non-geek — knows where the ’dash’ key is. And in traditionally-underlined hyperlinked text, underscores can look too much like spaces.

That’s why I think that Google is trying to discourage their use.

 
2006-01-08 16:10:10

[…] And as (Google employee) Matt Cutts notes, sometimes it doesn’t work out quite so well. Back at the end of October, when I switched to WordPress, I moved my weblog to the subdomain “weblog.” and changed to WordPress’s default URL scheme, with the words from the post title separated by hyphens (because Underscores are bad (for search engines)) and without the “.php” extension (because, among other things, it makes search results for [foo php] horribly cluttered with things only published using PHP, not about PHP), and redirected all my old URLs to the new ones. Now when you search for a set of words I’ve used in a post, like [when Gmail first added Atom feeds], you will not find my post. […]

 
Comment by Jon Dowland #
2006-04-20 05:13:49

Hmm thanks for the coverage. This is interesting. I draw parallels with how many regular expression engines consider the underscore to be a ”word character” whereas the dash/minus is not.
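The regular expression parallel is easy to check in Python’s re module, where \w matches letters, digits and the underscore. The hyphenated slug below is a made-up counterpart to the underscored filename mentioned earlier in the thread, and nothing here says Google’s indexer is regex-based; it only shows the same convention at work.

```python
import re

slug_underscore = "brilliant_forensic_work"
slug_hyphen = "brilliant-forensic-work"  # hypothetical hyphenated counterpart

# \w+ grabs runs of word characters; '_' counts as one, '-' does not.
words_underscore = re.findall(r"\w+", slug_underscore)
words_hyphen = re.findall(r"\w+", slug_hyphen)

# \b is a boundary between a word character and a non-word character,
# so "forensic" only stands alone as a word in the hyphenated slug.
alone_underscore = bool(re.search(r"\bforensic\b", slug_underscore))
alone_hyphen = bool(re.search(r"\bforensic\b", slug_hyphen))

print(words_underscore, alone_underscore)
print(words_hyphen, alone_hyphen)
```

The underscored slug is matched as one word and "forensic" is not found on its own; the hyphenated slug splits into three words and "forensic" is.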

 
Comment by Frankie Roberto #
2007-02-24 05:09:23

Google now recommends underscores:

”We recommend that you replace the spaces in your links with an underscore.”
http://www.google.com/support/analytics/bin/answer.py?answer=27231

Comment by Joey B #
2007-04-16 06:21:28

Frankie said:

Google now recommends underscores

It appears that Google probably does not recommend underscores. This is most likely a disconnect between the Analytics team and the SE team. Please see

 
 
Comment by Joey B #
2007-04-16 06:22:38
 