Archive for June 2nd, 2008

I apologize for a computer-geek-speak post. If it leaves you clueless, skip it and bear with me.

When I updated my site a couple of months ago, little did I know that I was making it invisible to Google’s indexer – Googlebot.

One of the improvements that I wrote into the site was that it would identify the preferred language of the browser and select that as the default language.

In the headers passed to the server in the request there is a header called accept-language, which tells the server which languages I prefer. My browser passes the value en-us,he;q=0.5, which means that I prefer US English and, after that, will accept Hebrew. The q=0.5 is a relative weight that applies only to Hebrew: it ranks Hebrew at half the preference of US English, which gets the default weight of q=1 because it carries no q of its own. From reading the documentation, q=1 just marks the most preferred languages; it does not mean “send me nothing other than these languages”.
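To make the weighting concrete, here is a minimal sketch of how such a header value could be split into languages and weights. It is written in Python purely for illustration (it is not the site's code), and the function name parse_accept_language is my own invention.

```python
def parse_accept_language(header):
    """Return (language, q) pairs, most preferred first.

    Illustrative helper only: a language without an explicit q gets
    the default weight of 1.0.
    """
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue  # skip empty items, e.g. from an empty header
        pieces = [p.strip() for p in part.split(";")]
        language = pieces[0]
        q = 1.0  # default weight when no q parameter is given
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # assumption: treat a malformed weight as "not wanted"
        entries.append((language, q))
    # sorted() is stable, so languages with equal q keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


print(parse_accept_language("en-us,he;q=0.5"))
# prints [('en-us', 1.0), ('he', 0.5)]
```

Note that en-us ends up with 1.0 even though the header never says so, and that the 0.5 belongs to he alone.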

Anyway, my code does the following (in pseudo-code):

If you have a language cookie, then use that value

Otherwise, if the first preferred language (Request.UserLanguages[0]) is “he” then use Hebrew

Otherwise use English

The problem with my code is that the accept-language header is not mandatory, and the standard I linked to above states that if no accept-language header is provided then the server should assume that all languages are equally acceptable. Googlebot knows this standard and does not provide an accept-language header in its request. This is very logical if you think about it. Google wants to index everything and they don’t care what language it is in.

My code was returning an unexplained 500 to Googlebot, which got me totally wiped out of the search results. What was happening is that when I referenced Request.UserLanguages[0] I was getting an exception, because without an accept-language header the Request.UserLanguages collection was uninitialized (null). I have fixed this and put the whole section in a try-catch. I suppose I could have just checked whether Request.UserLanguages was null or had a Length of 0, but I decided to play super safe.
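For the curious, the guarded version of the language choice can be sketched roughly like this. It is a Python stand-in rather than my actual ASP.NET code: pick_language is a made-up name, and the user_languages argument plays the role of Request.UserLanguages, with None standing for the missing header.

```python
def pick_language(cookie_language, user_languages):
    """Choose the site language following the pseudo-code in this post.

    cookie_language: value of the language cookie, or None if absent.
    user_languages:  stand-in for Request.UserLanguages; None (or an
                     empty list) when the request carried no
                     accept-language header, as with Googlebot.
    """
    if cookie_language:
        return cookie_language
    # Guard first: indexing [0] on a missing or empty collection is
    # exactly what blew up and produced the 500.
    if user_languages and user_languages[0] == "he":
        return "he"
    return "en"


print(pick_language(None, ["he", "en-us"]))  # browser that prefers Hebrew
print(pick_language(None, None))             # Googlebot-style request
```

The second call is the case that used to fail; with the guard it simply falls through to English.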

BTW, the way I debugged this is worth a mention as well. If you search for ServerVariables in Google you get a whole lot of pages that demonstrate a dump of the server variables of your request. The Google-cached version of such a page shows the server variables of Googlebot’s request. Thus, by comparing the page in the Google cache with the live one, I could see what the environment differences were between Googlebot’s request and mine. I then used this spoofing tool from “Smart IT consulting” to see when Googlebot could see the page.

Incidentally, before I got to this solution I found a simple “hack” for Mozilla (more a configuration than a hack) that allows you to change the user agent reported by the browser. I tried this and made my browser pretend to be Googlebot. However, this didn’t help because I still had an accept-language header in my request.
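If you would rather reproduce such a request yourself than rely on a browser setting, something along these lines should work. It is only a sketch using Python's standard urllib: the function names are mine, the user agent string is just an example Googlebot value, and http://example.com/ is a placeholder for your own page.

```python
import urllib.error
import urllib.request

# Example Googlebot user agent string; any crawler-like value would do here.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


def build_googlebot_like_request(url):
    """Build a request that sets only User-Agent.

    No accept-language header is added, which is the part that
    changing the browser's user agent alone did not reproduce.
    """
    return urllib.request.Request(url, headers={"User-Agent": GOOGLEBOT_UA})


def fetch_status(url):
    """Return the HTTP status code the server answers with."""
    try:
        with urllib.request.urlopen(build_googlebot_like_request(url)) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 500 if the page chokes on the missing header


if __name__ == "__main__":
    # Placeholder URL; point this at the page you want to check.
    print(fetch_status("http://example.com/"))
```

A 500 here, next to a 200 in a normal browser, would point at a header the page is assuming to be present.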
