In Chapter 3, when describing how search engines work, we mentioned language filters and, briefly, how search engines detect languages. Search engines use a few methods to detect a searcher’s language preferences and the language of a page:

  • Search Query Language – This is the default. If the search engine can deduce a language from the search query, it will return results that contain that phrase in the same language.
  • Browser Language Preference – Most browsers default to the language version that the user downloads. Users can also set a language preference within the browser. The search engine will try to match its results to that preference: if the user sets the browser’s language preference to Japanese, search engines will, when possible, return results in Japanese.
  • Content Language – Most search engines do a good job of detecting languages. Google states they can detect 38 different languages. The challenges occur for languages such as English or Spanish, which are spoken in many countries, so the language alone does not tell the search engine which country a page is meant for.
  • Content Language Meta Tag – The popular search engines have indicated that they do take this tag into consideration, but since it is not typically used or accurate, it is the least reliable of the three approaches. Note, however, that as of this writing it is the primary factor used by Bing to distinguish local language content.
  • HTML Language Tag – Most search engines have publicly stated that they do not use the HTML lang attribute (<html lang="fr">), but Bing does consider this tag. In tests done by the authors, over 90 percent of websites had set this tag incorrectly on at least some of their pages, so be forewarned.
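
To make the browser language preference concrete: browsers send it to every site in the Accept-Language request header, for example "ja,en-US;q=0.8,en;q=0.5". The following is a minimal Python sketch of how a server might read that header and choose among the languages it has content for. The function names and the fallback rules are our own illustrative assumptions, not a description of how any particular search engine does it.

```python
def parse_accept_language(header):
    """Return the language tags in an Accept-Language header, most preferred first."""
    prefs = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not wanted"
        if quality > 0:
            # Sort by descending quality, then by original position.
            prefs.append((-quality, position, tag.strip().lower()))
    prefs.sort()
    return [tag for _, _, tag in prefs]


def pick_language(header, available, default="en"):
    """Pick the first preferred language for which content exists.

    `available` is a set of lowercase language codes (hypothetical site data).
    """
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # "ja-jp" falls back to "ja"
        if primary in available:
            return primary
    return default


print(pick_language("ja,en-US;q=0.8,en;q=0.5", {"en", "ja"}))  # prints "ja"
```

A user with no matching language simply gets the default, which is why the default choice matters for international sites.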

Setting your Page Level Language Preferences

The best way to set the language of a specific page is to use the hreflang attribute. This attribute is currently accepted by Google and Yandex. It can be implemented in two ways: either as lines of code in each page or, in the case of Google, as an HREF XML Site Map. It also allows you to set the country that the page is targeted at.

You do this by adding references to the alternative versions of each page in the <head> of the page. For example:

<link rel="alternate" hreflang="x-default" href="http://www.example.com/en/"/>
<link rel="alternate" hreflang="en-us" href="http://www.example.com/us/"/>
<link rel="alternate" hreflang="en-gb" href="http://www.example.com/uk/"/>
<link rel="alternate" hreflang="pt-BR" href="http://www.example.com/br/"/>
<link rel="alternate" hreflang="de" href="http://www.example.com/de/"/>
<link rel="alternate" hreflang="ru" href="http://ru.example.com"/>

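The page itself must be referenced, and you must refer to the language using its ISO 639-1 code. If you add a country, as in en-us or pt-BR above, it follows the language code after a hyphen and uses the ISO 3166-1 Alpha 2 format.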
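
Because every page in a set needs the same block of tags, it is easier to generate them than to type them by hand. The following Python sketch shows one way to do that. The function names are our own, and the code check is deliberately simplified: it accepts only a two-letter language code with an optional two-letter country, so treat it as an assumption about your own data rather than a full validator.

```python
import re
from html import escape

# Simplified assumption: "xx" or "xx-yy" only (language, optional country).
CODE_PATTERN = re.compile(r"^[a-z]{2}(-[a-z]{2})?$", re.IGNORECASE)


def link_tag(code, url):
    """Render one alternate link element with properly escaped attributes."""
    return '<link rel="alternate" hreflang="%s" href="%s"/>' % (
        escape(code, quote=True),
        escape(url, quote=True),
    )


def hreflang_links(page_url, alternates, x_default=None):
    """Build the <head> tags for `page_url`.

    `alternates` maps an hreflang code to the URL of that version.
    """
    if page_url not in alternates.values():
        raise ValueError("the page itself must appear among its alternates")
    for code in alternates:
        if not CODE_PATTERN.match(code):
            raise ValueError("unexpected hreflang code: %r" % code)
    tags = []
    if x_default:
        tags.append(link_tag("x-default", x_default))
    tags.extend(link_tag(code, url) for code, url in alternates.items())
    return "\n".join(tags)


versions = {
    "en-us": "http://www.example.com/us/",
    "en-gb": "http://www.example.com/uk/",
    "de": "http://www.example.com/de/",
}
print(hreflang_links("http://www.example.com/de/", versions,
                     x_default="http://www.example.com/en/"))
```

The self-reference check mirrors the rule above: a page that is missing from its own list of alternates raises an error instead of producing incomplete tags.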

HREF XML Site Maps

Currently, Google is the only search engine that has enabled HREF XML Site Maps. We prefer this method because it takes the weight off the HTML page and is far easier to update. For specifics on Google HREF XML Site Maps, please consult Google's own documentation. If you are interested in a tool that can help you develop these site maps, Back Azimuth offers a beta of its HREF XML Builder that you can sign up for. A site map with hreflang annotations looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml">

<url>
<loc>http://www.example.com/english/</loc>
<xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/deutsch/"/>
<xhtml:link rel="alternate" hreflang="de-ch" href="http://www.example.com/schweiz-deutsch/"/>
<xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/english/"/>
</url>

<url>
<loc>http://www.example.com/deutsch/</loc>
<xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/english/"/>
<xhtml:link rel="alternate" hreflang="de-ch" href="http://www.example.com/schweiz-deutsch/"/>
<xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/deutsch/"/>
</url>
</urlset>
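
A site map in this format repeats the full list of alternates under every URL, which makes it a natural candidate for generation by script. As a rough sketch, assuming you already hold a mapping from hreflang code to URL for each set of equivalent pages, the Python standard library can produce the structure. The function name and input shape are our own and are not part of any Google tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize site map elements without a prefix and links as xhtml:link.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(alternates):
    """Return site map XML for one set of equivalent pages.

    `alternates` maps an hreflang code to the URL of that version.
    Every version gets its own <url> entry listing all versions,
    including itself.
    """
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for page_url in alternates.values():
        url = ET.SubElement(urlset, "{%s}url" % SITEMAP_NS)
        loc = ET.SubElement(url, "{%s}loc" % SITEMAP_NS)
        loc.text = page_url
        for code, alt_url in alternates.items():
            ET.SubElement(
                url,
                "{%s}link" % XHTML_NS,
                {"rel": "alternate", "hreflang": code, "href": alt_url},
            )
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


pages = {
    "en": "http://www.example.com/english/",
    "de": "http://www.example.com/deutsch/",
    "de-ch": "http://www.example.com/schweiz-deutsch/",
}
print(build_hreflang_sitemap(pages))
```

Note that this sketch emits a <url> entry for every version, including the Swiss German page, so each alternate also lists the others in return.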

Google recently added the ability to monitor the language settings for accuracy via Webmaster Tools.

[Screenshot: hreflang language monitoring in Google Webmaster Tools]