As you learned in Chapter 8, search engines do not want to waste resources indexing multiple versions of the same content, whether it is duplicated on your own site or on other sites. If you suspect that pages on your site (or on other sites) are similar, you can use the checking tools listed below to measure how close they are to each other. With the Panda updates to its ranking algorithm, Google specifically cracked down on duplicate content. If your pages are not original, they might be removed from the index or moved into the “supplemental” index, which means they will rarely be seen.
None of the search engines publishes an exact formula for how much similarity is too much, but as a general rule of thumb, if your page is more than 50% the same as another page, treat the two as essentially the same page. For more details on duplicate content, review Google’s Duplicate Content help page or Bing’s Webmaster Guidelines.
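To get a rough feel for the 50% rule of thumb before reaching for a tool, you can compare the visible words of two pages yourself. The sketch below is only an illustration: it uses Python's standard `difflib.SequenceMatcher`, which is not the formula any search engine uses, and the function names `visible_words`, `similarity`, and `probably_duplicate` are made up for this example. The tag stripping is deliberately crude and assumes you already have the HTML of both pages as strings.

```python
import re
from difflib import SequenceMatcher

# Rule of thumb from the text: more than 50% the same counts as a duplicate.
DUPLICATE_THRESHOLD = 0.5


def visible_words(html: str) -> list[str]:
    """Crudely strip markup and return the page's words in lowercase."""
    # Drop script and style blocks, then any remaining tags.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return re.findall(r"[a-z0-9']+", text.lower())


def similarity(html_a: str, html_b: str) -> float:
    """Return a 0.0-1.0 score for how much word content two pages share."""
    words_a = visible_words(html_a)
    words_b = visible_words(html_b)
    # autojunk=False keeps common words from being ignored on long pages.
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def probably_duplicate(html_a: str, html_b: str) -> bool:
    """Apply the (assumed) 50% rule of thumb to two pages."""
    return similarity(html_a, html_b) > DUPLICATE_THRESHOLD


if __name__ == "__main__":
    page_a = "<html><body><p>Red widgets on sale today</p></body></html>"
    page_b = "<html><body><p>Blue widgets on sale today</p></body></html>"
    print(f"similarity: {similarity(page_a, page_b):.2f}")
    print("probably duplicate:", probably_duplicate(page_a, page_b))
```

A score from this sketch will not match the score from any of the tools in this section, because each one measures similarity differently (some compare raw HTML, for example). Treat it as a quick sanity check, not a verdict.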
Here are the tools that check for duplicate content:
|Tool||Description||Scope||Cost|
|Webconfs Similar Page Checker||Gives you a score for how closely the HTML of two pages matches.||Compare two pages||Free|
|Virante Duplicate Content Checker||Looks for multiple versions of a single URL, testing the www and non-www versions, and looks for similar pages in search results.||Multiple and canonical versions||Free|
|CopyScape||Developed as a plagiarism checker, but works brilliantly as a duplicate content checker.||Multiple pages||Free/Premium $0.5 test|
|Siteliner||Developed as a site diagnostic tool, it finds near-duplicate pages as well as broken links.||Single site||Free/Premium|
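The www and non-www test mentioned in the table starts from a simple idea: build both versions of a URL, then see whether each one serves the same page instead of redirecting to a single canonical address. The sketch below covers only the first step, and it is not how any of the listed tools is implemented. The function name `host_variants` is hypothetical, and fetching the two URLs to compare their responses is left out.

```python
from urllib.parse import urlsplit, urlunsplit


def host_variants(url: str) -> list[str]:
    """Return the non-www and www versions of a URL, in that order."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip a leading "www." so both variants are built from the bare host.
    bare = host[4:] if host.startswith("www.") else host
    path = parts.path or "/"
    return [
        urlunsplit((parts.scheme, candidate, path, parts.query, ""))
        for candidate in (bare, "www." + bare)
    ]


if __name__ == "__main__":
    for variant in host_variants("http://www.example.com/products?page=2"):
        print(variant)
```

If both variants return the same content with a normal success response, search engines may see two copies of the page. A redirect from one variant to the other is the usual fix.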