Used properly, the robots.txt file is the primary way your website tells search spiders where they may crawl and where they must stay away. Unfortunately, it isn’t always written correctly, and a malformed robots.txt is one of the most common reasons a site ends up blocked from search engine spiders. Robotstxt.org is the definitive reference for the protocol’s syntax and variations, but an automated checker can catch errors you might miss.
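To see how easy it is to get this wrong, the short Python sketch below uses the standard-library urllib.robotparser module to show how a one-character difference flips a file from allowing everything to blocking everything (the example.com URL and path are placeholders):

```python
from urllib.robotparser import RobotFileParser

# A single character separates "crawl everything" from "crawl nothing".
allow_all = """User-agent: *
Disallow:
"""                # empty Disallow value = no restrictions

block_all = """User-agent: *
Disallow: /
"""                # "/" matches every path on the site

for label, rules in [("allow_all", allow_all), ("block_all", block_all)]:
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    ok = parser.can_fetch("Googlebot", "https://example.com/products/widget")
    print(f"{label}: Googlebot may fetch /products/widget -> {ok}")

# Expected output:
# allow_all: Googlebot may fetch /products/widget -> True
# block_all: Googlebot may fetch /products/widget -> False
```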
For the most authoritative test, use the tools provided by the search engines themselves:
- Google Robots.txt Checker – A dedicated checker, but you need to be logged into your Google Webmaster Tools account to use it.
- Fetch as Bingbot – Not a dedicated checker; your robots.txt is tested as part of a “Fetch as Bingbot” session in Bing Webmaster Tools.
- Yandex Robots.txt Checker – A standalone tool that lets you check for spider blocks and verify correct formatting.
There are also free third-party tools that can help you test your robots.txt file and confirm that spiders can reach the key parts of your site (a do-it-yourself sketch follows the list):
- SEO Book Robots Checker – A free tool for a quick check of your robots.txt file.
- Motoricerca Robots.txt Checker – Another quick, free checking tool.
- SEO Chat SEO Validator – A free robots.txt validation tool.
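If you would rather script this check yourself, a minimal sketch along the following lines does the same kind of quick test; it assumes Python’s standard urllib.robotparser module, and the domain, paths, and spider names are placeholders you would swap for your own:

```python
from urllib.robotparser import RobotFileParser

# Placeholders: swap in your own domain, key URLs, and the spiders you care about.
SITE = "https://www.example.com"
KEY_PATHS = ["/", "/products/", "/blog/", "/sitemap.xml"]
SPIDERS = ["Googlebot", "Bingbot", "YandexBot"]

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()  # downloads and parses the live robots.txt

for path in KEY_PATHS:
    for spider in SPIDERS:
        if parser.can_fetch(spider, f"{SITE}{path}"):
            print(f"OK: {spider} can crawl {path}")
        else:
            print(f"WARNING: {spider} is blocked from {path}")
```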
For a master list of crawlers you may want to exclude, review the robots database maintained by Robotstxt.org.
Google also requires that you do not block its access to your CSS and JavaScript files if your site uses responsive design. To evaluate how the responsive layout renders, Googlebot needs to be able to fetch those resources. Many developers have blocked these folders in the past, so make sure your robots.txt correctly allows access.
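One way to verify this is to point the same urllib.robotparser check at representative CSS and JavaScript URLs; the asset paths below are placeholders, so substitute the files your templates actually load:

```python
from urllib.robotparser import RobotFileParser

SITE = "https://www.example.com"
# Placeholder asset URLs -- use real stylesheet/script paths from your page source.
ASSETS = ["/assets/css/main.css", "/assets/js/site.js", "/wp-includes/js/jquery.js"]

parser = RobotFileParser()
parser.set_url(f"{SITE}/robots.txt")
parser.read()

for asset in ASSETS:
    if not parser.can_fetch("Googlebot", f"{SITE}{asset}"):
        print(f"Blocked asset: {asset} -- update robots.txt to allow Googlebot")
```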