The robots.txt file is one of the simplest ways to control how search engines crawl your website. The first thing a search engine spider will do when it visits your website is load the robots.txt file, which tells it which pages it may and may not crawl. If there are areas of your website you want kept out of the search engines, make sure you list them in the robots.txt file. Bear in mind, though, that the robots.txt file is visible to anyone who requests it, so if you’ve got genuinely private content listed there, put it behind a username and password barrier too!
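For instance, assuming a hypothetical /admin/ area that you’d like kept out of the search engines, a minimal sketch of the entry would look like this:
# Block every robot from the (hypothetical) admin area
User-agent: *
Disallow: /admin/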
Even if you don’t want to block any of the search engine spiders from any of your content, you should still have a robots.txt file on your server which tells them they are free to crawl every page. This simple code will do the trick:
User-agent: *
Disallow:
With this code, all robots are allowed to crawl all files: the asterisk in the User-agent line matches every robot, and an empty Disallow line blocks nothing. If you want to ban all robots from all files, use the following code instead:
User-agent: *
Disallow: /
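You can also carve out an exception to a blanket ban. Major search engines such as Google honour an Allow line, although it was not part of the original robots.txt standard, so older or more obscure robots may ignore it. As a sketch, using a hypothetical /public/ folder, the following blocks everything except that one directory:
# Block the whole site, but let robots into /public/ (hypothetical folder)
User-agent: *
Disallow: /
Allow: /public/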
If you simply want to ban all robots from a few specific files, use:
User-agent: *
Disallow: /file1.html
Disallow: /file2.html
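The same syntax works for whole directories, and the trailing slash matters: it blocks every file underneath that folder. A sketch with hypothetical folder names:
# Block everything under these (hypothetical) directories
User-agent: *
Disallow: /private/
Disallow: /tmp/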
If you want to ban one particular crawler, spider or robot from a file, name it in the User-agent line, like so:
User-agent: Googlebot
Disallow: /yourfile.html
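You can also give different robots different rules in the same file, because each robot obeys the most specific User-agent group that matches it and ignores the rest. As a sketch, reusing the hypothetical /yourfile.html from above, this blocks Googlebot from one page while leaving every other robot unrestricted:
# Googlebot follows this group only
User-agent: Googlebot
Disallow: /yourfile.html

# Every other robot follows this group and is blocked from nothing
User-agent: *
Disallow: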
The robots.txt file is fairly flexible and the syntax is easy to read and write, even for a beginner. We make a point of putting the robots.txt file to good use on every project we complete in web design Cardiff. Hopefully our tips will help you put together your own robots.txt file and finally exert a small measure of control over the search engines! If you require any additional help with website design in Wales or search engine optimisation Cardiff, please feel free to give us a call for a no obligation chat.