What
A robots.txt file is a text file containing rules about which crawlers may access which parts of a site. You can write and edit it with any basic text editor.
Why
Why use a robots.txt file? It can do many things, such as:
Controlling Crawler Access
- You can block entire sections of your website, specific file types like PDFs or archives, or even individual pages if you don't want search engines to crawl them. This can be useful for things like internal search results pages, test pages, or content under development.
- Conversely, you can point crawlers to the important pages you want them to prioritise, for example with Allow rules or by listing your sitemaps.
Managing Crawl Rate
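- If crawlers put a heavy load on your website, you can use robots.txt to ask a specific crawler to slow down. The non-standard Crawl-delay directive requests a pause of a set number of seconds between requests. Support varies by crawler; Googlebot, for example, ignores Crawl-delay. Where it is honoured, it helps prevent your server from getting overloaded and stops crawlers from consuming excessive resources.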
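As a rough sketch of how a well-behaved crawler might read such a limit, Python's standard urllib.robotparser module reports any Crawl-delay value it finds. The rules and the ExampleBot name below are made up for illustration, and whether a real crawler honours the directive is entirely up to that crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: ask "ExampleBot" to wait 10 seconds between requests.
RULES = """\
User-agent: ExampleBot
Crawl-delay: 10

User-agent: *
Allow: /
"""


def crawl_delay_for(user_agent: str, rules: str = RULES):
    """Return the Crawl-delay in seconds that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.crawl_delay(user_agent)


print(crawl_delay_for("ExampleBot"))    # delay requested for the named bot
print(crawl_delay_for("SomeOtherBot"))  # no delay set for other crawlers
```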
Improving Search Engine Optimisation (SEO)
- By keeping crawlers away from unnecessary or duplicate content, you can help search engines focus on the most relevant and valuable pages on your website, potentially improving your search ranking.
Preventing Content Scraping
- While not foolproof, robots.txt can discourage automated bots from stealing your content by disallowing access to specific directories or files.
How
You must place the robots.txt file in the top-level directory of a site, on a supported protocol. Like other URLs, the URL for the robots.txt file is case-sensitive. In the case of Google Search, the supported protocols are HTTP, HTTPS, and FTP. On HTTP and HTTPS, crawlers fetch the robots.txt file with an HTTP non-conditional GET request.
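To make the location rule concrete, here is a minimal Python sketch that works out where a crawler would look for robots.txt given any page URL on a site. The function name robots_txt_url is an invented helper, not part of any library.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Build the top-level robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://forestpathways.co.uk/sitemap/index/"))
```

Note that the file name stays lower-case: because the URL is case-sensitive, a file saved as Robots.txt would not be found at this address.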
Using your text editor, write the file like this. The example below is the robots.txt file on this website:
User-agent: Googlebot
Disallow: /nogooglebot/
User-agent: *
Allow: /
Sitemap: https://forestpathways.co.uk/cms-data/blog/words/words-sitemap.xml
Sitemap: https://forestpathways.co.uk/sitemap/folder/
Sitemap: https://forestpathways.co.uk/sitemap/index/
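One way to check what these rules mean in practice is to feed them to Python's standard urllib.robotparser module and ask which URLs a given crawler may fetch. This is only a sketch: the page path /nogooglebot/page.html and the OtherBot name are assumptions for illustration, site_maps() needs Python 3.8 or later, and Python's matching may differ in edge cases from what a particular search engine does.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules shown above, copied verbatim.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /nogooglebot/
User-agent: *
Allow: /
Sitemap: https://forestpathways.co.uk/cms-data/blog/words/words-sitemap.xml
Sitemap: https://forestpathways.co.uk/sitemap/folder/
Sitemap: https://forestpathways.co.uk/sitemap/index/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "https://forestpathways.co.uk"
blocked_page = BASE + "/nogooglebot/page.html"  # hypothetical page

# Googlebot is kept out of /nogooglebot/ but may fetch everything else.
print(parser.can_fetch("Googlebot", blocked_page))
print(parser.can_fetch("Googlebot", BASE + "/"))

# Any other crawler falls under the "*" group, which allows the whole site.
print(parser.can_fetch("OtherBot", blocked_page))

# The Sitemap lines apply to all crawlers, regardless of group.
print(parser.site_maps())
```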
Note:
- While following robots.txt is common practice for search engines, they won't penalise you for not having one. However, using it effectively can enhance your website's visibility and performance in search results.