The Guide to Understanding Robots.txt and Protecting Your Website

What is robots.txt and how does it work?


A robots.txt file is a text file that tells web crawlers (also known as bots or spiders) which pages on your website they may and may not access. It gives website owners a way to control how search engines and other bots crawl their site.

The file is plain text and follows the Robots Exclusion Protocol (REP), a set of rules that defines how web crawlers should behave when crawling a website.

The robots.txt file grants or restricts access with two main directives: Allow and Disallow. The Allow directive tells web crawlers that they may crawl a specific page or directory. The Disallow directive tells web crawlers that they may not crawl a specific page or directory. Each set of rules is introduced by a User-agent line that names the bot the rules apply to.
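To make the Allow and Disallow behavior concrete, here is a minimal sketch that checks a few paths against a small rule set using Python's standard-library urllib.robotparser. The rules, the bot name "ExampleBot", and the example.com URLs are assumptions made up for illustration. Note that this parser applies the first matching rule in file order, which may differ from how a particular crawler resolves overlapping rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; these paths are not from a real site.
RULES = """\
User-agent: *
Allow: /private/press/
Disallow: /private/
"""


def can_crawl(rules: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/report.html", "/private/press/launch.html"):
        url = "https://example.com" + path
        print(path, can_crawl(RULES, "ExampleBot", url))
```

Paths that match no rule are treated as allowed, so only the /private/ directory (apart from /private/press/) is off limits in this sketch.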

How to identify the bots causing the issue


  1. To find the bots causing the issue, open the "Awstats" tool within cPanel.

  2. Select "View" next to the domain. (Note: If your domain uses https, choose the entry that has (SSL) at the end.)

  3. Scroll down to the section titled "Robots/Spiders visitors (Top 25)". These are the bots you want to add to the robots.txt file.
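If you have the raw access log rather than the Awstats report, a similar list can be roughly approximated with a short script. The sketch below is only an assumption-laden illustration: it assumes log lines in the Apache "combined" format (user agent as the last quoted field), treats any user agent containing "bot", "spider", or "crawl" as a bot, and uses made-up sample lines and bot names.

```python
import re
from collections import Counter

# The user agent is the last quoted field of an Apache "combined" log line.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

# Crude heuristic, assumed for this sketch: substrings that suggest a crawler.
BOT_HINTS = ("bot", "spider", "crawl")

# Made-up sample log lines for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '203.0.113.5 - - [01/Jan/2024:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/Jan/2024:00:00:03 +0000] "GET / HTTP/1.1" 200 512 "-" "OtherSpider/2.0"',
    '192.0.2.9 - - [01/Jan/2024:00:00:04 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]


def top_bots(log_lines, limit=25):
    """Count requests per bot-like user agent and return the most frequent ones."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # skip lines that do not end with a quoted field
        user_agent = match.group(1)
        if any(hint in user_agent.lower() for hint in BOT_HINTS):
            counts[user_agent] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    for user_agent, hits in top_bots(SAMPLE_LINES):
        print(hits, user_agent)
```

The substring heuristic will miss bots that do not identify themselves, so the Awstats report remains the more reliable source.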

How to protect your website


  1. Access the "File Manager" tool within cPanel.

  2. Go to the public_html folder of the domain in question.

  3. Select "File" in the top left and create a new file within public_html called "robots.txt".

  4. Once the file has been created, open it for editing.

  5. The file can be configured in many ways. The two main entries are "User-agent:", which specifies which bot a rule applies to, and "Disallow:", which specifies the folder that bot should not access. For this example, enter "User-agent: *" to target all bots and "Disallow: /" to cover every directory. Keep in mind that this combination also tells search engine crawlers to stay out of the entire site; to block only the bots identified earlier, use their names in place of "*".
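The file contents described in step 5 can be sketched and checked before you save them. The example below is a hedged illustration, not a required part of the procedure: build_robots_txt and is_allowed are hypothetical helper names, "ExampleBot" and "OtherBot" stand in for whatever bots you identified, and the check relies on Python's standard-library urllib.robotparser rather than on how any specific crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(bot_names, path="/"):
    """Return robots.txt text with one group per bot, each disallowing `path`."""
    groups = [f"User-agent: {name}\nDisallow: {path}\n" for name in bot_names]
    return "\n".join(groups)  # a blank line separates consecutive groups


def is_allowed(robots_txt, user_agent, url):
    """Return True if `user_agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The configuration from step 5: every bot, every directory.
    print(build_robots_txt(["*"]))

    # A narrower alternative: block only specific (hypothetical) bots.
    targeted = build_robots_txt(["ExampleBot", "OtherBot"])
    print(targeted)
    print(is_allowed(targeted, "ExampleBot", "https://example.com/page.html"))
    print(is_allowed(targeted, "SomeOtherAgent", "https://example.com/page.html"))
```

Paste the generated text into the robots.txt file you created in public_html. Listing bots by name leaves every other crawler unaffected, which is usually preferable when you still want search engines to visit the site.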

Note: If you would rather not write the robots.txt file by hand, many online tools can generate it for you with various options, such as: https://en.ryte.com/free-tools/robots-txt-generator/#disallowing


Additional information


For additional resources, please see Google's official documentation on robots.txt: https://developers.google.com/search/docs/crawling-indexing/robots/intro