How to Write Robots.txt Manually

If you have a website, you’ve probably heard of the robots.txt file before. But what exactly is it and why is it important? In this guide, we’ll cover everything you need to know about the robots.txt file, including what it does, why you need one, and how to create one manually for your site.

What is Robots.txt?


The robots.txt file is a text file that goes in the root directory of your website. Its purpose is to give instructions to web robots or crawlers (like the Googlebot) about which areas of your website they should crawl and index, and which areas they should avoid.

Crawlers are automated programs that browse the web to create listings of webpage content for search engines. When a crawler arrives at your site, the first thing it does is check for this robots.txt file to see if there are any rules about what parts of the site it can access.
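
To see this behavior in action, here is a minimal Python sketch of how a well-behaved crawler consults robots.txt before fetching a page, using the standard library's urllib.robotparser module. The domain and paths below are placeholders:

from urllib import robotparser

# Point the parser at the site's robots.txt file and download it
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Ask whether a particular crawler may fetch a particular URL
if rp.can_fetch("Googlebot", "https://example.com/private/report.html"):
    print("Allowed to crawl this URL")
else:
    print("Disallowed by robots.txt")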

The robots.txt file allows you to stop web crawlers from accessing certain pages or folders. This can be useful for a few key reasons:

  1. It prevents crawlers from accessing and indexing certain pages that you don’t want showing up in search results, like personal content or private files.
  2. It stops crawlers from wasting resources crawling pages that don’t need to be indexed, like admin pages or duplicate content.
  3. It provides a way to keep sites from being overwhelmed by too many crawlers if needed.

While blocking crawlers keeps content out of search results, it doesn’t actually prevent people from accessing those pages if they have the direct URL, and blocked URLs can even still appear in results if other sites link to them. The robots.txt file simply instructs well-behaved crawlers to avoid those areas, so for genuinely private content you should rely on password protection or a noindex directive rather than robots.txt alone.


Why You Need a Robots.txt File


Having a robots.txt file is important for any website. If you never create one, crawlers that request it and receive a “not found” response will simply assume they are allowed to access all areas of your site.

However, you’ll likely want to create a custom robots.txt file for a few key reasons:

Avoid Indexing Sensitive Content


There are certain types of pages and files on your site that you’ll want to keep crawlers away from, like admin login areas, personal user accounts, and checkout pages with customer info. The robots.txt file lets you specify rules that block crawlers from those private sections.

Save Bandwidth and Resources


Another benefit is that you can use your robots.txt to save bandwidth and server resources by preventing crawlers from unnecessarily crawling areas like image directories, PDF files, or archives that don’t provide much SEO value when indexed. This keeps your site running efficiently.

Manage Your Crawl Budget


Most major search engines have “crawl budgets” that limit the maximum number of URLs from a site that their crawlers will visit over a given time period. By using your robots.txt to point crawlers only to the most important areas you want indexed, you can better manage which pages get crawled within that budget.

How to Create a Robots.txt File Manually


Now that you understand the importance of having a customized robots.txt file, let’s go over how to actually create one manually. While many platforms and CMSs allow you to generate this file graphically, knowing how to create it manually is useful.

The first step is to create a new plain text file and name it exactly “robots.txt” (all lowercase, without the quotes). Save it as plain text; don’t use a word-processor format that would add extra formatting or a different extension.

Next, open the file in a basic text editor. You’ll want to add your rules inside this text file following the proper robots.txt syntax and formatting.

The Syntax


The robots.txt file is made up of one or more rule groups. Each group starts with a “User-agent” line specifying which crawler the rules apply to, followed by one or more “Disallow” or “Allow” lines stating which directories cannot and can be crawled.

Each rule set for a user agent crawler follows this format:

User-agent: [crawler name]
Disallow: [directory path]
Allow: [directory path]

You can have multiple user agent lines for different crawlers with different allow/disallow rules for each one. There’s also a “catch-all” user agent value of * that applies rules to all crawlers.

Some examples:

User-agent: *
Disallow: /example-directory/

This tells all web crawlers not to crawl the /example-directory/ and any files/pages within it.

User-agent: Googlebot
Disallow: /example-directory/secret/
Allow: /example-directory/public/

This tells just the Googlebot crawler that it cannot crawl anything within the /example-directory/secret/ folder, but it is allowed to crawl anything within the /example-directory/public/ folder. When both an Allow and a Disallow rule match the same URL, Google follows the most specific (longest) matching rule.


Creating Your Rules


So how do you decide what rules to include in your robots.txt file? Here are some common use cases for disallow rules:

Block Private Areas


You’ll want to block access to any private or sensitive areas of your site that contain things like user accounts, payment processing, admin areas, etc. For example:

User-agent: *
Disallow: /user-accounts/
Disallow: /admin/
Disallow: /checkout/

Block Files You Don’t Want Indexed


Similarly, there may be certain file types or directories that aren’t worth indexing, like image folders, PDF files, zipped archives, and more. Note that the * wildcard and the $ end-of-URL anchor used below are pattern-matching extensions honored by major crawlers such as Googlebot and Bingbot rather than part of the original robots.txt standard. For example:

User-agent: *
Disallow: /images/
Disallow: /*.pdf$

Block Duplicate Content


If your site generates duplicate content like filtered archives, print-friendly pages, etc., you can disallow those from being indexed. The /*? pattern below blocks any URL containing a query string, which covers most filtered and sorted variants of a page:

User-agent: *
Disallow: /print/
Disallow: /*?

Block Low Value Pages


There may be certain low-quality or low-value pages on your site that you don’t want crawlers wasting resources on, like crawler traps, dummy pages, test areas, etc. For example:

User-agent: *
Disallow: /test/
Disallow: /tr4p/

Block Crawlers on Purpose


In some cases, you may want to temporarily block all or certain crawlers from your site, such as when relaunching, doing major site updates, or dealing with excessive server load. For example:

User-agent: *
Disallow: /

This tells all crawlers not to crawl any part of the site at all.

Other Useful Rules

In addition to disallow rules, there are some other handy rules you can include:

Sitemaps


You can include lines to tell crawlers where your XML sitemaps are located:

Sitemap: https://example.com/sitemap.xml
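
To confirm that crawlers can pick up this line, here is a small sketch using Python’s built-in parser (the site_maps() method needs Python 3.8 or newer; the URL is a placeholder):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Returns a list such as ['https://example.com/sitemap.xml'], or None if no Sitemap line exists
print(rp.site_maps())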

Crawl Delay


You can add a “Crawl-delay:” line to specify how long (in seconds) crawlers should wait between requests to avoid overtaxing your server. Googlebot ignores this directive, but other crawlers such as Bingbot do respect it:

Crawl-delay: 10
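
As a rough illustration, a polite crawler written in Python could read that value with the standard library and pause between requests accordingly. The URLs below are placeholders:

import time
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# crawl_delay() returns the requested number of seconds, or None if there is no Crawl-delay line
delay = rp.crawl_delay("MyCrawler") or 0

for url in ["https://example.com/page-1", "https://example.com/page-2"]:
    if rp.can_fetch("MyCrawler", url):
        print("Fetching", url)
        time.sleep(delay)  # wait between requests as the site asked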

Specifying Allowed Crawlers

You can allow only specific crawlers and disallow all others. A crawler follows the single group whose User-agent line most closely matches its name, so in the example below Googlebot obeys the first group and every other crawler falls back to the catch-all group:

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

Testing and Submission

Once you’ve created your robots.txt file with your desired rules, be sure to upload it to the root directory of your website (e.g. example.com/robots.txt).

You can then check it for syntax issues with the robots.txt report in Google Search Console (which replaced the older robots.txt Tester tool) or another robots.txt validator. Search Console also lets you request a recrawl of the file after you make major changes.
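
You can also do a quick local check before uploading. The sketch below assumes your draft file is saved as robots.txt in the current directory and uses Python’s built-in parser; the sample URLs are only examples:

from urllib import robotparser

# Load the draft file from disk instead of fetching it over HTTP
rp = robotparser.RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())

# Check a few representative URLs against your rules
tests = [
    ("Googlebot", "https://example.com/example-directory/public/page.html"),
    ("Googlebot", "https://example.com/example-directory/secret/page.html"),
    ("Bingbot", "https://example.com/admin/"),
]
for agent, url in tests:
    verdict = "allowed" if rp.can_fetch(agent, url) else "disallowed"
    print(agent, url, verdict)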

In Conclusion

The robots.txt file is a small but powerful part of any website. By creating a customized one with the right rules, you can better control what content gets indexed in search engines, block access to private areas, save server resources, and manage your crawl budget.

The robots.txt syntax isn’t complex, but it takes some practice to fully master. Once you have it set up properly, you’ll be giving both search engine crawlers and your human visitors the best possible experience on your site.
