How to Write Robots.txt Manually

If you have a website, you’ve probably heard of the robots.txt file before. But what exactly is it and why is it important? In this guide, we’ll cover everything you need to know about the robots.txt file, including what it does, why you need one, and how to create one manually for your site.

What is Robots.txt?


The robots.txt file is a text file that goes in the root directory of your website. Its purpose is to give instructions to web robots or crawlers (like the Googlebot) about which areas of your website they should crawl and index, and which areas they should avoid.

Crawlers are automated programs that browse the web to create listings of webpage content for search engines. When a crawler arrives at your site, the first thing it does is check for this robots.txt file to see if there are any rules about what parts of the site it can access.
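
To see this behavior in action, here is a minimal Python sketch of how a well-behaved crawler consults robots.txt before fetching a page, using the standard library's urllib.robotparser module. The domain and paths below are placeholders:

from urllib import robotparser

# Point the parser at the site's robots.txt file and download it
rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Ask whether a particular crawler may fetch a particular URL
if rp.can_fetch("Googlebot", "https://example.com/private/report.html"):
    print("Allowed to crawl this URL")
else:
    print("Disallowed by robots.txt")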

The robots.txt file allows you to stop web crawlers from accessing certain pages or folders. This can be useful for a few key reasons:

  1. It prevents crawlers from accessing and indexing certain pages that you don’t want showing up in search results, like personal content or private files.
  2. It stops crawlers from wasting resources crawling pages that don’t need to be indexed, like admin pages or duplicate content.
  3. It provides a way to keep sites from being overwhelmed by too many crawlers if needed.

While blocking crawlers keeps content out of search results, it doesn’t actually prevent people from accessing those pages if they have the direct URL, and blocked URLs can even still appear in results if other sites link to them. The robots.txt file simply instructs well-behaved crawlers to avoid those areas, so for genuinely private content you should rely on password protection or a noindex directive rather than robots.txt alone.


Why You Need a Robots.txt File


Having a robots.txt file is important for any website. If you never create one, crawlers that request it and receive a “not found” response will simply assume they are allowed to access all areas of your site.

However, you’ll likely want to create a custom robots.txt file for a few key reasons:

Avoid Indexing Sensitive Content


There are certain types of pages and files on your site that you’ll want to keep crawlers away from, like admin login areas, personal user accounts, and checkout pages with customer info. The robots.txt file lets you specify rules that block crawlers from those private sections.

Save Bandwidth and Resources


Another benefit is that you can use your robots.txt to save bandwidth and server resources by preventing crawlers from unnecessarily crawling areas like image directories, PDF files, or archives that don’t provide much SEO value when indexed. This keeps your site running efficiently.

Manage Your Crawl Budget


Most major search engines have “crawl budgets” that limit the maximum number of URLs from a site that their crawlers will visit over a given time period. By using your robots.txt to point crawlers only to the most important areas you want indexed, you can better manage which pages get crawled within that budget.

How to Create a Robots.txt File Manually


Now that you understand the importance of having a customized robots.txt file, let’s go over how to actually create one manually. While many platforms and CMSs allow you to generate this file graphically, knowing how to create it manually is useful.

The first step is to create a new plain text file and name it exactly “robots.txt” (all lowercase, without the quotes). Save it as plain text; don’t use a word-processor format that would add extra formatting or a different extension.

Next, open the file in a basic text editor. You’ll want to add your rules inside this text file following the proper robots.txt syntax and formatting.

The Syntax


The robots.txt file is made up of one or more rule groups. Each group starts with a “User-agent” line specifying which crawler the rules apply to, followed by one or more “Disallow” or “Allow” lines stating which directories cannot and can be crawled.

Each rule set for a user agent crawler follows this format:

User-agent: [crawler name]
Disallow: [directory path]
Allow: [directory path]

You can have multiple user agent lines for different crawlers with different allow/disallow rules for each one. There’s also a “catch-all” user agent value of * that applies rules to all crawlers.

Some examples:

User-agent: *
Disallow: /example-directory/

This tells all web crawlers not to crawl the /example-directory/ and any files/pages within it.

User-agent: Googlebot
Disallow: /example-directory/secret/
Allow: /example-directory/public/

This tells just the Googlebot crawler that it cannot crawl anything within the /example-directory/secret/ folder, but it is allowed to crawl anything within the /example-directory/public/ folder. When both an Allow and a Disallow rule match the same URL, Google follows the most specific (longest) matching rule.


Creating Your Rules


So how do you decide what rules to include in your robots.txt file? Here are some common use cases for disallow rules:

Block Private Areas


You’ll want to block access to any private or sensitive areas of your site that contain things like user accounts, payment processing, admin areas, etc. For example:

User-agent: *
Disallow: /user-accounts/
Disallow: /admin/
Disallow: /checkout/

Block Files You Don’t Want Indexed


Similarly, there may be certain file types or directories that aren’t worth indexing, like image folders, PDF files, zipped archives, and more. Note that the * wildcard and the $ end-of-URL anchor used below are pattern-matching extensions honored by major crawlers such as Googlebot and Bingbot rather than part of the original robots.txt standard. For example:

User-agent: *
Disallow: /images/
Disallow: /*.pdf$

Block Duplicate Content


If your site generates duplicate content like filtered archives, print-friendly pages, etc., you can disallow those from being indexed. The /*? pattern below blocks any URL containing a query string, which covers most filtered and sorted variants of a page:

User-agent: *
Disallow: /print/
Disallow: /*?

Block Low Value Pages


There may be certain low-quality or low-value pages on your site that you don’t want crawlers wasting resources on, like crawler traps, dummy pages, test areas, etc. For example:

User-agent: *
Disallow: /test/
Disallow: /tr4p/

Block Crawlers on Purpose


In some cases, you may want to temporarily block all or certain crawlers from your site, such as when relaunching, doing major site updates, or dealing with excessive server load. For example:

User-agent: *
Disallow: /

This tells all crawlers not to crawl any part of the site at all.

Other Useful Rules

In addition to disallow rules, there are some other handy rules you can include:

Sitemaps


You can include lines to tell crawlers where your XML sitemaps are located:

Sitemap: https://example.com/sitemap.xml
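
To confirm that crawlers can pick up this line, here is a small sketch using Python’s built-in parser (the site_maps() method needs Python 3.8 or newer; the URL is a placeholder):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# Returns a list such as ['https://example.com/sitemap.xml'], or None if no Sitemap line exists
print(rp.site_maps())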

Crawl Delay


You can add a “Crawl-delay:” line to specify how long (in seconds) crawlers should wait between requests to avoid overtaxing your server. Googlebot ignores this directive, but other crawlers such as Bingbot do respect it:

Crawl-delay: 10
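
As a rough illustration, a polite crawler written in Python could read that value with the standard library and pause between requests accordingly. The URLs below are placeholders:

import time
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()

# crawl_delay() returns the requested number of seconds, or None if there is no Crawl-delay line
delay = rp.crawl_delay("MyCrawler") or 0

for url in ["https://example.com/page-1", "https://example.com/page-2"]:
    if rp.can_fetch("MyCrawler", url):
        print("Fetching", url)
        time.sleep(delay)  # wait between requests as the site asked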

Specifying Allowed Crawlers

You can allow only specific crawlers and disallow all others. A crawler follows the single group whose User-agent line most closely matches its name, so in the example below Googlebot obeys the first group and every other crawler falls back to the catch-all group:

User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /

Testing and Submission

Once you’ve created your robots.txt file with your desired rules, be sure to upload it to the root directory of your website (e.g. example.com/robots.txt).

You can then check it for syntax issues with the robots.txt report in Google Search Console (which replaced the older robots.txt Tester tool) or another robots.txt validator. Search Console also lets you request a recrawl of the file after you make major changes.
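
You can also do a quick local check before uploading. The sketch below assumes your draft file is saved as robots.txt in the current directory and uses Python’s built-in parser; the sample URLs are only examples:

from urllib import robotparser

# Load the draft file from disk instead of fetching it over HTTP
rp = robotparser.RobotFileParser()
with open("robots.txt") as f:
    rp.parse(f.read().splitlines())

# Check a few representative URLs against your rules
tests = [
    ("Googlebot", "https://example.com/example-directory/public/page.html"),
    ("Googlebot", "https://example.com/example-directory/secret/page.html"),
    ("Bingbot", "https://example.com/admin/"),
]
for agent, url in tests:
    verdict = "allowed" if rp.can_fetch(agent, url) else "disallowed"
    print(agent, url, verdict)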

In Conclusion

The robots.txt file is a small but powerful part of any website. By creating a customized one with the right rules, you can better control what content gets indexed in search engines, block access to private areas, save server resources, and manage your crawl budget.

The robots.txt syntax isn’t complex, but it takes some practice to fully master. Once you have it set up properly, you’ll be giving both search engine crawlers and your human visitors the best possible experience on your site.
