What Is a Robots.txt File?

A robots.txt file is a plain text file placed in the root directory of your website that tells web crawlers, including search engine bots and AI training crawlers, which paths they may crawl and which they should avoid. A compliant bot requests yourdomain.com/robots.txt before crawling other pages on the site. The format is standardized as the Robots Exclusion Protocol (RFC 9309) and is the accepted way to communicate your crawling preferences to automated systems.

How Robots.txt Works

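A robots.txt file uses a simple syntax built from groups of rules. The User-agent directive specifies which bot a group applies to. An asterisk (*) applies the group to every bot that has no more specific group of its own. The Disallow directive specifies paths the bot should not crawl. The Allow directive explicitly permits access to a path that would otherwise be blocked. The Sitemap directive points bots to your XML sitemap. Rule order does not decide conflicts in modern crawlers: under RFC 9309, which Google follows, the most specific (longest) matching path wins, and Allow wins a tie. Some older parsers apply the first matching rule instead, so it is safest to write rules that give the same result either way.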
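
Crawlers differ in how they resolve overlapping Allow and Disallow rules. RFC 9309 specifies that the longest matching path wins and that Allow wins a tie. The following is a minimal sketch of that precedence only, assuming plain prefix matching with no * or $ wildcards. The function name is_allowed and the sample paths are hypothetical, chosen for illustration.

```python
def is_allowed(rules, path):
    """Resolve Allow/Disallow rules with longest-match precedence.

    rules: list of (directive, path_prefix) tuples for one user-agent group.
    Simplified sketch: plain prefix matching, no wildcard support.
    """
    best_len = -1
    allowed = True  # no matching rule means the URL may be crawled
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue  # an empty rule value matches nothing
        is_allow = directive.lower() == "allow"
        # A longer match wins; on equal length, Allow beats Disallow.
        if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
            best_len = len(prefix)
            allowed = is_allow
    return allowed


rules = [
    ("Disallow", "/private/"),
    ("Allow", "/private/press/"),
]

print(is_allowed(rules, "/private/press/kit.pdf"))  # True
print(is_allowed(rules, "/private/notes.html"))     # False
print(is_allowed(rules, "/blog/"))                  # True
```

Because the decision depends on path length rather than position, swapping the two rules in the list gives the same answers.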

Why Block AI Crawlers in 2025

A significant new concern in 2025 is AI training crawlers. Companies building large language models — including OpenAI (GPTBot), Anthropic (ClaudeBot), Common Crawl (CCBot), and others — send automated crawlers to collect web content for AI training datasets. Many website owners and publishers object to their content being used to train AI models without permission or compensation.

Blocking these crawlers with robots.txt is the standard technical method to opt out of AI training data collection. Compliance is voluntary (bots can technically ignore robots.txt), but several major operators, including OpenAI, Anthropic, and Common Crawl, state that their crawlers honor robots.txt directives. To block GPTBot, add the line "User-agent: GPTBot" followed by the line "Disallow: /". CCBot (Common Crawl) can be blocked the same way with "User-agent: CCBot".
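
As a rough check of such rules before uploading them, Python's standard urllib.robotparser module can parse a robots.txt body and report what each user agent may fetch. This is a sketch only: the sample file and the example.com URL are assumptions, and Python's parser is not guaranteed to match every crawler's behavior.

```python
from urllib.robotparser import RobotFileParser

# Sample file: block two AI training crawlers, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("GPTBot", "CCBot", "Googlebot"):
    # can_fetch returns True when the agent may crawl the URL
    print(agent, parser.can_fetch(agent, "https://example.com/article"))
# GPTBot False
# CCBot False
# Googlebot True
```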

Common Robots.txt Rules by Platform

For WordPress sites, the standard robots.txt blocks the WordPress admin area while allowing the AJAX endpoint used by plugins: "Disallow: /wp-admin/" with "Allow: /wp-admin/admin-ajax.php". For Shopify stores, the /admin/ and /checkout/ paths should be blocked from public crawlers. Shopify generates robots.txt for you, and its default rules already cover paths like these, so check the live file before adding your own. For any site, blocking /search/ and /tag/ pages keeps crawlers from spending time on thin or duplicate pages.
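
The platform rules above can be written out and checked with a short script. This is a minimal sketch: the PRESETS dictionary and the render function are hypothetical names, and the paths are only the ones mentioned in this section. The Allow line is listed first so that parsers using first-match and parsers using longest-match agree on the result.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical presets mirroring the paths named above; adjust for your site.
PRESETS = {
    "wordpress": [
        ("Allow", "/wp-admin/admin-ajax.php"),
        ("Disallow", "/wp-admin/"),
    ],
    "shopify": [
        ("Disallow", "/admin/"),
        ("Disallow", "/checkout/"),
    ],
}


def render(platform, user_agent="*"):
    """Build the robots.txt text for one platform preset."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"{directive}: {path}" for directive, path in PRESETS[platform]]
    return "\n".join(lines) + "\n"


wordpress_txt = render("wordpress")
print(wordpress_txt)

# Parse the generated text and confirm it behaves as intended.
parser = RobotFileParser()
parser.parse(wordpress_txt.splitlines())
print(parser.can_fetch("*", "/wp-admin/admin-ajax.php"))  # True
print(parser.can_fetch("*", "/wp-admin/options.php"))     # False
```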

What Robots.txt Cannot Do

Robots.txt controls crawling, not indexing. A page blocked by robots.txt may still appear in search results if other sites link to it, because Google can infer that a page exists from links even without crawling it. To prevent indexing, use a noindex meta tag or an X-Robots-Tag HTTP header, and leave the page crawlable: a bot that is blocked by robots.txt never fetches the page and so never sees the noindex instruction. Robots.txt also cannot prevent access to your pages. It only requests that bots comply voluntarily, and malicious crawlers and scrapers typically ignore it entirely.
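
The two noindex mechanisms mentioned above look like this in practice. The sketch below is framework-neutral: extra_headers is a hypothetical helper that returns the headers a server would add to a response, and the path prefixes are assumptions chosen for illustration.

```python
# Option 1: a meta tag placed inside the page's <head>.
META_NOINDEX = '<meta name="robots" content="noindex">'

# Option 2: an HTTP response header, useful for PDFs and other non-HTML files.
NOINDEX_PREFIXES = ("/search/", "/tag/")  # hypothetical paths to keep out of the index


def extra_headers(path):
    """Return the response headers to add for a given request path."""
    if path.startswith(NOINDEX_PREFIXES):
        return {"X-Robots-Tag": "noindex"}
    return {}


print(extra_headers("/tag/python/"))  # {'X-Robots-Tag': 'noindex'}
print(extra_headers("/blog/post/"))   # {}
```

For either option to work, the affected paths must not be disallowed in robots.txt, since the crawler has to fetch the page to read the instruction.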

How to Use Our Free Robots.txt Generator

Our free robots.txt generator at cookiescursor.com creates a properly formatted robots.txt file instantly. Select your platform preset (WordPress, Shopify, Wix, or Custom), choose which bots to configure rules for, add allow and disallow paths, enter your sitemap URL, and download the ready-to-upload robots.txt file. AI bot blocking (GPTBot, CCBot) is available as a one-click option. No signup required.

Frequently Asked Questions

Where do I upload my robots.txt file?
Upload it to the root directory of your website so it is accessible at yourdomain.com/robots.txt. Most hosting control panels allow direct file upload to the public_html or www directory.
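
Crawlers look for the file at the root of each host, so a subdomain needs its own robots.txt. A small sketch of how the location is derived from any page URL, using a hypothetical helper named robots_url:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt location that governs the given page URL."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, drop the path, query, and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://yourdomain.com/blog/post?id=7"))
# https://yourdomain.com/robots.txt
print(robots_url("https://shop.yourdomain.com/cart"))
# https://shop.yourdomain.com/robots.txt
```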

Does robots.txt affect my SEO?
Yes. Accidentally blocking important pages stops search engines from crawling them, which can keep them from ranking or showing a proper snippet. Always verify that your robots.txt does not block your homepage, key landing pages, or CSS/JS files that Google needs to render your pages correctly.

Can I block a specific directory?
Yes. Disallow: /private/ blocks all URLs starting with /private/.
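
Matching is by path prefix, so the trailing slash matters. A quick check with Python's standard urllib.robotparser, using assumed sample paths:

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /private/"])

paths = ("/private/", "/private/report.html", "/private", "/private-files/a.html")
for path in paths:
    # True means the path may be crawled
    print(path, parser.can_fetch("*", path))
# /private/ False
# /private/report.html False
# /private True
# /private-files/a.html True
```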

How quickly does Google respect robots.txt changes?
Google generally caches robots.txt for up to 24 hours, so new rules are usually picked up within a day. The effect on individual pages appears gradually as Googlebot recrawls your site.

Should I block all bots except Google?
Not necessarily. Bing, DuckDuckGo, and other search engines also drive valuable traffic. Only block bots that provide no value to you, typically AI training crawlers. Malicious scrapers usually ignore robots.txt, so they have to be blocked at the server or firewall level instead.

Does blocking GPTBot affect ChatGPT search results?
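Blocking GPTBot prevents OpenAI from crawling your site for training data. OpenAI documents separate user agents for its other features, such as OAI-SearchBot for search and ChatGPT-User for user-initiated requests, so blocking GPTBot alone should not remove your content from ChatGPT search results. Check OpenAI's current crawler documentation before relying on this, because the list of user agents can change.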

Generate Your Robots.txt Now

Use our free robots.txt generator to create a properly configured file for your website. Block AI crawlers with one click. No signup required.