# What is GPTBot? AI crawler guide

Canonical URL: https://trakkr.ai/bots/gptbot
Published: 2026-06-11
Last updated: 2026-06-11

Learn what GPTBot is, who operates it, its verified user-agent, robots.txt posture, and how blocking it can affect AI search, citations, training, or agent visibility.

OpenAI crawler for content that may be used to improve generative AI foundation models.

## What is GPTBot?

GPTBot is a web crawler operated by OpenAI. It collects publicly accessible text from the internet to help improve generative AI foundation models. The bot identifies itself with the user-agent token GPTBot and respects the Robots Exclusion Protocol. When allowed, it may download page content that OpenAI can use in future model training. Its activity is separate from other OpenAI crawlers, and site owners can control its access through standard robots.txt directives.
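
As a rough illustration of spotting the user-agent token in server logs, the sketch below checks a `User-Agent` header for `GPTBot`. The function names and the sample header string are illustrative assumptions, not an official format. A header match only shows what the client claims to be, since user-agent strings can be spoofed.

```python
import re
from typing import Optional

# Match the GPTBot product token, optionally followed by "/<version>".
# Matching is case-insensitive; the version group is absent if no "/x.y" follows.
GPTBOT_TOKEN = re.compile(r"\bGPTBot\b(?:/(?P<version>[\w.]+))?", re.IGNORECASE)


def claimed_gptbot_version(user_agent: str) -> Optional[str]:
    """Return the claimed GPTBot version, "" if none is given, or None if the token is absent."""
    match = GPTBOT_TOKEN.search(user_agent)
    if match is None:
        return None
    return match.group("version") or ""


def claims_gptbot(user_agent: str) -> bool:
    """True if the header contains the GPTBot token (a claim, not proof of origin)."""
    return claimed_gptbot_version(user_agent) is not None


# Illustrative header value only; real headers may differ.
sample = "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; GPTBot/1.1; +https://openai.com/gptbot"
print(claims_gptbot(sample), claimed_gptbot_version(sample))
```

A check like this is useful for counting claimed GPTBot requests in access logs; it should not be treated as verification that a request really came from OpenAI.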

## What it's for

If you allow GPTBot, your site's content may be included in the data used to train OpenAI's generative AI models, which means your public information may help shape the capabilities of future AI systems. Blocking GPTBot opts you out of that training use. It does not affect whether your site can be cited or surfaced in ChatGPT search features, which rely on a separate OpenAI crawler (OAI-SearchBot) with its own robots.txt token.

## How to handle GPTBot

To prevent GPTBot from crawling your site, add a rule to your robots.txt file that disallows the GPTBot user-agent token. The crawler will then skip your pages on future crawls; the rule does not apply retroactively to content already collected. To allow crawling, omit the rule or add an explicit allow rule for the bot.

## robots.txt rule

User-agent: GPTBot
Disallow: /
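
One way to sanity-check the rule before deploying it is to parse it locally. The sketch below uses Python's standard `urllib.robotparser` as a stand-in parser; GPTBot's own parsing may differ in edge cases. The helper name `gptbot_allowed` and the `example.com` URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The rule from this section, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def gptbot_allowed(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt text lets the GPTBot token fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network request
    return parser.can_fetch("GPTBot", url)


# Every path is blocked for GPTBot because the rule disallows "/".
print(gptbot_allowed(ROBOTS_TXT, "https://example.com/"))
print(gptbot_allowed(ROBOTS_TXT, "https://example.com/blog/post-1"))

# With no GPTBot group present, the same check permits the fetch.
print(gptbot_allowed("", "https://example.com/"))
```

Under these assumptions the first two calls print `False` and the last prints `True`, matching the guidance that omitting the rule allows crawling.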

## Blocking cost

Blocking GPTBot keeps your content out of future GPTBot crawls, so it may not be used to train OpenAI's generative AI models. It does not reduce your visibility in ChatGPT search citations.

## Examples

- A news website allows GPTBot, and its articles may later influence how an AI model answers questions about current events.
- A personal blog blocks GPTBot in robots.txt, so its posts are not collected by GPTBot for future OpenAI model training.
- An e-commerce site disallows GPTBot, so its product descriptions are excluded from future training crawls while the site can still appear in ChatGPT search results.
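
The e-commerce case can be sketched as a robots.txt with one group per crawler, checked locally with Python's standard `urllib.robotparser`. This assumes OAI-SearchBot is the token for OpenAI's search crawler, which should be confirmed against OpenAI's documentation. The `shop.example` URL, the file contents, and the `access_report` helper are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the training crawler, allow the search crawler.
SHOP_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def access_report(robots_txt, user_agents, url):
    """Map each user-agent token to whether the rules let it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in-memory text, no network request
    return {agent: parser.can_fetch(agent, url) for agent in user_agents}


report = access_report(
    SHOP_ROBOTS_TXT,
    ["GPTBot", "OAI-SearchBot"],
    "https://shop.example/products/widget",
)
print(report)
```

With this file the report shows GPTBot denied and OAI-SearchBot permitted for the same product page, which is the separation between training and search access that the example describes.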

## Related bots

- Applebot-Extended: Also tracked as a training crawler.
- Meta-ExternalAgent: Also tracked as a training crawler.
- Bytespider: Also tracked as a training crawler.
- Google-Extended: Also tracked as a training crawler.
- CCBot: Also tracked as a training crawler.
- AI2Bot: Also tracked as a training crawler.
- ClaudeBot: Also tracked as a training crawler.
- img2dataset: Also tracked as a training crawler.
- LAIONDownloader: Also tracked as a training crawler.
- GPTBot: Glossary definition of the crawler covered in this guide.
- AI Training Opt-Out: The policy decision that allowing or blocking GPTBot puts into effect.

## Frequently Asked Questions

### What does GPTBot do?

GPTBot is a crawler from OpenAI that collects publicly available web content to help improve generative AI foundation models.

### Does blocking GPTBot affect my site in ChatGPT search?

No. Blocking GPTBot only opts your content out of training use. It does not remove your site from ChatGPT search citations.

### How can I stop GPTBot from crawling my site?

You can block GPTBot by adding a 'Disallow: /' rule for the user-agent GPTBot in your site's robots.txt file.

### Will blocking GPTBot remove my content from past training data?

No. The block only applies to future crawls. It does not affect content that was already collected before the block was in place.

### Does GPTBot follow robots.txt rules?

Yes. GPTBot honors the Robots Exclusion Protocol, so it will respect any disallow rules you set for it in robots.txt.

## Data and sources

- [OpenAI documentation](https://developers.openai.com/api/docs/bots) - Primary source for GPTBot crawler details.
- [GPTBot live crawler data](https://trakkr.ai/data/crawlers/gptbot) - Trakkr crawler telemetry for this user agent.
