# What is cohere-training-data-crawler? AI crawler guide

Canonical URL: https://trakkr.ai/bots/cohere-training-data-crawler
Published: 2026-06-11
Last updated: 2026-06-11

Learn what cohere-training-data-crawler is, who operates it, its verified user-agent, robots.txt posture, and how blocking it can affect AI search, citations, training, or agent visibility.

cohere-training-data-crawler is a user-agent token reported for a Cohere crawler that downloads web data to train enterprise language models.

## What is cohere-training-data-crawler?

cohere-training-data-crawler is a web crawler operated by Cohere that downloads publicly available web text to train its enterprise language models. It identifies itself with the user-agent token cohere-training-data-crawler. Its compliance with robots.txt is unverified: no confirmed documentation describes how it handles standard exclusion rules. Site owners can still name the token in their robots.txt file to signal an opt-out from this crawling.

## What it's for

For a site owner, this crawler represents Cohere's effort to collect training data for its language models. If you do not want your content used in this way, you can block the crawler. Allowing it may contribute your site's public text to Cohere's training datasets, which could influence how their models perform on topics related to your content.

## How to handle cohere-training-data-crawler

To ask cohere-training-data-crawler to stay off your site, add a robots.txt rule targeting its user-agent token; the rule is given under "robots.txt rule" below. Because the crawler's robots.txt posture is unverified, there is no guarantee it will honor the rule, but robots.txt remains the standard way to express an opt-out preference.
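Because compliance is unverified, one way to see whether the crawler actually visits is to scan server access logs for its token. The sketch below is illustrative only. It assumes logs in the common "combined" format and that the token appears somewhere in the User-Agent field. The function name `count_crawler_hits` and the sample log lines are hypothetical, not taken from Cohere or from real traffic.

```python
from collections import Counter

TOKEN = "cohere-training-data-crawler"

# Hypothetical sample lines in combined log format (assumption, not real traffic).
SAMPLE_LINES = [
    '203.0.113.5 - - [11/Jun/2026:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "cohere-training-data-crawler"',
    '203.0.113.5 - - [11/Jun/2026:10:00:02 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "cohere-training-data-crawler"',
    '198.51.100.7 - - [11/Jun/2026:10:00:03 +0000] "GET /blog/post HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]


def count_crawler_hits(log_lines, token=TOKEN):
    """Count requested paths on log lines that mention the token (case-insensitive)."""
    needle = token.lower()
    hits = Counter()
    for line in log_lines:
        if needle not in line.lower():
            continue
        # In combined log format the first quoted field is the request line,
        # e.g. "GET /path HTTP/1.1".
        quoted = line.split('"')
        if len(quoted) < 2:
            continue
        request = quoted[1].split()
        path = request[1] if len(request) >= 2 else "?"
        hits[path] += 1
    return hits


if __name__ == "__main__":
    for path, count in count_crawler_hits(SAMPLE_LINES).items():
        print(path, count)
```

Requests for paths other than /robots.txt after a Disallow rule is in place would suggest the rule is not being honored, although user-agent strings can be spoofed, so log matches are a hint rather than proof.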

## robots.txt rule

```
User-agent: cohere-training-data-crawler
Disallow: /
```
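Before deploying the rule, it can be checked locally with Python's standard-library `urllib.robotparser`. This is a minimal sketch: the helper `is_allowed`, the `example.com` URL, and the `SomeOtherBot` agent are placeholders chosen for illustration. It shows only how a standards-following parser reads the rule, not how the crawler itself behaves.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
RULE = """\
User-agent: cohere-training-data-crawler
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/1"  # placeholder URL
    print(is_allowed(RULE, "cohere-training-data-crawler", url))
    print(is_allowed(RULE, "SomeOtherBot", url))  # placeholder agent
```

The rule targets only the named token, so other agents remain unaffected unless they have their own entries.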

## Blocking cost

Blocking cohere-training-data-crawler may prevent your content from being included in Cohere's training data, which could reduce your site's indirect influence on their language models.

## Examples

- A news website adds a robots.txt rule to disallow cohere-training-data-crawler, aiming to keep its articles out of Cohere's training corpus.
- A technical blog allows the crawler, hoping its tutorials will improve Cohere's model performance on programming topics.
- An e-commerce site blocks the crawler to avoid having its product descriptions used in training without explicit permission.

## Related bots

- PanguBot: Also tracked as a training crawler.
- DeepSeekBot: Also tracked as a training crawler.
- Meta-ExternalAgent: Also tracked as a training crawler.
- TerraCotta: Also tracked as a training crawler.
- img2dataset: Also tracked as a training crawler.
- Applebot-Extended: Also tracked as a training crawler.
- Google-Extended: Also tracked as a training crawler.
- GPTBot: Also tracked as a training crawler.
- GrokBot: Also tracked as a training crawler.
- AI Training Opt-Out: the policy decision that blocking or allowing cohere-training-data-crawler expresses.
- Cohere: the operator of cohere-training-data-crawler.

## Frequently Asked Questions

### What does cohere-training-data-crawler do?

It crawls websites to download publicly available text for training Cohere's enterprise language models.

### How can I stop cohere-training-data-crawler from crawling my site?

You can add a robots.txt rule that disallows the user-agent token cohere-training-data-crawler. The rule is shown in the "robots.txt rule" section above.

### Does cohere-training-data-crawler obey robots.txt?

Its compliance is unverified. No official documentation confirms that it respects robots.txt rules, so a robots.txt rule alone may not keep it out.
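If robots.txt alone is not enough, a site can refuse requests by user-agent at the server. The following is a sketch under stated assumptions, not a recommended or official method: it assumes a Python WSGI application and that the crawler sends its token in the User-Agent header. The name `block_crawler` is hypothetical.

```python
BLOCKED_TOKEN = "cohere-training-data-crawler"


def block_crawler(app, token=BLOCKED_TOKEN):
    """Wrap a WSGI app and answer 403 when the User-Agent contains the token."""
    needle = token.lower()

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if needle in user_agent:
            body = b"Forbidden\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Any other client is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Wrapping an existing app is a single call, for example `application = block_crawler(application)`. This only stops clients that announce the token; a client sending a different user-agent string would pass through.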

### What happens if I allow cohere-training-data-crawler?

Your site's public text may be downloaded and used to train Cohere's language models, potentially affecting how those models generate content related to your domain.

### Is cohere-training-data-crawler associated with any other bots?

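Based on available information, no other bot is explicitly linked to this crawler. The bots listed under Related bots are tracked in the same training-crawler category rather than tied to it directly.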

## Data And Sources

- [cohere-training-data-crawler source reference](https://github.com/ai-robots-txt/ai.robots.txt/blob/main/table-of-bot-metrics.md) - Source used to verify cohere-training-data-crawler.
