# What is Diffbot? AI crawler guide

Canonical URL: https://trakkr.ai/bots/diffbot
Published: 2026-06-11
Last updated: 2026-06-11

Learn what Diffbot is, who operates it, its verified user-agent, robots.txt posture, and how blocking it can affect AI search, citations, training, or agent visibility.

Diffbot's crawler extracts structured web data and maintains the company's knowledge graph.

## What is Diffbot?

Diffbot is a web crawler, operated by the company of the same name, that extracts structured data from public pages to build and maintain a knowledge graph. It identifies itself with the user-agent token Diffbot and respects the Robots Exclusion Protocol. The crawler converts unstructured web content into machine-readable facts that feed Diffbot's data products. Site owners may see requests from this bot as it analyzes pages for entity extraction, relationship mapping, and data structuring. Its activity serves downstream applications that rely on organized web data rather than traditional search indexing.
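
To see whether Diffbot is visiting your site, you can look for its user-agent token in your request logs. The sketch below is a minimal illustration, not an official detection method: the function names and the sample User-Agent strings are hypothetical, and it assumes that a case-insensitive match on the token `Diffbot` is sufficient for counting purposes.

```python
import re
from collections import Counter

# Assumption: the token "Diffbot" appears somewhere in the User-Agent header.
# We match it case-insensitively; this is a heuristic, not verification.
DIFFBOT_TOKEN = re.compile(r"diffbot", re.IGNORECASE)


def is_diffbot(user_agent: str) -> bool:
    """Return True if the User-Agent string contains the Diffbot token."""
    return bool(DIFFBOT_TOKEN.search(user_agent or ""))


def count_diffbot_hits(user_agents):
    """Tally requests into 'diffbot' and 'other' buckets."""
    counts = Counter()
    for ua in user_agents:
        counts["diffbot" if is_diffbot(ua) else "other"] += 1
    return counts


# Hypothetical User-Agent values, for illustration only.
sample_user_agents = [
    "Mozilla/5.0 (compatible; Diffbot/0.1; +http://www.diffbot.com)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "diffbot",
]

hits = count_diffbot_hits(sample_user_agents)
print(hits["diffbot"], hits["other"])
```

User-Agent headers can be spoofed, so treat a match as a signal for reporting rather than proof of the request's origin.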

## What it's for

When Diffbot crawls your site, the structured data it extracts can become part of a commercial knowledge graph. This graph may be licensed to businesses, researchers, and AI systems, meaning your public content could indirectly influence data-driven products, analytics, and machine learning models without appearing in a conventional search engine.

## How to handle Diffbot

To prevent Diffbot from accessing your site, add a robots.txt rule that disallows the Diffbot user-agent. Because Diffbot honors robots.txt, this will stop its crawler from processing your pages. If you want to allow crawling, no action is needed, but you can also use granular allow and disallow directives to control which parts of your site are accessible.

## robots.txt rule

User-agent: Diffbot
Disallow: /
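
Before deploying the rule, you may want to confirm that it blocks Diffbot without affecting other crawlers. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `can_crawl`, the `example.com` URL, and the `SomeOtherBot` agent are placeholders. It shows how a standards-following parser reads the rule, which is an approximation of, not a guarantee about, Diffbot's own behavior.

```python
from urllib.robotparser import RobotFileParser

# The rule from this section, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Diffbot
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com and SomeOtherBot are placeholders for illustration.
diffbot_allowed = can_crawl(ROBOTS_TXT, "Diffbot", "https://example.com/pricing")
other_allowed = can_crawl(ROBOTS_TXT, "SomeOtherBot", "https://example.com/pricing")

print("Diffbot allowed:", diffbot_allowed)
print("Other bot allowed:", other_allowed)
```

Because the file contains no `User-agent: *` group, agents other than Diffbot are unaffected by this rule.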

## Blocking cost

Blocking Diffbot may prevent your content from being included in its knowledge graph, which could reduce your visibility in AI-powered applications, data products, or analytical tools that rely on Diffbot's structured data.

## Examples

- A news publisher's article is crawled by Diffbot, and the extracted entities and facts are added to the knowledge graph, later surfacing in a business intelligence tool that licenses Diffbot data.
- An e-commerce product page is processed by Diffbot, and the structured product attributes become part of a dataset used to train a recommendation engine.
- A company's about page is analyzed by Diffbot, and the organizational relationships are mapped into the knowledge graph, potentially appearing in a corporate data product.

## Related bots

- Amazonbot: Also tracked as a crawler.
- OAI-AdsBot: Also tracked as a crawler.
- GoogleOther: Also tracked as a crawler.
- GoogleOther-Image: Also tracked as a crawler.
- GoogleOther-Video: Also tracked as a crawler.
- ImagesiftBot: Also tracked as a crawler.
- omgili: Also tracked as a crawler.
- Google-Firebase: Also tracked as a crawler.
- bedrockbot: Also tracked as a crawler.
- Robots.txt: Robots.txt is the control file used to allow or block Diffbot.
- Crawling: Diffbot is a concrete crawler example for this concept.

## Frequently Asked Questions

### Does Diffbot follow robots.txt rules?

Yes, Diffbot honors the Robots Exclusion Protocol. If you disallow the Diffbot user-agent in your robots.txt file, it will not crawl your site.

### What kind of data does Diffbot extract?

Diffbot extracts structured data from web pages, such as entities, facts, and relationships, to build and maintain its knowledge graph.

### Will blocking Diffbot affect my search engine rankings?

Blocking Diffbot does not directly impact traditional search engine rankings, but it may limit your content's inclusion in data products and AI systems that use Diffbot's knowledge graph.

### How can I allow Diffbot to crawl only part of my site?

You can use robots.txt directives to allow or disallow specific paths for the Diffbot user-agent, giving you granular control over what it accesses.
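
As a rough illustration of partial access, the sketch below opens a hypothetical `/blog/` section to Diffbot while disallowing everything else, then checks the result with Python's standard `urllib.robotparser`. The paths and the `example.com` host are assumptions chosen for the example. Python's parser applies the first matching rule, whereas RFC 9309 specifies the most specific (longest) match; listing the longer `Allow` line first gives the same answer under both readings. How Diffbot itself resolves overlapping rules is not documented here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: Diffbot may read /blog/ and nothing else.
PARTIAL_RULES = """\
User-agent: Diffbot
Allow: /blog/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(PARTIAL_RULES.splitlines())


def diffbot_may_fetch(path: str) -> bool:
    """Check a site-relative path against the rules for the Diffbot agent."""
    return parser.can_fetch("Diffbot", "https://example.com" + path)


blog_allowed = diffbot_may_fetch("/blog/launch-post")
account_allowed = diffbot_may_fetch("/account/settings")

print("/blog/launch-post allowed:", blog_allowed)
print("/account/settings allowed:", account_allowed)
```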

### Is Diffbot associated with any search engine?

Diffbot is not a search engine crawler. It focuses on extracting structured data for its knowledge graph, which is used in various data products and AI applications.

## Data and sources

- [Diffbot source reference](https://github.com/ai-robots-txt/ai.robots.txt/blob/main/table-of-bot-metrics.md) - Source used to verify Diffbot.
- [Diffbot live crawler data](https://trakkr.ai/data/crawlers/diffbot) - Trakkr crawler telemetry for this user agent.
