# What is Scrapy? AI crawler guide

Canonical URL: https://trakkr.ai/bots/scrapy
Published: 2026-06-11
Last updated: 2026-06-11

Learn what Scrapy is, who operates it, its verified user-agent, robots.txt posture, and how blocking it can affect AI search, citations, training, or agent visibility.

'Scrapy' is the default user-agent of the Scrapy framework, which is commonly used for web scraping, including AI and machine learning data extraction.

## What is Scrapy?

Scrapy is an open-source web crawling framework maintained by Zyte. Unless the developer overrides it, a crawler built with Scrapy identifies itself with the user-agent token 'Scrapy'; the full default header has the form Scrapy/VERSION (+https://scrapy.org). This token is not tied to a single commercial service. It is the default identifier for any deployment of the framework, whether used by researchers, businesses, or hobbyists. Because Scrapy is widely adopted for general web scraping, its user-agent often appears in logs from automated data collection, including projects that gather training data for AI and machine learning models.

## What it's for

For a site owner, traffic from the 'Scrapy' user-agent typically indicates that someone is using the Scrapy framework to extract data from your site. This could be for legitimate purposes like academic research or price monitoring, but it may also represent unwanted scraping that consumes server resources. Blocking this user-agent can stop many generic scraper deployments, though it may also affect benign or customer-configured crawls that rely on the framework.
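To gauge how much of this traffic a site receives, the access log can be scanned for the token. The sketch below is one possible approach, not a definitive tool: it assumes logs in the common "combined" format (user-agent as the last quoted field), and the function names `is_scrapy_user_agent` and `count_scrapy_hits` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Matches the token as a whole word, e.g. in "Scrapy/2.11.0 (+https://scrapy.org)".
SCRAPY_TOKEN = re.compile(r"\bscrapy\b", re.IGNORECASE)


def is_scrapy_user_agent(user_agent: str) -> bool:
    """Return True if the user-agent string contains the 'Scrapy' token."""
    return SCRAPY_TOKEN.search(user_agent) is not None


def count_scrapy_hits(log_lines):
    """Count requests per client IP whose user-agent contains 'Scrapy'.

    Assumes combined log format: the client IP is the first field and the
    user-agent is the last double-quoted field on each line.
    """
    hits = Counter()
    for line in log_lines:
        fields = line.rsplit('"', 2)  # [prefix, user_agent, trailing text]
        if len(fields) != 3:
            continue  # line does not look like combined log format
        if is_scrapy_user_agent(fields[1]):
            client_ip = line.split(" ", 1)[0]
            hits[client_ip] += 1
    return hits


# Example with made-up log lines (documentation IP ranges).
sample_lines = [
    '203.0.113.7 - - [11/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" "Scrapy/2.11.0 (+https://scrapy.org)"',
    '203.0.113.7 - - [11/Jun/2026:10:00:01 +0000] "GET /docs HTTP/1.1" 200 900 "-" "Scrapy/2.11.0 (+https://scrapy.org)"',
    '198.51.100.4 - - [11/Jun/2026:10:00:02 +0000] "GET / HTTP/1.1" 200 300 "-" "Mozilla/5.0"',
]
scrapy_hits = count_scrapy_hits(sample_lines)
```

A count like this only covers crawlers that keep the default token; a deployment that sets a custom user-agent will not show up.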

## How to handle Scrapy

You can manage access for Scrapy-based crawlers by adding a rule to your robots.txt file. The standard rule, listed under "robots.txt rule", disallows all paths for the 'Scrapy' user-agent. If you prefer a more selective approach, you can allow specific directories while disallowing others. Keep in mind that not all Scrapy crawlers respect robots.txt, so you may need additional server-side measures if you observe non-compliant behavior.
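A selective rule set can be checked before deployment with Python's standard-library `urllib.robotparser`. The following is a minimal sketch under stated assumptions: the `/public/` directory, the `example.com` URLs, and the version in the user-agent string are illustrative values, and `build_parser` is a hypothetical helper. The `Allow` line is placed first because this parser applies the first matching rule.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical selective policy: Scrapy may fetch /public/ and nothing else.
ROBOTS_TXT = """\
User-agent: Scrapy
Allow: /public/
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
scrapy_ua = "Scrapy/2.11.0 (+https://scrapy.org)"  # example default-style header

# Allowed directory for Scrapy.
public_ok = parser.can_fetch(scrapy_ua, "https://example.com/public/page.html")
# Everything else is disallowed for Scrapy.
private_ok = parser.can_fetch(scrapy_ua, "https://example.com/private/data")
# A bot with a different token is not matched by the Scrapy group.
other_ok = parser.can_fetch("Googlebot", "https://example.com/private/data")
```

This only predicts how a compliant crawler would read the file; it says nothing about crawlers that skip robots.txt.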

## robots.txt rule

User-agent: Scrapy
Disallow: /

## Blocking cost

Blocking the 'Scrapy' user-agent may prevent your site's content from being included in datasets used for AI training, search, or citation, but it could also stop legitimate research or business intelligence gathering that might benefit your visibility.

## Examples

- A university researcher uses Scrapy to collect publicly available data for a study on web trends, and the crawler identifies itself with the 'Scrapy' user-agent.
- A startup configures a Scrapy spider to monitor competitor pricing, and the requests appear in server logs under the 'Scrapy' token.
- An AI company deploys a Scrapy-based crawler to gather text data for training a language model, using the default user-agent without customization.

## Related bots

- FirecrawlAgent: Also tracked as a crawler.
- GoogleOther: Also tracked as a crawler.
- Google-CloudVertexBot: Also tracked as a crawler.
- Amazonbot: Also tracked as a crawler.
- bedrockbot: Also tracked as a crawler.
- GoogleOther-Image: Also tracked as a crawler.
- GoogleOther-Video: Also tracked as a crawler.
- Panscient: Also tracked as a crawler.
- Google-Firebase: Also tracked as a crawler.
- Robots.txt: Robots.txt is the control file used to allow or block Scrapy.
- AI Crawlers: Scrapy is a concrete crawler example for this concept.

## Frequently Asked Questions

### Who operates the Scrapy crawler?

Scrapy is not a single crawler operated by one entity. It is an open-source framework maintained by Zyte, and anyone can use it to build their own crawlers. The 'Scrapy' user-agent is the default identifier for these crawlers.

### Does Scrapy always respect robots.txt?

Scrapy includes built-in support for robots.txt, controlled by the ROBOTSTXT_OBEY setting, but each developer decides whether to enable it. Some Scrapy-based crawlers may therefore ignore your robots.txt rules.
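On the operator's side, compliance comes down to a project setting. The fragment below sketches what a well-behaved configuration might look like in a project's `settings.py`; `ROBOTSTXT_OBEY`, `USER_AGENT`, and `BOT_NAME` are real Scrapy settings, while the project name and contact URL are assumptions made up for this example.

```python
# settings.py for a hypothetical Scrapy project.
# "example_project" and the contact URL are illustrative values only.

BOT_NAME = "example_project"

# When True, Scrapy fetches robots.txt and skips disallowed URLs.
ROBOTSTXT_OBEY = True

# Overrides the default "Scrapy/VERSION (+https://scrapy.org)" header.
# A crawler configured like this no longer matches a "User-agent: Scrapy" rule.
USER_AGENT = "example_project (+https://example.com/bot-info)"
```

Because both values are under the developer's control, a robots.txt rule for 'Scrapy' only reaches crawlers that keep the default token and leave robots.txt handling enabled.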

### Can I block Scrapy without affecting other bots?

Yes, you can target the 'Scrapy' user-agent specifically in your robots.txt file. The rule applies only to crawlers that identify themselves with that token and honor robots.txt, leaving other bots unaffected.

### Is Scrapy used for AI data collection?

Scrapy can be used for AI and machine learning data extraction, as it is a general-purpose scraping framework. However, not all Scrapy crawlers are gathering AI training data; many are used for other purposes like research or business intelligence.

### What should I do if a Scrapy crawler ignores my robots.txt?

If you notice a Scrapy-based crawler disregarding your robots.txt rules, you may need to implement additional access controls, such as rate limiting, IP blocking, or serving a CAPTCHA, to protect your site's resources.
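As one illustration of a server-side control, the sketch below wraps a WSGI application and returns 403 for any request whose user-agent contains 'scrapy'. It is a minimal example, not a complete defense: `block_scrapy` and `hello_app` are hypothetical names, and a crawler that sends a different user-agent passes straight through, which is why rate limiting or IP-based rules are often used alongside it.

```python
def block_scrapy(app):
    """Wrap a WSGI app and reject requests whose User-Agent contains 'scrapy'."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if "scrapy" in user_agent.lower():
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Any other user-agent is passed to the wrapped application unchanged.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """Stand-in application used only to demonstrate the wrapper."""
    body = b"Hello\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


application = block_scrapy(hello_app)
```

The same check can usually be expressed in a web server or CDN rule instead, which avoids spending application resources on blocked requests.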

## Data And Sources

- [Zyte documentation](https://scrapy.org/) - Primary source for Scrapy crawler details.
- [Scrapy source reference](https://github.com/ai-robots-txt/ai.robots.txt/blob/main/table-of-bot-metrics.md) - Source used to verify Scrapy.
