AI Pages Technical Details
Deep dive into how AI Pages works - architecture, caching, and optimization.
- Understand the full request flow from crawler to optimized response
- Learn how caching works and why it's fast
- See exactly what optimizations are applied to each page
- Know the architecture that makes AI Pages reliable
This page explains how AI Pages works under the hood. If you're technical or just curious, read on. If you just want to use AI Pages, you don't need to know any of this.
Architecture overview
AI Pages consists of several components working together:
[AI Crawler] → [Your Platform Integration] → [AI Pages API] → [Optimization Engine]
                         ↓                                           ↓
                  [Your Origin]                                 [GCS Cache]
Components
| Component | Technology | Purpose |
|---|---|---|
| Platform Integration | JS/TS/PHP/Lua (varies) | Intercepts requests, detects crawlers |
| AI Pages API | Go | Handles auth, routing, caching |
| Optimization Engine | Python | Renders pages, applies AI optimizations |
| Cache | Google Cloud Storage | Stores optimized HTML |
| Analytics | Firestore | Logs all crawler activity |
The platform integration is the lightweight code you deploy on your hosting platform (Cloudflare Worker, Vercel middleware, Netlify edge function, WordPress plugin, etc.). All integrations follow the same pattern and communicate with the same AI Pages API.
Request flow
Here's exactly what happens when an AI crawler visits your site:
1. Request arrives at your platform
GET /products/running-shoes HTTP/1.1
Host: nike.com
User-Agent: GPTBot/1.0
2. Integration checks user-agent
The AI Pages integration examines the `User-Agent` header:
const AI_CRAWLERS = [
'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
'ClaudeBot', 'Claude-User', 'Claude-SearchBot',
'PerplexityBot', 'Google-Extended', 'Google-Agent', 'Googlebot-Extended',
'cohere-ai', 'Applebot-Extended', 'Amazonbot',
'Meta-ExternalAgent', 'ByteSpider', 'Baiduspider'
];
If human browser: Pass request to your origin server unchanged (adds <10ms latency)
If AI crawler: Forward to the AI Pages API with crawler details
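The user-agent branching above can be sketched as a small helper; `AI_CRAWLERS` is abbreviated from the list above, and this is a sketch of the pattern rather than the integration's exact code:

```typescript
// Abbreviated from the full AI_CRAWLERS list above.
const AI_CRAWLERS = ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'Amazonbot'];

// Returns the matched crawler name, or null for human traffic.
function detectCrawler(userAgent: string): string | null {
  for (const bot of AI_CRAWLERS) {
    if (userAgent.includes(bot)) return bot;
  }
  return null;
}
```

A `null` return means human traffic, which passes straight through to the origin; a match is forwarded to the AI Pages API along with the crawler name.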
3. AI Pages API receives request
POST /v1/optimize
X-API-Key: prism_xxxxx
{
"url": "https://nike.com/products/running-shoes",
"pathname": "/products/running-shoes",
"crawler": "GPTBot"
}
4. Cache check
AI Pages checks if an optimized version exists:
Cache key formula:
SHA256(url + mode + features)
Cache hit: Return optimized HTML immediately (~100ms)
Cache miss: Trigger optimization
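Following the formula above, the key can be sketched with Node's built-in `crypto` module (the exact concatenation and how `mode` and `features` are encoded are assumptions):

```typescript
import { createHash } from 'node:crypto';

// Cache key = SHA256(url + mode + features), hex-encoded.
function cacheKey(url: string, mode: string, features: string[]): string {
  return createHash('sha256')
    .update(url + mode + features.join(','))
    .digest('hex');
}
```

The same URL, mode, and feature set always yields the same key, so repeat crawler visits hit the same cache entry.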
5. Optimization (cache miss only)
For new pages, the optimization engine:
1. Renders the page - Uses headless browser to execute JavaScript
2. Extracts content - Strips scripts, tracking, unnecessary markup
3. Applies features:
   - Structured data injection (JSON-LD)
   - Key facts extraction
   - FAQ generation
   - AI summary block
   - Entity recognition
4. Compresses - Removes whitespace, optimizes HTML
5. Stores in cache - Saves to GCS with 7-day TTL
On cache miss, response is:
{
"status": "optimizing",
"cache": "MISS",
"message": "Serve original, optimization in progress"
}
The integration serves your original page while optimization happens in the background. The next crawler visit gets the optimized version.
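In the integration, that behavior reduces to a simple choice based on the API response (the shape follows the JSON examples in this section; this is a sketch, not the shipped code):

```typescript
interface OptimizeResponse {
  status?: string;
  cache: 'HIT' | 'MISS';
  optimizedHTML?: string;
}

// Serve optimized HTML on a cache hit; otherwise fall back to the
// original page while optimization runs in the background.
function selectBody(resp: OptimizeResponse, originalHTML: string): string {
  if (resp.cache === 'HIT' && resp.optimizedHTML) return resp.optimizedHTML;
  return originalHTML;
}
```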
6. Response served
Cache hit response:
{
"optimizedHTML": "<html>...",
"cache": "HIT",
"is404": false
}
Response headers:
X-Prism-Cache: HIT
X-Prism-Response-Time: 87ms
URL processing
Normalization
URLs are normalized for consistent caching:
- Remove trailing slashes (except root)
- Lowercase the hostname
- Remove fragments (`#section`)
- Keep only essential query params: `id`, `page`, `category`, `product`, `search`, `q`
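A sketch of these rules using the standard `URL` API; note that the worked example below also lowercases the path, so that step is included here too:

```typescript
const ALLOWED_PARAMS = ['id', 'page', 'category', 'product', 'search', 'q'];

// Normalize a URL into its canonical cache form.
function normalizeUrl(input: string): string {
  const u = new URL(input); // URL parsing already lowercases the hostname
  u.hash = ''; // drop #fragment
  u.pathname = u.pathname.toLowerCase(); // the worked example lowercases paths too
  // Keep only allow-listed query params
  for (const key of [...u.searchParams.keys()]) {
    if (!ALLOWED_PARAMS.includes(key)) u.searchParams.delete(key);
  }
  // Strip trailing slash, except for the root path
  if (u.pathname !== '/' && u.pathname.endsWith('/')) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}
```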
Example:
Input: https://Nike.com/Products/?ref=nav&category=shoes#reviews
Output: https://nike.com/products?category=shoes
Filtered requests
These are automatically skipped (not optimized, not counted):
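The skip logic can be sketched as a pathname check against the extension and path lists below (abbreviated here; a sketch of the pattern, not the integration's exact matcher):

```typescript
// Abbreviated skip lists; the full lists follow below.
const SKIP_EXTENSIONS = ['.css', '.js', '.png', '.json'];
const SKIP_PREFIXES = ['/api/', '/static/', '/assets/'];

// True if the request is a static asset or non-page path
// that should bypass optimization entirely.
function shouldSkip(pathname: string): boolean {
  const p = pathname.toLowerCase();
  return (
    SKIP_EXTENSIONS.some((ext) => p.endsWith(ext)) ||
    SKIP_PREFIXES.some((pre) => p.startsWith(pre))
  );
}
```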
File extensions:
.css, .js, .jpg, .jpeg, .png, .gif, .svg, .webp,
.ico, .woff, .woff2, .ttf, .pdf, .zip, .json, .xml
Paths:
/api/*, /ws/*, /graphql, /_next/*, /static/*,
/assets/*, /fonts/*, /images/*, /favicon*
Caching strategy
Cache TTL
Optimized pages are cached for 7 days. After expiry, the next crawler visit triggers re-optimization.
Deduplication
Same customer + URL requests within 5 seconds are deduplicated (only counted once). This prevents bots that hit the same page repeatedly from inflating your usage.
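A minimal in-memory sketch of that 5-second window, keyed by customer + URL (the real service does this server-side; the names here are illustrative):

```typescript
const WINDOW_MS = 5_000;
const lastSeen = new Map<string, number>();

// Returns true if this request should be counted, i.e. it is not a
// duplicate of the same customer + URL within the 5-second window.
function shouldCount(customerId: string, url: string, now: number): boolean {
  const key = `${customerId}:${url}`;
  const prev = lastSeen.get(key);
  if (prev !== undefined && now - prev < WINDOW_MS) return false;
  lastSeen.set(key, now);
  return true;
}
```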
Cache invalidation
Currently, caches expire naturally after 7 days. Manual invalidation coming soon.
The five optimization features
1. Pre-rendering
What it does:
- Spins up headless Chrome
- Loads your page with JavaScript
- Waits for content to render
- Captures final DOM state
Result: JavaScript-rendered content becomes static HTML that crawlers can read.
2. Structured data injection
What it does:
- Analyzes page content
- Detects page type (Article, Product, FAQ, etc.)
- Generates JSON-LD schema markup
- Injects into `<head>`
Example output:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Nike Air Zoom Pegasus 40",
"brand": {"@type": "Brand", "name": "Nike"},
"offers": {
"@type": "Offer",
"price": "130.00",
"priceCurrency": "USD"
}
}
</script>
3. Key facts extraction
What it does:
- Scans content for data points
- Identifies prices, dates, percentages, statistics
- Marks them with semantic HTML
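A regex-based sketch of the scanning step; the patterns below are illustrative, not the engine's actual extraction rules:

```typescript
// Illustrative patterns for price and percentage data points.
const PRICE_RE = /\$\d+(?:\.\d{2})?/g;
const PERCENT_RE = /\d+(?:\.\d+)?%/g;

// Scan text content and collect candidate key facts.
function extractKeyFacts(text: string): { prices: string[]; percents: string[] } {
  return {
    prices: text.match(PRICE_RE) ?? [],
    percents: text.match(PERCENT_RE) ?? [],
  };
}
```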
Example:
<span itemtype="price">$130.00</span>
<span itemtype="date">Released March 2024</span>
4. FAQ generation
What it does:
- Analyzes content for Q&A patterns
- Generates relevant questions users might ask
- Creates FAQPage schema
Example:
<section itemscope itemtype="https://schema.org/FAQPage">
<div itemscope itemprop="mainEntity" itemtype="https://schema.org/Question">
<h3 itemprop="name">Is the Pegasus 40 good for marathon training?</h3>
<div itemscope itemprop="acceptedAnswer" itemtype="https://schema.org/Answer">
<p itemprop="text">Yes, the Pegasus 40 is suitable for marathon training...</p>
</div>
</div>
</section>
5. Entity recognition
What it does:
- Identifies named entities (brands, products, people, locations)
- Marks them semantically
- Creates entity relationships
Example:
<span itemtype="Brand">Nike</span> released the
<span itemtype="Product">Air Zoom Pegasus 40</span> in
<span itemtype="Date">March 2024</span>
Performance characteristics
Latency
| Scenario | Latency Added |
|---|---|
| Human visitor | <10ms |
| AI crawler (cache hit) | ~100ms |
| AI crawler (cache miss) | ~2000ms (first time only) |
Reliability
- Automatic failover - On any error, serves your original site unchanged
- Edge-native - Runs at the edge on platforms like Cloudflare, Vercel, and Netlify for minimal latency
- Graceful degradation - If the API is unreachable, your site continues to work normally
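The failover behavior can be sketched as a race against a timeout, falling back to the origin on any error (`optimize` and `origin` stand in for the real fetchers; the timeout value is an assumption):

```typescript
// Try the AI Pages API; on any failure or timeout, fall back to the origin.
async function withFailover(
  optimize: () => Promise<string>,
  origin: () => Promise<string>,
  timeoutMs = 1_000,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
  });
  try {
    return await Promise.race([optimize(), timeout]);
  } catch {
    return origin(); // graceful degradation: serve the site unchanged
  } finally {
    clearTimeout(timer);
  }
}
```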
Bandwidth
Optimized pages are typically 60-80% smaller than original pages due to removal of:
- JavaScript bundles
- Tracking scripts
- CSS (or minimal inline)
- Non-essential markup
Security
API key protection
- API keys are hashed with SHA256 before storage
- Keys are only transmitted over HTTPS
- You can regenerate keys anytime
Data handling
- AI Pages doesn't store your content long-term
- Cached pages expire after 7 days
- Analytics are tied to your account
- No cross-customer data sharing
SEO safety
AI Pages never serves different content to:
- Googlebot (for search rankings)
- Bingbot
- Any non-AI search crawler
This means your SEO is completely unaffected.
Limitations
What AI Pages can't optimize
- Login-required pages - Crawlers can't authenticate
- Real-time data - Cached content may be up to 7 days old
- Interactive features - JavaScript apps become static snapshots
- User-specific content - Personalization is lost
Known issues
- Very large pages (>1MB) may timeout during optimization
- Complex SPAs may not render completely
- Some anti-bot systems may block our rendering engine
Next steps
Troubleshooting
Fix common AI Pages issues.
API Reference
Integrate AI Pages programmatically.