AI Pages Technical Details
Deep dive into how AI Pages works - architecture, caching, and optimization.
- Understand the full request flow from crawler to optimized response
- Learn how caching works and why it's fast
- See exactly what optimizations are applied to each page
- Know the architecture that makes AI Pages reliable
This page explains how AI Pages works under the hood. If you're technical or just curious, read on. If you just want to use AI Pages, you don't need to know any of this.
Architecture overview
AI Pages consists of several components working together:
[AI Crawler] → [Your Platform Integration] → [AI Pages API] → [Optimization Engine]
                         ↓                                           ↓
                  [Your Origin]                                 [GCS Cache]
Components
| Component | Technology | Purpose |
|---|---|---|
| Platform Integration | JS/TS/PHP/Lua (varies) | Intercepts requests, detects crawlers |
| AI Pages API | Go | Handles auth, routing, caching |
| Optimization Engine | Python | Renders pages, applies AI optimizations |
| Cache | Google Cloud Storage | Stores optimized HTML |
| Analytics | Firestore | Logs all crawler activity |
The platform integration is the lightweight code you deploy on your hosting platform (Cloudflare Worker, Vercel middleware, Netlify edge function, WordPress plugin, etc.). All integrations follow the same pattern and communicate with the same AI Pages API.
Request flow
Here's exactly what happens when an AI crawler visits your site:
1. Request arrives at your platform
GET /products/running-shoes HTTP/1.1
Host: nike.com
User-Agent: GPTBot/1.0
2. Integration checks user-agent
The AI Pages integration examines the `User-Agent` header:
const AI_CRAWLERS = [
'GPTBot', 'ChatGPT-User', 'OAI-SearchBot',
'ClaudeBot', 'Claude-User', 'Claude-SearchBot',
'PerplexityBot', 'Google-Extended', 'Google-Agent', 'Googlebot-Extended',
'cohere-ai', 'Applebot-Extended', 'Amazonbot',
'Meta-ExternalAgent', 'ByteSpider', 'Baiduspider'
];
If human browser: Pass request to your origin server unchanged (adds <10ms latency)
If AI crawler: Forward to the AI Pages API with crawler details
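The user-agent branching above can be sketched as a small helper; `AI_CRAWLERS` is abbreviated from the list above, and this is a sketch of the pattern rather than the integration's exact code:

```typescript
// Abbreviated from the full AI_CRAWLERS list above.
const AI_CRAWLERS = ['GPTBot', 'ClaudeBot', 'PerplexityBot', 'Amazonbot'];

// Returns the matched crawler name, or null for human traffic.
function detectCrawler(userAgent: string): string | null {
  for (const bot of AI_CRAWLERS) {
    if (userAgent.includes(bot)) return bot;
  }
  return null;
}
```

A `null` return means human traffic, which passes straight through to the origin; a match is forwarded to the AI Pages API along with the crawler name.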
3. AI Pages API receives request
POST /v1/optimize
X-API-Key: prism_xxxxx
{
"url": "https://nike.com/products/running-shoes",
"pathname": "/products/running-shoes",
"crawler": "GPTBot"
}
4. Cache check
AI Pages checks if an optimized version exists:
Cache key formula:
SHA256(url + mode + features)
Cache hit: Return optimized HTML immediately (~100ms)
Cache miss: Trigger optimization
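Following the formula above, the key can be sketched with Node's built-in `crypto` module (the exact concatenation and how `mode` and `features` are encoded are assumptions):

```typescript
import { createHash } from 'node:crypto';

// Cache key = SHA256(url + mode + features), hex-encoded.
function cacheKey(url: string, mode: string, features: string[]): string {
  return createHash('sha256')
    .update(url + mode + features.join(','))
    .digest('hex');
}
```

The same URL, mode, and feature set always yields the same key, so repeat crawler visits hit the same cache entry.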
5. Optimization (cache miss only)
For new pages, the optimization engine:
1. Renders the page - Uses headless browser to execute JavaScript
2. Extracts content - Strips scripts, tracking, unnecessary markup
3. Applies features:
   - Structured data injection (JSON-LD)
   - Key facts extraction
   - FAQ generation
   - AI summary block
   - Entity recognition
4. Compresses - Removes whitespace, optimizes HTML
5. Stores in cache - Saves to GCS with 7-day TTL
On cache miss, response is:
{
"status": "optimizing",
"cache": "MISS",
"message": "Serve original, optimization in progress"
}
The integration serves your original page while optimization happens in the background. The next crawler visit gets the optimized version.
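In the integration, that behavior reduces to a simple choice based on the API response (the shape follows the JSON examples in this section; this is a sketch, not the shipped code):

```typescript
interface OptimizeResponse {
  status?: string;
  cache: 'HIT' | 'MISS';
  optimizedHTML?: string;
}

// Serve optimized HTML on a cache hit; otherwise fall back to the
// original page while optimization runs in the background.
function selectBody(resp: OptimizeResponse, originalHTML: string): string {
  if (resp.cache === 'HIT' && resp.optimizedHTML) return resp.optimizedHTML;
  return originalHTML;
}
```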
6. Response served
Cache hit response:
{
"optimizedHTML": "<html>...",
"cache": "HIT",
"is404": false
}
Response headers:
X-Prism-Cache: HIT
X-Prism-Response-Time: 87ms
URL processing
Normalization
URLs are normalized for consistent caching:
- Remove trailing slashes (except root)
- Lowercase the hostname
- Remove fragments (`#section`)
- Keep only essential query params: `id`, `page`, `category`, `product`, `search`, `q`
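A sketch of these rules using the standard `URL` API; note that the worked example below also lowercases the path, so that step is included here too:

```typescript
const ALLOWED_PARAMS = ['id', 'page', 'category', 'product', 'search', 'q'];

// Normalize a URL into its canonical cache form.
function normalizeUrl(input: string): string {
  const u = new URL(input); // URL parsing already lowercases the hostname
  u.hash = ''; // drop #fragment
  u.pathname = u.pathname.toLowerCase(); // the worked example lowercases paths too
  // Keep only allow-listed query params
  for (const key of [...u.searchParams.keys()]) {
    if (!ALLOWED_PARAMS.includes(key)) u.searchParams.delete(key);
  }
  // Strip trailing slash, except for the root path
  if (u.pathname !== '/' && u.pathname.endsWith('/')) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}
```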
Example:
Input: https://Nike.com/Products/?ref=nav&category=shoes#reviews
Output: https://nike.com/products?category=shoes
Filtered requests
These are automatically skipped (not optimized, not counted):
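The skip logic can be sketched as a pathname check against the extension and path lists below (abbreviated here; a sketch of the pattern, not the integration's exact matcher):

```typescript
// Abbreviated skip lists; the full lists follow below.
const SKIP_EXTENSIONS = ['.css', '.js', '.png', '.json'];
const SKIP_PREFIXES = ['/api/', '/static/', '/assets/'];

// True if the request is a static asset or non-page path
// that should bypass optimization entirely.
function shouldSkip(pathname: string): boolean {
  const p = pathname.toLowerCase();
  return (
    SKIP_EXTENSIONS.some((ext) => p.endsWith(ext)) ||
    SKIP_PREFIXES.some((pre) => p.startsWith(pre))
  );
}
```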
File extensions:
.css, .js, .jpg, .jpeg, .png, .gif, .svg, .webp,
.ico, .woff, .woff2, .ttf, .pdf, .zip, .json, .xml
Paths:
/api/*, /ws/*, /graphql, /_next/*, /static/*,
/assets/*, /fonts/*, /images/*, /favicon*
Caching strategy
Cache TTL
Optimized pages are cached for 7 days. After expiry, the next crawler visit triggers re-optimization.
Deduplication
Same customer + URL requests within 5 seconds are deduplicated (only counted once). This prevents bots that hit the same page repeatedly from inflating your usage.
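A minimal in-memory sketch of that 5-second window, keyed by customer + URL (the real service does this server-side; the names here are illustrative):

```typescript
const WINDOW_MS = 5_000;
const lastSeen = new Map<string, number>();

// Returns true if this request should be counted, i.e. it is not a
// duplicate of the same customer + URL within the 5-second window.
function shouldCount(customerId: string, url: string, now: number): boolean {
  const key = `${customerId}:${url}`;
  const prev = lastSeen.get(key);
  if (prev !== undefined && now - prev < WINDOW_MS) return false;
  lastSeen.set(key, now);
  return true;
}
```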
Cache invalidation
Currently, caches expire naturally after 7 days. Manual invalidation coming soon.
The five optimization features
1. Pre-rendering
What it does:
- Spins up headless Chrome
- Loads your page with JavaScript
- Waits for content to render
- Captures final DOM state
Result: JavaScript-rendered content becomes static HTML that crawlers can read.
2. Structured data injection
What it does:
- Analyzes page content
- Detects page type (Article, Product, FAQ, etc.)
- Generates JSON-LD schema markup
- Injects into `<head>`
Example output:
<script type="application/ld+json">
{
"@context": "https://schema.org",
"@type": "Product",
"name": "Nike Air Zoom Pegasus 40",
"brand": {"@type": "Brand", "name": "Nike"},
"offers": {
"@type": "Offer",
"price": "130.00",
"priceCurrency": "USD"
}
}
</script>
3. Key facts extraction
What it does:
- Scans content for data points
- Identifies prices, dates, percentages, statistics
- Marks them with semantic HTML
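A regex-based sketch of the scanning step; the patterns below are illustrative, not the engine's actual extraction rules:

```typescript
// Illustrative patterns for price and percentage data points.
const PRICE_RE = /\$\d+(?:\.\d{2})?/g;
const PERCENT_RE = /\d+(?:\.\d+)?%/g;

// Scan text content and collect candidate key facts.
function extractKeyFacts(text: string): { prices: string[]; percents: string[] } {
  return {
    prices: text.match(PRICE_RE) ?? [],
    percents: text.match(PERCENT_RE) ?? [],
  };
}
```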
Example:
<span itemtype="price">$130.00</span>
<span itemtype="date">Released March 2024</span>
4. FAQ generation
What it does:
- Analyzes content for Q&A patterns
- Generates relevant questions users might ask
- Creates FAQPage schema
Example:
<section itemscope itemtype="https://schema.org/FAQPage">
<div itemscope itemprop="mainEntity" itemtype="https://schema.org/Question">
<h3 itemprop="name">Is the Pegasus 40 good for marathon training?</h3>
<div itemscope itemprop="acceptedAnswer" itemtype="https://schema.org/Answer">
<p itemprop="text">Yes, the Pegasus 40 is suitable for marathon training...</p>
</div>
</div>
</section>
5. Entity recognition
What it does:
- Identifies named entities (brands, products, people, locations)
- Marks them semantically
- Creates entity relationships
Example:
<span itemtype="Brand">Nike</span> released the
<span itemtype="Product">Air Zoom Pegasus 40</span> in
<span itemtype="Date">March 2024</span>
Performance characteristics
Latency
| Scenario | Latency Added |
|---|---|
| Human visitor | <10ms |
| AI crawler (cache hit) | ~100ms |
| AI crawler (cache miss) | ~2000ms (first time only) |
Reliability
- Automatic failover - On any error, serves your original site unchanged
- Edge-native - Runs at the edge on platforms like Cloudflare, Vercel, and Netlify for minimal latency
- Graceful degradation - If the API is unreachable, your site continues to work normally
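The failover behavior can be sketched as a race against a timeout, falling back to the origin on any error (`optimize` and `origin` stand in for the real fetchers; the timeout value is an assumption):

```typescript
// Try the AI Pages API; on any failure or timeout, fall back to the origin.
async function withFailover(
  optimize: () => Promise<string>,
  origin: () => Promise<string>,
  timeoutMs = 1_000,
): Promise<string> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error('timeout')), timeoutMs);
  });
  try {
    return await Promise.race([optimize(), timeout]);
  } catch {
    return origin(); // graceful degradation: serve the site unchanged
  } finally {
    clearTimeout(timer);
  }
}
```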
Bandwidth
Optimized pages are typically 60-80% smaller than original pages due to removal of:
- JavaScript bundles
- Tracking scripts
- CSS (or minimal inline)
- Non-essential markup
Security
API key protection
- API keys are hashed with SHA256 before storage
- Keys are only transmitted over HTTPS
- You can regenerate keys anytime
Data handling
- AI Pages doesn't store your content long-term
- Cached pages expire after 7 days
- Analytics are tied to your account
- No cross-customer data sharing
SEO safety
AI Pages never serves different content to:
- Googlebot (for search rankings)
- Bingbot
- Any non-AI search crawler
This means your SEO is completely unaffected.
Limitations
What AI Pages can't optimize
- Login-required pages - Crawlers can't authenticate
- Real-time data - Cached content may be up to 7 days old
- Interactive features - JavaScript apps become static snapshots
- User-specific content - Personalization is lost
Known issues
- Very large pages (>1MB) may timeout during optimization
- Complex SPAs may not render completely
- Some anti-bot systems may block our rendering engine
Next steps
Troubleshooting
Fix common AI Pages issues.
API Reference
Integrate AI Pages programmatically.