The Anatomy of an AI Citation
We crawled 1,465 AI-cited pages across 950 domains to understand what the most-cited pages on the web have in common.
The dataset spans 1,465 pages across 950 domains that ChatGPT, Perplexity, and Gemini actively cite, drawn from 28,033+ citation appearances. For each page, we extracted its schema markup, content structure, and technical metadata - then measured how each feature over- or under-indexes relative to web averages from the HTTP Archive.
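The over-/under-indexing comparison boils down to a prevalence ratio. A minimal sketch - the 68% and ~38.5% figures are from this report, while the function name is just for illustration:

```python
# Over-/under-indexing: how much more (or less) common a feature is
# on AI-cited pages than across the web at large.

def over_index(cited_share: float, web_share: float) -> float:
    """Ratio of a feature's prevalence on cited pages vs. the web baseline."""
    if web_share == 0:
        raise ValueError("web baseline must be non-zero")
    return cited_share / web_share

# Structured data: ~68% of cited pages vs ~38.5% of the web (this report).
print(round(over_index(0.68, 0.385), 2))  # 1.77
```

A ratio near 1.0 means a feature is no more common on cited pages than anywhere else; values well above 1.0 mark the over-represented types discussed below.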
These are correlational findings. They describe what AI-cited pages look like, not necessarily why they were chosen. Where sample sizes are small or confounds are present, we say so.
The FAQ Schema Effect
Pages with FAQPage schema average 45% more citation appearances than pages with no FAQ signal. Pages with FAQ content but no corresponding schema fall in between, suggesting the markup adds signal beyond the content pattern alone - though FAQ schema pages also tend to be substantially longer, which may partially explain the lift.
n=23 pages with FAQ schema. Early signal - sample size warrants caution.
FAQPage shows the strongest positive association with citation volume in the dataset. Most other schema types appear more often on cited pages but don't predict higher citation counts within them.
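Checking whether a page carries FAQPage markup amounts to walking its JSON-LD for `@type` values. A minimal sketch, with illustrative sample markup (not taken from any page in the dataset):

```python
import json

# Sample FAQPage JSON-LD, as it might appear in a
# <script type="application/ld+json"> tag. Illustrative only.
sample_jsonld = json.loads("""
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is FAQPage schema?",
    "acceptedAnswer": {"@type": "Answer", "text": "Structured Q&A markup."}
  }]
}
""")

def schema_types(block) -> set:
    """Collect every @type that appears anywhere in a JSON-LD block."""
    types = set()
    def walk(node):
        if isinstance(node, dict):
            t = node.get("@type")
            if isinstance(t, str):
                types.add(t)
            elif isinstance(t, list):
                types.update(x for x in t if isinstance(x, str))
            for v in node.values():
                walk(v)
        elif isinstance(node, list):
            for item in node:
                walk(item)
    walk(block)
    return types

print("FAQPage" in schema_types(sample_jsonld))  # True
```

The recursive walk matters because `@type` can appear at any nesting level (and as a list), so a top-level check alone would miss valid markup.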
Schema Types on Cited Pages
vs ~38.5% web average (Web Almanac 2024)
AI-cited pages are nearly twice as likely to have structured data as the web average. The standout types are Person (author attribution), ImageObject, and NewsArticle - each appearing 8-9x more frequently on cited pages than across the web at large.
Web averages from HTTP Archive / Web Almanac 2024. Over-representation indicates the kind of pages AI models cite, not a direct causal effect. Of all types measured, only FAQPage independently correlates with higher citation frequency (Section 01).
[Chart: per-type prevalence on AI-cited pages vs the web average. Type labels were lost in extraction; the figures range from roughly 9x over-representation (18.9% cited vs 2.0% web) down to under 4x (24.4% cited vs 6.5% web).]
Light Schema Outperforms Heavy
Pages with light schema implementation are cited more frequently than pages with heavy, complex markup. Beyond a modest threshold, additional structured data shows diminishing - and eventually negative - returns.
Focus beats thoroughness. The lightest schema tier consistently earns the most citations in this dataset. Heavier markup shows no benefit - though whether this reflects a preference by AI models or simply the characteristics of high-performing pages is an open question.
The Citation Blueprint
Ten on-page features compared between the top 10% most-cited pages and the bottom 50%. The differences are narrower than you might expect - these are structural properties, not external signals like backlinks or domain authority.
Schema adoption shows the clearest separation between tiers. Content structure features - headings, FAQ patterns, how-to content - are remarkably similar, suggesting the baseline quality among AI-cited pages is already high.
Most-Cited Pages
The 15 most frequently cited pages in the dataset. Note both the pattern and the exceptions: most have focused schema, but several of the highest-ranked pages have none at all.
Key Takeaways
Schema is infrastructure, not advantage
68% of AI-cited pages have structured data - nearly double the web average. But most schema types don't predict citation volume; they describe the kind of pages AI happens to cite. Having schema is common among cited pages, not a differentiator within them.
FAQ schema is the exception - with caveats
Pages with FAQPage schema average 45% more citations than pages with no FAQ signal. But the sample is small (n=23), and these pages also tend to be substantially longer. A real association, not yet a proven cause.
Focused markup beats comprehensive markup
Pages with light schema (1-20 fields) earn the most citations in the dataset. Heavier implementations show diminishing returns. Schema complexity doesn't help; content quality might.
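One plausible way to operationalize the light/heavy distinction is a recursive field count over a page's JSON-LD. The 20-field cutoff mirrors the report's light tier; the function and tier names are assumptions for illustration:

```python
# Classify a page's schema footprint by counting every key in its
# JSON-LD blocks, recursively. Thresholds follow the report's
# "light" tier (1-20 fields); the labels themselves are assumed.

def field_count(node) -> int:
    if isinstance(node, dict):
        return len(node) + sum(field_count(v) for v in node.values())
    if isinstance(node, list):
        return sum(field_count(item) for item in node)
    return 0  # scalar values contribute no fields

def schema_tier(blocks: list) -> str:
    total = sum(field_count(b) for b in blocks)
    if total == 0:
        return "none"
    return "light" if total <= 20 else "heavy"

# A focused Article block with author attribution: 5 fields total.
light = [{"@type": "Article", "headline": "Example",
          "author": {"@type": "Person", "name": "A. Writer"}}]
print(schema_tier(light))  # light
```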
Content depth is the likely foundation
AI-cited pages average 2,289.6 words - 3x the typical web page. When comparing the top 10% to the bottom 50% of cited pages, structural differences are modest. Substance appears to matter more than any single on-page signal.
Methodology
Data Sources
Top-cited URLs from Trakkr's citation tracking system, drawn from 28,000+ citation appearances across 950 domains. Each URL was crawled live to extract JSON-LD structured data, content characteristics (word count, headings, lists, tables, FAQ patterns), and technical metadata.
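The per-page extraction step can be sketched with the standard library alone. This is illustrative, not Trakkr's actual pipeline - a production crawler would use a proper DOM library:

```python
from html.parser import HTMLParser
import json, re

class PageExtractor(HTMLParser):
    """Pull JSON-LD blocks and a rough visible word count from crawled HTML."""
    def __init__(self):
        super().__init__()
        self.in_jsonld = False
        self.jsonld_raw = []
        self.text_parts = []
        self.skip_depth = 0  # inside <script>/<style>, not visible text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1
            if tag == "script" and dict(attrs).get("type") == "application/ld+json":
                self.in_jsonld = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self.jsonld_raw.append(data)
        elif self.skip_depth == 0:
            self.text_parts.append(data)

# Toy document standing in for a crawled page.
html_doc = """<html><head>
<script type="application/ld+json">{"@type": "Article"}</script>
</head><body><h1>Title</h1><p>Two words here.</p></body></html>"""

p = PageExtractor()
p.feed(html_doc)
blocks = [json.loads(chunk) for chunk in p.jsonld_raw if chunk.strip()]
words = re.findall(r"\w+", " ".join(p.text_parts))
print(len(blocks), len(words))  # schema block count, visible word count
```

The same pass that collects JSON-LD also tallies visible text, which is how per-page word counts and structured-data features can come from a single crawl.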
Web average benchmarks come from the HTTP Archive / Web Almanac 2024. The sample skews toward B2B, SaaS, and DTC brands that use Trakkr - findings are most directly applicable to those verticals.
Limitations
Are your pages built to get cited?
See where your pages stand on the metrics that matter: schema markup, FAQ structure, content depth, and citation frequency across ChatGPT, Perplexity, and Gemini.
