Study 008

The Anatomy of an AI Citation

We crawled 1,465 AI-cited pages across 950 domains to understand what the most-cited pages on the web have in common.

1,465 AI-cited pages crawled
68% have schema markup
+45% citation lift from FAQ schema
3x more words than web average

Last updated: April 16, 2026

We crawled 1,465 pages across 950 domains that ChatGPT, Perplexity, and Gemini actively cite, drawing from 28,033+ citation appearances. For each page, we extracted its schema markup, content structure, and technical metadata - then measured how each feature over- or under-indexes relative to web averages from the HTTP Archive.

These are correlational findings. They describe what AI-cited pages look like, not necessarily why they were chosen. Where sample sizes are small or confounds are present, we say so.

[01]

The FAQ Schema Effect

+45% more citations

Pages with FAQPage schema average 45% more citation appearances than pages with no FAQ signal. Pages with FAQ content but no corresponding schema fall in between, suggesting the markup adds signal beyond the content pattern alone - though FAQ schema pages also tend to be substantially longer, which may partially explain the lift.

n=23 pages with FAQ schema. Early signal - sample size warrants caution.

FAQPage shows the strongest positive association with citation volume in the dataset. Most other schema types appear more often on cited pages but don't predict higher citation counts within them.

Avg Citations by FAQ Signal
FAQ Schema + FAQ Content: 36.9 avg citations (n=23)
FAQ Content Only: 27.2 avg citations (n=161)
No FAQ Signal: 25.4 avg citations (n=269)
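The +45% headline falls straight out of the group means above; a minimal sketch reproducing it (figures copied from the chart, nothing else assumed):

```python
# Average citation appearances per group, as reported in the chart above.
avg_citations = {
    "faq_schema_and_content": 36.9,  # n=23
    "faq_content_only": 27.2,        # n=161
    "no_faq_signal": 25.4,           # n=269
}

def lift_vs_baseline(group: str, baseline: str = "no_faq_signal") -> float:
    """Percent lift of a group's average over the no-FAQ baseline."""
    return (avg_citations[group] / avg_citations[baseline] - 1) * 100

print(f"{lift_vs_baseline('faq_schema_and_content'):.0f}%")  # the headline figure
print(f"{lift_vs_baseline('faq_content_only'):.0f}%")        # content without markup
```

The in-between position of content-only pages (roughly +7%) is what motivates the claim that the markup carries signal beyond the content pattern.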
[02]

Schema Types on Cited Pages

Schema Adoption
68% of AI-cited pages have schema markup

vs ~38.5% web average (Web Almanac 2024)

AI-cited pages are nearly twice as likely to have structured data as the web average. The standout types are Person (author attribution), ImageObject, and NewsArticle - each appearing 8-9x more frequently on cited pages than across the web at large.

Web averages from HTTP Archive / Web Almanac 2024. Over-representation indicates the kind of pages AI models cite, not a direct causal effect. Of all types measured, only FAQPage independently correlates with higher citation frequency (Section 01).

Schema Type Lift vs Web Average
Person: 9.4x (18.9% cited vs 2.0% web)
ImageObject: 8.9x (21.4% cited vs 2.4% web)
NewsArticle: 8.7x (10.4% cited vs 1.2% web)
SoftwareApplication: 8.0x (2.4% cited vs 0.3% web)
Service: 6.5x (1.3% cited vs 0.2% web)
BreadcrumbList: 5.2x (37.7% cited vs 7.3% web)
WebPage: 5.1x (29.3% cited vs 5.8% web)
BlogPosting: 4.8x (8.1% cited vs 1.7% web)
ItemList: 4.4x (4.4% cited vs 1.0% web)
WebSite: 4.3x (33.0% cited vs 7.7% web)
Organization: 4.1x (31.5% cited vs 7.6% web)
Article: 3.8x (24.4% cited vs 6.5% web)
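Each lift figure is simply the ratio of adoption among cited pages to adoption web-wide; a small sketch using a few rows from the chart above (percentages copied from the chart):

```python
# Adoption rates (% of pages carrying each schema type): cited sample vs web average.
adoption = {
    "Person":      (18.9, 2.0),
    "ImageObject": (21.4, 2.4),
    "NewsArticle": (10.4, 1.2),
    "Article":     (24.4, 6.5),
}

def lift(cited_pct: float, web_pct: float) -> float:
    """How many times more common a type is on cited pages than on the web."""
    return round(cited_pct / web_pct, 1)

# Rank types by over-representation, mirroring the chart's ordering.
for schema_type, (cited, web) in sorted(
    adoption.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True
):
    print(f"{schema_type}: {lift(cited, web)}x ({cited}% cited vs {web}% web)")
```

Note that lift measures over-representation in the sample, not citation volume; as Section 01 notes, only FAQPage predicts higher citation counts within cited pages.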

[03]

Light Schema Outperforms Heavy

30.5 avg citations for light schema pages

Pages with light schema implementation are cited more frequently than pages with heavy, complex markup. Beyond a modest threshold, additional structured data shows diminishing - and eventually negative - returns.

Focus beats thoroughness. The lightest schema tier consistently earns the most citations in this dataset. Heavier markup shows no benefit - though whether this reflects a preference by AI models or simply the characteristics of high-performing pages is an open question.

Avg Citations by Schema Tier
No Schema: 24.1 avg citations (n=146, 1,843.4 avg words)
Light: 30.5 avg citations (n=135, 2,551.6 avg words)
Medium: 26.6 avg citations (n=89, 2,310 avg words)
Rich: 24.8 avg citations (n=72, 2,646.6 avg words)
Very Rich: 23.7 avg citations (n=12, 2,478.6 avg words)
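The tiers above bucket pages by how many JSON-LD fields they carry. A minimal sketch of that bucketing: the study pegs "light" at 1-20 fields (Section 06), but the higher cutoffs below are illustrative assumptions, not the study's definitions.

```python
import json

def count_fields(node) -> int:
    """Recursively count every key across a parsed JSON-LD structure."""
    if isinstance(node, dict):
        return len(node) + sum(count_fields(v) for v in node.values())
    if isinstance(node, list):
        return sum(count_fields(v) for v in node)
    return 0

def schema_tier(jsonld_blocks: list[str]) -> str:
    """Classify a page's schema footprint by total field count.
    Only the 1-20 'light' band comes from the study; the higher
    cutoffs are assumed for illustration."""
    total = sum(count_fields(json.loads(block)) for block in jsonld_blocks)
    if total == 0:
        return "No Schema"
    if total <= 20:
        return "Light"
    if total <= 60:   # assumed cutoff
        return "Medium"
    if total <= 120:  # assumed cutoff
        return "Rich"
    return "Very Rich"

faq_block = json.dumps({
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{"@type": "Question", "name": "What is lift?"}],
})
print(schema_tier([faq_block]))  # a small block lands in the Light tier
```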
[04]

The Citation Blueprint

Ten on-page features compared between the top 10% most-cited pages and the bottom 50%. The differences are narrower than you might expect - these are structural properties, not external signals like backlinks or domain authority.

Feature: top 10% vs bottom 50% (difference)
Has Any Schema: 80.0% vs 65.6% (+14.4 pts)
Article Schema: 37.8% vs 23.3% (+14.5 pts)
FAQ Schema: 11.1% vs 5.3% (+5.8 pts)
Person Schema: 17.8% vs 19.4% (-1.6 pts)
Word Count: 2,521.1 vs 2,304.7 (+216.4)
Total Headings: 33.7 vs 31.4 (+2.3)
List Items: 146.9 vs 120.3 (+26.6)
Has Tables: 40.0% vs 28.2% (+11.8 pts)
Has FAQ Content: 42.2% vs 38.3% (+3.9 pts)
Has How-To Content: 73.3% vs 70.0% (+3.3 pts)
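The tier comparison reduces to ranking pages by citation count, slicing off the top decile and bottom half, and averaging each feature within the slices. A sketch on made-up records (the per-page data below is hypothetical, not the study's):

```python
from statistics import mean

# Hypothetical per-page records: citation count plus one structural feature.
pages = [
    {"citations": c, "word_count": w}
    for c, w in [(218, 2937), (111, 5785), (82, 3834), (75, 553),
                 (40, 2100), (25, 1900), (12, 1400), (8, 900),
                 (5, 2300), (3, 700)]
]

# Rank by citations, then slice into the comparison tiers.
by_citations = sorted(pages, key=lambda p: p["citations"], reverse=True)
top_10pct = by_citations[: max(1, len(by_citations) // 10)]
bottom_50pct = by_citations[len(by_citations) // 2 :]

top_avg = mean(p["word_count"] for p in top_10pct)
bot_avg = mean(p["word_count"] for p in bottom_50pct)
print(f"top 10%: {top_avg:.1f} words, bottom 50%: {bot_avg:.1f} words")
```

The same averaging is repeated per feature (schema flags, headings, lists, tables) to produce the comparison above.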

Schema adoption shows the clearest separation between tiers. Content structure features - headings, FAQ patterns, how-to content - are remarkably similar, suggesting the baseline quality among AI-cited pages is already high.

[05]

Most-Cited Pages

The most frequently cited pages in the dataset. Note both the pattern and the exceptions: most have focused schema, but several of the highest-ranked pages have none at all.

Domain                Citations   Words   Schema Types
softwarefinder.com    218         2,937   Corporation
rankmyagent.com       174         1,461   FAQPage, RealEstateAgent, ItemList
collegenet.com        123         808     WebPage, BreadcrumbList, VideoObject
dotcom-monitor.com    111         5,785   BreadcrumbList, Person, WebSite, +4 more
runnersworld.com      82          3,834   NewsArticle, ItemList
g-co.agency           80          2,558   None
iiba.org              80          2,806   None
milanote.com          79          1,111   HowTo
offers.hubspot.com    75          553     None
dash.dropbox.com      75          1,474   MobileApplication, SoftwareApplication, Organization, +2 more
nokia.com             72          1,771   BreadcrumbList
ehrinpractice.com     72          1,832   None
skyquestt.com         71          2,993   WebPage, ItemList
readycontacts.com     70          1,857   Person, Article
[06]

Key Takeaways

01

Schema is infrastructure, not advantage

68% of AI-cited pages have structured data - nearly double the web average. But most schema types don't predict citation volume; they describe the kind of pages AI happens to cite. Having schema is common among cited pages, not a differentiator within them.

02

FAQ schema is the exception - with caveats

Pages with FAQPage schema average 45% more citations than pages with no FAQ signal. But the sample is small (n=23), and these pages also tend to be substantially longer. A real association, not yet a proven cause.

03

Focused markup beats comprehensive markup

Pages with light schema (1-20 fields) earn the most citations in the dataset. Heavier implementations show diminishing returns. Schema complexity doesn't help; content quality might.

04

Content depth is the likely foundation

AI-cited pages average 2,289.6 words - 3x the typical web page. When comparing the top 10% to the bottom 50% of cited pages, structural differences are modest. Substance appears to matter more than any single on-page signal.

[07]

Methodology

Data Sources

Citation Data

Top-cited URLs from Trakkr's citation tracking system, drawn from 28,000+ citation appearances across 950 domains. Each URL was crawled live to extract JSON-LD structured data, content characteristics (word count, headings, lists, tables, FAQ patterns), and technical metadata.
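A stdlib-only sketch of the extraction step described above: pull JSON-LD blocks out of fetched HTML and list the schema types they declare. The crawler itself, and any pipeline around this, is assumed; this shows only the parsing idea.

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> tags."""
    def __init__(self):
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data

def schema_types(html: str) -> set[str]:
    """Return every @type declared in a page's JSON-LD, top level or nested."""
    parser = JSONLDExtractor()
    parser.feed(html)
    types: set[str] = set()

    def walk(node):
        if isinstance(node, dict):
            t = node.get("@type")
            if isinstance(t, str):
                types.add(t)
            for v in node.values():
                walk(v)
        elif isinstance(node, list):
            for v in node:
                walk(v)

    for block in parser.blocks:
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            pass  # malformed markup counts as absent, not fatal
    return types

page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage",
 "mainEntity": [{"@type": "Question", "name": "Does schema help?"}]}
</script></head><body>...</body></html>"""
print(schema_types(page))  # the declared types: FAQPage and Question
```

This mirrors the "presence, not quality" caveat below: the extractor records that a type is declared, not that it validates.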

Benchmarks

Web average benchmarks come from the HTTP Archive / Web Almanac 2024. The sample skews toward B2B, SaaS, and DTC brands that use Trakkr - findings are most directly applicable to those verticals.

Limitations

Small FAQ Sample (n=23). The FAQ schema finding is an early, actionable signal rather than a definitive causal claim. Larger samples will sharpen the estimate.
No Non-Cited Control Group. We compare cited pages to general web averages, not to comparable pages that were not cited. Differences may reflect page quality rather than features AI specifically selects for.
Presence, Not Quality. Schema adoption percentages measure whether structured data exists on a page, not whether it is correctly implemented or validated.

Are your pages built to get cited?

See where your pages stand on the metrics that matter: schema markup, FAQ structure, content depth, and citation frequency across ChatGPT, Perplexity, and Gemini.
