Schema Doesn’t Boost AI Citations (New Ahrefs Study)
A May 2025 Ahrefs study tracked 1,885 pages before and after adding JSON-LD schema markup — and found no statistically significant citation lift on Google AI Overviews, Google AI Mode, or ChatGPT. After completing this walkthrough, you’ll understand exactly how the study was designed, why the correlation data that kicked it off was misleading, and what the findings actually mean for your schema implementation decisions.

- Ahrefs began by pulling 6 million URLs from its AI citation dataset and found that pages cited by AI were nearly three times more likely to carry JSON-LD schema than non-cited pages. That gap is the kind of number that fuels conference slides and LinkedIn carousels — and it initially looked like strong evidence that schema drives AI visibility.

- The research team recognized immediately that the correlation could be explained by a confounding variable: schema markup tends to live on technically sophisticated, well-maintained sites, the same sites that produce stronger content, earn more links, and build more topical authority. Schema might be riding the wave of every other positive signal rather than creating one of its own.
- To isolate schema’s actual effect, Ahrefs ran a second study. Data scientist Shea pulled millions of URLs, retrieved HTML crawl history, and flagged every instance where a page’s JSON-LD presence transitioned from false to true. That process identified 1,885 pages that added <script type="application/ld+json"> markup between August 2025 and March 2026.
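
The false-to-true flagging step can be sketched in a few lines of Python. This is a minimal illustration, not Ahrefs’ pipeline: the function names `has_jsonld` and `first_schema_addition` are hypothetical, the crawl history is assumed to be a list of (crawl date, HTML) pairs, and a regular expression stands in for whatever parser the study actually used.

```python
import re
from datetime import date

# Matches an opening <script> tag whose type attribute is application/ld+json.
# A regex is an assumption made for brevity; a real crawler would likely use an HTML parser.
JSONLD_TAG = re.compile(
    r"<script\b[^>]*\btype\s*=\s*[\"']?application/ld\+json[\"']?[^>]*>",
    re.IGNORECASE,
)


def has_jsonld(html: str) -> bool:
    """Return True if the HTML contains at least one JSON-LD script tag."""
    return JSONLD_TAG.search(html) is not None


def first_schema_addition(snapshots):
    """Return the date of the first crawl where JSON-LD is present after being
    absent in the previous crawl, or None if no such transition exists.

    snapshots: iterable of (crawl_date, html) pairs (hypothetical structure).
    """
    previous = None
    for crawl_date, html in sorted(snapshots, key=lambda s: s[0]):
        current = has_jsonld(html)
        if previous is False and current is True:
            return crawl_date
        previous = current
    return None
```

A page that carried JSON-LD in its very first crawl returns None here, because there is no observed "before" state to compare against.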

- Each of the 1,885 treated pages was matched against control pages from different domains that shared similar pre-period citation levels and had never added JSON-LD. The matching step is what separates this from a simple before/after comparison: it neutralizes platform-level trends (AI Overviews contracting, AI Mode expanding) that would otherwise contaminate the results.
- Citations were measured across Google AI Overviews, Google AI Mode, and ChatGPT in the 30 days before and 30 days after each page’s schema addition date. Using a 30-day window on each side gave enough data to smooth noise while keeping the measurement period tight enough to attribute changes to the schema event.
- Ahrefs applied four separate statistical tests, including a matched difference-in-differences (DiD) analysis, to validate that any conclusion would hold under scrutiny. The DiD method compares the change in the treated group against the change in the control group, isolating the marginal effect of adding schema.
- All four tests returned the same answer. Google AI Overviews: -4.6% (small but statistically significant relative to controls, likely reflecting a pre-existing downward trend in that content category). Google AI Mode: +2.4%. ChatGPT: +2.2%. The two positive figures were statistically indistinguishable from zero: random noise across thousands of URLs.
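
To make the 30-day windows and the difference-in-differences arithmetic concrete, here is a rough Python sketch. The helper names `window_counts` and `did_estimate` are made up for illustration, counting the event day as part of the post-period is an assumption, and the estimate is expressed in raw citation counts because the study’s exact normalization into percentages isn’t described here.

```python
from datetime import date, timedelta


def window_counts(citation_dates, event_date, days=30):
    """Count citations in the `days` before and the `days` after event_date.

    Assumption: the event day itself belongs to the post-period.
    """
    pre_start = event_date - timedelta(days=days)
    post_end = event_date + timedelta(days=days)
    pre = sum(1 for d in citation_dates if pre_start <= d < event_date)
    post = sum(1 for d in citation_dates if event_date <= d < post_end)
    return pre, post


def did_estimate(treated, controls):
    """Difference-in-differences on mean citation change.

    treated, controls: lists of (pre, post) citation counts.
    Returns (mean treated change) - (mean control change).
    """
    def mean_change(pairs):
        return sum(post - pre for pre, post in pairs) / len(pairs)

    return mean_change(treated) - mean_change(controls)


# Treated pages rose by 10 on average, controls by 5: schema gets credit for 5.
print(did_estimate([(100, 110)], [(100, 105)]))  # 5.0
```

The subtraction is the whole point: if AI Overviews shrinks for everyone during the window, both groups drop together and the shared decline cancels out.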

- The study’s practical recommendation: add schema only when your target SERPs are already displaying schema-driven rich result features for keywords you care about. The method is straightforward — screenshot the SERP, run it through an LLM, and ask what’s generating those features. If the answer is schema, implement it. If not, your time is better spent elsewhere.

One important caveat from the study: all 1,885 pages were already receiving substantial AI citations (100+ daily AIO citations). Schema’s role for pages with zero AI visibility remains untested by this data.

How does this compare to the official docs?
The Ahrefs findings rest on real data, but Google’s own schema documentation makes specific claims about which markup types qualify pages for rich result features — and that guidance carries direct implications for when implementation is worth the effort.
Here’s What the Official Docs Show
Act 1 (the walkthrough above) covers the Ahrefs study methodology and its null finding accurately. The official documentation adds context that sharpens a few edges without changing the conclusion.
Step 1 — The correlation finding in Ahrefs’ 6-million-URL dataset
No official documentation was found for this step — proceed using the video’s approach and verify independently.

Step 2 — Identifying the confounding variable
No official documentation was found for this step — proceed using the video’s approach and verify independently.
Step 3 — Isolating the 1,885 treated pages
No official documentation was found for this step — proceed using the video’s approach and verify independently.

One doc-layer note the tutorial doesn’t surface: JSON-LD’s official specification purpose — per json-ld.org — is semantic web data interoperability across programming environments and REST services, not search or AI optimization. That framing is entirely consistent with the study’s null result. Treating schema as an AI citation lever was always a hypothesis, not a design feature of the format.

Step 4 — Matching treated pages against controls
No official documentation was found for this step — proceed using the video’s approach and verify independently.
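
If you want to verify the matching step independently, one plausible way to approximate it is nearest-neighbor matching on pre-period citations. The sketch below is an assumption about how such matching could work, not a description of Ahrefs’ algorithm: the dictionary keys (`domain`, `pre_citations`, `ever_had_jsonld`) and the function name `match_controls` are hypothetical.

```python
def match_controls(treated, candidates, k=1):
    """Pick the k control pages closest to the treated page in pre-period citations.

    Eligibility rules taken from the study description:
      - the control must come from a different domain than the treated page
      - the control must never have carried JSON-LD
    Nearest-neighbor distance on citation counts is an assumption.
    """
    eligible = [
        c for c in candidates
        if c["domain"] != treated["domain"] and not c["ever_had_jsonld"]
    ]
    eligible.sort(key=lambda c: abs(c["pre_citations"] - treated["pre_citations"]))
    return eligible[:k]
```

Pages from the treated page’s own domain are excluded so that a site-wide change, such as a redesign or a migration, cannot move the treated page and its control at the same time.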
Step 5 — Measuring citations across Google AI Overviews, Google AI Mode, and ChatGPT
The video’s approach here matches the current docs exactly.

Two additions worth noting. First, Ahrefs Brand Radar tracks Perplexity as a fourth AI citation platform alongside AI Overviews and ChatGPT — the study covers only three platforms, leaving Perplexity unaddressed. If you’re running your own citation analysis, Perplexity is a measurable surface Ahrefs already has data for.

Second, ChatGPT’s “Deep research” mode — a web-connected, citation-generating feature — is visible on chatgpt.com, but the study does not specify which ChatGPT mode Ahrefs queried. Standard chat and Deep research have meaningfully different citation behaviors, and that distinction is unresolved in the methodology.

Step 6 — Four statistical tests including difference-in-differences
No official documentation was found for this step — proceed using the video’s approach and verify independently.
Step 7 — The results: -4.6%, +2.4%, +2.2%
The video’s approach here matches the current docs exactly.

The specific percentage figures and sample size are not independently verifiable from available screenshots — they exist in the Ahrefs blog post, not a product specification. Treat them as study outputs rather than platform-documented claims.
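
For readers wondering what "statistically indistinguishable from zero" means in practice, a percentile bootstrap is one common check. To be clear, the text names only difference-in-differences among the four tests, so this is a swapped-in illustration rather than one of Ahrefs’ tests; `bootstrap_ci` and `indistinguishable_from_zero` are hypothetical names, and `diffs` is assumed to hold one treated-minus-control change per matched pair.

```python
import random


def bootstrap_ci(diffs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `diffs`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def indistinguishable_from_zero(diffs, **kwargs):
    """True when the bootstrap interval for the mean effect contains zero."""
    lo, hi = bootstrap_ci(diffs, **kwargs)
    return lo <= 0.0 <= hi
```

An average of +2.4% or +2.2% can still land inside an interval that straddles zero when per-page changes swing widely in both directions, which is how a positive point estimate ends up classified as noise.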
Step 8 — Practical recommendation: implement schema only when rich results are already showing
No official documentation was found for this step — proceed using the video’s approach and verify independently.

One forward-looking note from the docs: the JSON-LD specification is expanding into YAML-LD and CBOR-LD variants not addressed in the study. Their detection profiles may differ from standard JSON-LD, which is relevant if you plan to replicate this analysis in the future.
Useful Links
- Ahrefs — AI Marketing Platform Powered by Big Data — Ahrefs’ homepage, confirming its active AI citation tracking infrastructure via Brand Radar and its “Brand & AI Search” product suite.
- JSON-LD — JSON for Linked Data — The official JSON-LD specification site, documenting the format’s semantic web purpose, conforming implementations across 10+ languages, W3C governance, and emerging format extensions.
- Google — Google’s homepage, confirming Google AI Mode’s general availability as a distinct, separately accessed search surface independent of Google AI Overviews.
- ChatGPT — ChatGPT’s public interface, confirming the platform’s availability as a citation surface and the presence of “Deep research” as a web-connected, citation-generating mode.