
Tutorial: The cats.txt SEO Hoax That Exposed LLMs

Mark Williams-Cook fabricated a web standard called cats.txt, published it on a convincing technical site, and watched every major AI crawler treat it as legitimate within days. This tutorial breaks down the experiment's mechanics, what official documentation confirms about LLM crawling behavior and JSON-LD interpretation, and what the feedback loop means for your GEO strategy.

by marketingagent.io 3 weeks ago 2 weeks ago


The cats.txt SEO Hoax That Fooled AI (And What It Reveals About LLMs)

Mark Williams-Cook fabricated a web standard from scratch, published it on a convincing technical site, and watched every major AI crawler on the internet treat it as legitimate — within days. This breakdown of his cats.txt experiment gives you a clear, reproducible understanding of why LLMs confidently repeat invented information, and what that means for how you approach GEO and AI visibility today.

Williams-Cook registered cats.txt.org and wrote a technical specification document in the dry, authoritative style of a real web standard — the kind you’d associate with RFC documents or the robots.txt specification. The site positioned cats.txt as a “comprehensive guide to your website’s content, specifically generated for chatbots like Gemini, ChatGPT, Claude, Perplexity, and others.” No hedging, no humor — just confident technical prose that pattern-matched against thousands of real spec pages in training data.

The fake cats.txt spec defined itself as a ‘comprehensive guide’ for every major LLM, written to sound indistinguishable from a real web standard.

The fabricated specification defined exact requirements for implementing cats.txt: the file must live at /.well-known/cats.txt, must not be blocked by robots.txt for any crawler, must be linked from the homepage for discovery, and must be a plain text or markdown file. The final requirement — that the file reference at least one cat image — was buried in otherwise deadpan technical language.

Warning: this step may differ from current official documentation — see the verified version below.

The fake spec even required at least one cat image — a tell buried in otherwise convincing robots.txt-style technical language.
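The requirements listed above can be read as a checklist. The following is a minimal sketch of that checklist, not anything published on the hoax site: the function name, the sample inputs, and the "cat image" heuristic (a markdown image whose alt text or URL contains "cat") are all assumptions made for illustration. It covers the path, homepage-link, format, and cat-image rules only, and works on strings you supply rather than fetching anything.

```python
import re
from urllib.parse import urlparse

REQUIRED_PATH = "/.well-known/cats.txt"  # path named in the fabricated spec

# Markdown image syntax: ![alt text](url)
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)")


def check_cats_txt(file_url: str, body: str, homepage_html: str) -> dict:
    """Return one pass/fail flag per requirement described in the article.

    Hypothetical helper: the heuristics below are assumptions, since the
    fake spec's exact wording is not reproduced in the article.
    """
    images = MD_IMAGE.findall(body)
    return {
        # The file must live at /.well-known/cats.txt
        "at_required_path": urlparse(file_url).path == REQUIRED_PATH,
        # The file must be linked from the homepage for discovery
        "linked_from_homepage": REQUIRED_PATH in homepage_html,
        # Plain text or markdown: crude check that the body is not an HTML page
        "plain_text_or_markdown": "<html" not in body.lower(),
        # At least one cat image: assumed to mean "cat" in alt text or URL
        "references_cat_image": any(
            "cat" in (alt + url).lower() for alt, url in images
        ),
    }


# Invented sample inputs for illustration only
sample_body = "# Example Site\n\n![A cat on a keyboard](https://example.com/img/cat.jpg)\n"
sample_homepage = '<a href="/.well-known/cats.txt">cats.txt</a>'
report = check_cats_txt(
    "https://example.com/.well-known/cats.txt", sample_body, sample_homepage
)
```

The point of the sketch is how little the "standard" actually demands: every rule can be satisfied by a text file and a link, which is part of why it was cheap to fabricate and cheap to adopt.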

Williams-Cook published a LinkedIn post framing cats.txt as “the missing standard for SEO and GEO,” complete with a call to action for site owners to implement it. The post went out, and the crawlers followed almost immediately.

The LinkedIn post that launched the hoax: Williams-Cook framed cats.txt as ‘the missing standard for SEO and GEO’, and the internet took it seriously.

Server logs from cats.txt.org showed requests from every significant crawler in the ecosystem: Googlebot, GPTBot, ClaudeBot, PerplexityBot, Bingbot, Applebot, AhrefsBot. The logs weren’t a trickle — they confirmed that fabricating a plausible-looking standard, placing it at a crawlable URL, and writing about it authoritatively was sufficient to trigger broad automated ingestion.

The server logs don’t lie: every major LLM crawler and search bot crawled cats.txt within days, treating a fabricated standard as legitimate.
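If you want to run the same check on your own site, a rough tally of crawler hits can be pulled from an access log. The sketch below assumes a combined-format log where the user-agent string appears somewhere in each line; the sample lines, IP addresses, and user-agent strings are invented for illustration and are not taken from the cats.txt.org logs. Only the crawler names come from the article.

```python
from collections import Counter

# Crawler names listed in the article; matched case-insensitively as substrings
CRAWLERS = [
    "Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot",
    "Bingbot", "Applebot", "AhrefsBot",
]


def count_crawler_hits(log_lines, path="/.well-known/cats.txt"):
    """Count requests for `path` per known crawler name."""
    hits = Counter()
    for line in log_lines:
        if path not in line:
            continue  # request was for some other URL
        lowered = line.lower()
        for name in CRAWLERS:
            if name.lower() in lowered:
                hits[name] += 1
                break  # count each request once
    return hits


# Invented sample lines (hypothetical IPs, timestamps, and user agents)
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /.well-known/cats.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [01/Jan/2025:00:00:07 +0000] "GET /.well-known/cats.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.2 - - [01/Jan/2025:00:01:12 +0000] "GET /.well-known/cats.txt HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '203.0.113.5 - - [01/Jan/2025:00:02:30 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
]

hits = count_crawler_hits(SAMPLE_LOG)
```

User-agent strings are self-reported and easy to spoof, so a tally like this shows who claimed to visit, not who verifiably did. It also shows only that a URL was fetched, which is a weaker statement than a model later repeating its contents.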

Before the experiment gained wide attention, Williams-Cook queried ChatGPT directly: Can cats.txt help me rank in search or LLMs? ChatGPT confirmed it could. At the Athens SEO Conference, he showed the exchange on stage — the model citing the fabricated standard as a legitimate ranking signal based on what it had retrieved from the web.

On stage at Athens SEO, Williams-Cook shows ChatGPT’s own words: it called cats.txt ‘a thing that helps’ for search and LLM discovery.

A parallel experiment reinforced the finding. Williams-Cook invented custom JSON-LD schema types for a fictitious company called Duck Yeah T-Shirts. When Perplexity was asked questions about the page, it returned the fabricated structured data, including a made-up address, as confident fact, with no indication the source was invented.

Perplexity confidently returns a fabricated address, ‘77 The Muddy Bank, South Pondshire, DK99 YEA, United Queendom’, treating invented structured data as authoritative fact.
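To see why invented structured data is so easy to ingest, it helps to look at how mechanically it is extracted. The sketch below pulls JSON-LD out of a page using only Python's standard library. The page markup, the `@context` URL, and the `DuckApparelStore` type name are hypothetical stand-ins, since the article does not reproduce the actual markup. The company name and address are the ones quoted above.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    return parser.blocks


# Hypothetical page: "@context" and "@type" values are invented for illustration
PAGE = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://example.com/duck-vocab",
  "@type": "DuckApparelStore",
  "name": "Duck Yeah T-Shirts",
  "address": "77 The Muddy Bank, South Pondshire, DK99 YEA, United Queendom"
}
</script>
</head><body></body></html>
"""

blocks = extract_jsonld(PAGE)
```

Nothing in this extraction step distinguishes a real business from an invented one. The parser returns a clean dictionary either way, and any system that consumes that dictionary without a further check inherits the fabrication.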

The experiment’s core mechanism is now well-documented: LLMs model what people say is true, not what is true. Consensus-looking content becomes ground truth. Once enough pages describe something as real — whether cats.txt or a fictional company’s schema — the model explains it confidently, people cite the explanation as evidence, and the feedback loop closes. The signal that drives LLM visibility is indistinguishable from the signal that drove PageRank: volume, authority-mimicry, and repetition.
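The loop described above can be illustrated with a deliberately crude toy. This is not how any real LLM or retrieval system works. It assumes an "answerer" that simply repeats the most common claim it sees, and a web where each answer gets cited on one new page per round. The function names, claims, and starting counts are all invented.

```python
from collections import Counter


def consensus_answer(pages: list) -> str:
    """Return the most frequently repeated claim, ignoring whether it is true."""
    return Counter(pages).most_common(1)[0][0]


def run_feedback_loop(pages: list, rounds: int) -> list:
    """Simulate answer -> citation -> more pages; return the claim's share per round."""
    pages = list(pages)  # work on a copy so the caller's list is untouched
    history = []
    for _ in range(rounds):
        answer = consensus_answer(pages)
        pages.append(answer)  # someone cites the answer on a new page
        history.append(Counter(pages)[answer] / len(pages))
    return history


# Invented starting state: three pages repeat the hoax, two debunk it
start = ["cats.txt is a real standard"] * 3 + ["cats.txt is a hoax"] * 2
shares = run_feedback_loop(start, rounds=3)
```

In this toy the majority claim's share only ever grows, because nothing in the loop checks the claim against anything outside the pages themselves. That is the structural point of the experiment, stated in a dozen lines.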

How does this compare to the official docs?

The cats.txt experiment was designed to exploit gaps in how LLMs validate sourcing — but the official guidance from major AI platforms tells a different story about what actually shapes model behavior and how structured data is (and isn’t) processed.

Here’s What the Official Docs Show

The video’s walkthrough holds up where documentation exists to check it — this section layers in what official sources confirm, and flags clearly the steps where documentation coverage is absent.

Step 1 — The fake spec’s design

No official documentation was found for this step — proceed using the video’s approach and verify independently.

📄 llmstxt.org — the legitimate /llms.txt proposal (Jeremy Howard, September 3, 2024) that cats.txt modeled its format and placement convention after

Step 2 — Spec requirements: file path, robots.txt, format

The video’s approach here matches the current docs. The robotstxt.org reference confirms robots.txt is a genuine crawler-control mechanism that compliant crawlers honor, so the cats.txt spec’s instruction to leave the file unblocked was technically meaningful, not decorative. The llmstxt.org proposal documents a closely related convention: a markdown file at a fixed, predictable path (/llms.txt at the site root, where cats.txt used /.well-known/cats.txt). That resemblance shows why the fake spec looked credible: it patterned itself on a real emerging standard that LLMs had already encountered in training data.

📄 robotstxt.org confirming robots.txt as a real crawler-control standard, the mechanism cats.txt’s spec required implementers to leave unblocked

📄 llmstxt.org Proposal section showing markdown format and root-path placement — the conventions cats.txt mimicked
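The "must not be blocked by robots.txt" requirement is the one part of the fake spec that maps onto a real, testable mechanism. A minimal sketch using Python's standard `urllib.robotparser` follows. The robots.txt contents and the helper name are invented for illustration. The crawler names are the ones the article lists.

```python
from urllib.robotparser import RobotFileParser


def blocked_crawlers(robots_txt: str, path: str, user_agents: list) -> list:
    """Return the user agents that the given robots.txt rules bar from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network fetch
    return [ua for ua in user_agents if not parser.can_fetch(ua, path)]


# Hypothetical robots.txt that blocks one crawler from the /.well-known/ directory
SAMPLE_ROBOTS = """\
User-agent: GPTBot
Disallow: /.well-known/

User-agent: *
Disallow: /private/
"""

blocked = blocked_crawlers(
    SAMPLE_ROBOTS,
    "/.well-known/cats.txt",
    ["Googlebot", "GPTBot", "ClaudeBot", "PerplexityBot"],
)
```

Keep in mind that robots.txt is advisory: this check tells you what a compliant crawler would do with your rules, not what every crawler actually does.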

Step 3 — Crawlers in the server logs

The video’s approach here matches the current docs exactly. Perplexity’s official API documentation confirms PerplexityBot is a legitimate, active real-time web crawler. Its stated core capability — “real-time, web-wide research and Q&A” — explains precisely why a newly published cats.txt file would be indexed and surfaced in answers within days.

📄 Perplexity API documentation (docs.perplexity.ai) confirming PerplexityBot as an active real-time crawler

Step 4 — ChatGPT confirms cats.txt

No official documentation was found for this step — proceed using the video’s approach and verify independently.

As of May 31, 2026, the ChatGPT screenshots captured for this post show only the unauthenticated chat.openai.com landing page — no query or response is present. The exchange Williams-Cook demonstrated at Athens SEO cannot be confirmed or contradicted from these images.

📄 ChatGPT unauthenticated landing page (chat.openai.com) — no cats.txt query or response captured

Step 5 — Google AI Overview surfaces cats.txt

No official documentation was found for this step — proceed using the video’s approach and verify independently.

One update worth flagging: Google’s current interface shows an “AI Mode” button in the search bar — the “AI Overview” label from the video may reflect an earlier surface of the same feature.

📄 Google homepage with current ‘AI Mode’ button: empty state, no cats.txt results visible

Step 6 — Duck Yeah T-Shirts and Perplexity

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Step 7 — JSON-LD as a real standard

The video’s approach here matches the current docs exactly. JSON-LD is formally governed by a W3C Working Group, implemented across 11+ programming languages, and built on a defined @context and @type vocabulary system. Fabricated @type values like those in the Duck Yeah experiment are syntactically valid JSON-LD — they carry no standing in any official vocabulary, but an LLM has no basis to distinguish them.

📄 json-ld.org confirming JSON-LD as a W3C-backed linked data format with a defined @type system

📄 json-ld.org ecosystem overview showing W3C Working Group governance — the institutional legitimacy fabricated schema types parasitically imitate
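The gap between "syntactically valid" and "defined in a recognized vocabulary" can be shown in a few lines. In the sketch below, `KNOWN_TYPES` is a tiny hand-picked subset of schema.org type names chosen for illustration, and a real check would load the full published vocabulary. `DuckApparelStore` and the `@context` URL are hypothetical stand-ins for the invented types, which the article does not name.

```python
import json

# Small illustrative subset of schema.org types, NOT the full vocabulary
KNOWN_TYPES = {
    "Organization", "LocalBusiness", "Store", "ClothingStore",
    "Product", "Person", "Article",
}


def unknown_types(doc: dict) -> list:
    """Return any @type values that are not in the known vocabulary subset."""
    declared = doc.get("@type", [])
    if isinstance(declared, str):
        declared = [declared]  # @type may be a single string or a list
    return [t for t in declared if t not in KNOWN_TYPES]


# Hypothetical fabricated markup: parses cleanly, but the type is invented
fabricated = json.loads(
    '{"@context": "https://example.com/duck-vocab",'
    ' "@type": "DuckApparelStore",'
    ' "name": "Duck Yeah T-Shirts"}'
)
genuine = json.loads(
    '{"@context": "https://schema.org",'
    ' "@type": ["ClothingStore", "Organization"],'
    ' "name": "Example Shop"}'
)

flagged = unknown_types(fabricated)
clean = unknown_types(genuine)
```

Both documents pass `json.loads` without complaint. Only the vocabulary lookup separates them, and that lookup is a step a consumer has to choose to perform. A known type also says nothing about whether the values attached to it are true.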

Step 8 — The feedback loop

No official documentation was found for this step — proceed using the video’s approach and verify independently.

One documentary addition: the Perplexity API overview includes a banner directing AI agents to its llms.txt documentation index — confirming that the /llms.txt convention cats.txt imitated is recognized and in active use by the very platforms the experiment targeted.

📄 llmstxt.org Format section showing the markdown structure and root-path convention cats.txt mirrored in its fake specification

Useful Links

The /llms.txt file – llms-txt — Documents the legitimate root-path, markdown-format community proposal (Jeremy Howard, September 3, 2024) that cats.txt modeled its fake specification after.
JSON-LD – JSON for Linked Data — Official homepage for the W3C-backed JSON-LD standard, including the @context and @type system the Duck Yeah T-Shirts experiment exploited.
Overview – Perplexity — Perplexity’s API documentation confirming PerplexityBot as an active real-time web crawler and the platform’s own recognition of the /llms.txt convention.
The Web Robots Pages — Community reference for the robots.txt standard that cats.txt’s spec required implementers to leave unblocked so major crawlers could access the file.
ChatGPT — Consumer ChatGPT interface; the specific cats.txt exchange from step 4 was not captured in the available screenshots and cannot be verified here.
Google — Google’s current search homepage, now featuring an “AI Mode” button that may represent an evolution of the “AI Overview” surface referenced in the video.