Tutorial: AI SEO Log Analysis and Schema With ChatGPT

AI Log File Analysis, JSON-LD Schema, and Multimodal Optimization for AI SEO

A client site was stuck with flat AI traffic while every major LLM was accelerating user adoption. Three techniques — server log analysis via ChatGPT, JSON-LD schema generation, and multimodal content optimization — changed that, producing a 1,400% increase in AI traffic and 164 newly ranking AI Overview keywords. Work through these steps and you’ll know exactly which pages AI bots ignore on your site, how to tell those bots what your content means, and how to give them richer material to cite.

Google Traffic vs AI Traffic: the divergence that makes AI SEO a separate discipline.

Technique 1: Server Log File Analysis via ChatGPT

Log into your hosting control panel and locate the log directory — it’s usually named logs, access_logs, or similar. Download the most recent log file to your machine.
Open ChatGPT and upload the log file. Use this prompt to initialize the analysis session:

“I have attached log files from my website server. Please analyze the logs focusing on Googlebot and AI crawlers such as GPTBot, ClaudeBot, and similar. Identify all hits from user agents containing any of the following keywords: Google, GPTBot, ClaudeBot. Once you’ve analyzed this, I will ask you to perform a series of tasks.”

ChatGPT will return a summary of hit rates broken down by bot.

Opening ChatGPT 4o to begin server log file analysis.
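To cross-check the summary ChatGPT returns, the same count can be reproduced locally. The sketch below is a minimal example, assuming your host writes the Apache/Nginx "combined" log format. The regular expression, the `count_bot_hits` helper, and the sample log lines are illustrative and not part of the original workflow.

```python
import re
from collections import Counter

# Assumption: Apache/Nginx "combined" log format. Adjust if your host differs.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Same keywords as the prompt above, matched case-insensitively.
BOT_KEYWORDS = ("Google", "GPTBot", "ClaudeBot")


def classify_bot(agent):
    """Return the first keyword found in the user agent, or None."""
    lowered = agent.lower()
    for keyword in BOT_KEYWORDS:
        if keyword.lower() in lowered:
            return keyword
    return None


def count_bot_hits(lines):
    """Count log lines per bot keyword, skipping lines that do not parse."""
    hits = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        bot = classify_bot(match.group("agent"))
        if bot is not None:
            hits[bot] += 1
    return hits


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Mar/2026:06:25:01 +0000] "GET /pots/terracotta HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; Googlebot/2.1)"',
    '203.0.113.9 - - [10/Mar/2026:06:26:10 +0000] "GET /blog/repotting HTTP/1.1" 200 8841 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.9 - - [10/Mar/2026:06:27:44 +0000] "GET /pots/terracotta HTTP/1.1" 200 5123 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [10/Mar/2026:06:30:02 +0000] "GET / HTTP/1.1" 200 1200 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/124.0"',
]

if __name__ == "__main__":
    for bot, total in count_bot_hits(SAMPLE_LOG).most_common():
        print(f"{bot}: {total}")
```

User agent strings can be spoofed, so treat these counts as an approximation rather than verified bot traffic.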

Prompt ChatGPT for your lowest-crawled pages:

“Provide a list of the 10 pages that receive the fewest hits from AI bots and Google, and create a visual diagram.”

These are the pages AI is effectively skipping. Pages with no bot hits at all never appear in the log, so compare this list against your sitemap to catch them. For a plant-pot client site, the culprit was a high-value sales page with only one internal link pointing to it. The fix: add internal links to that page from the highly crawled pages you’ll identify next.

The exact ChatGPT prompt to identify pages AI bots are under-crawling from your server log file.

Run the companion prompt to surface your highest-crawled pages:

“Provide a list of the 10 pages that receive the most hits, and create a visual diagram.”

Use these pages as internal-link sources pointing toward your under-crawled priority pages.

The companion ChatGPT prompt to find your highest-crawled pages — essential for crawl budget benchmarking.
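The two ranking prompts can also be approximated in a few lines of Python. This is a rough sketch, again assuming the "combined" log format; `bot_hits_per_page`, `known_paths`, and the sample data are hypothetical names introduced for illustration. Seeding the counter with paths from your sitemap is one way to make pages with zero bot hits show up in the least-crawled list.

```python
import re
from collections import Counter

# Assumption: Apache/Nginx "combined" log format.
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
BOT_KEYWORDS = ("google", "gptbot", "claudebot")


def bot_hits_per_page(lines, known_paths=()):
    """Count bot hits per URL path; known_paths start at zero."""
    counts = Counter({path: 0 for path in known_paths})
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue
        agent = match.group("agent").lower()
        if any(keyword in agent for keyword in BOT_KEYWORDS):
            # Drop query strings so /page?utm=x and /page count as one URL.
            counts[match.group("path").split("?", 1)[0]] += 1
    return counts


def most_crawled(counts, n=10):
    """Pages with the most bot hits, highest first."""
    return counts.most_common(n)


def least_crawled(counts, n=10):
    """Pages with the fewest bot hits, lowest first (ties sorted by path)."""
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))[:n]


# Made-up sample data for illustration only.
SAMPLE_LOG = [
    '203.0.113.9 - - [10/Mar/2026:06:26:10 +0000] "GET /pots/terracotta HTTP/1.1" 200 5123 "-" "GPTBot/1.0"',
    '203.0.113.9 - - [10/Mar/2026:06:27:10 +0000] "GET /pots/terracotta?ref=nav HTTP/1.1" 200 5123 "-" "ClaudeBot/1.0"',
    '203.0.113.5 - - [10/Mar/2026:06:28:10 +0000] "GET /pots/terracotta HTTP/1.1" 200 5123 "-" "Googlebot/2.1"',
    '203.0.113.5 - - [10/Mar/2026:06:29:10 +0000] "GET /blog/repotting HTTP/1.1" 200 8841 "-" "Googlebot/2.1"',
    '198.51.100.7 - - [10/Mar/2026:06:30:02 +0000] "GET /sale/planters HTTP/1.1" 200 1200 "-" "Firefox/124.0"',
]
SITEMAP_PATHS = ["/sale/planters", "/pots/terracotta", "/blog/repotting"]

if __name__ == "__main__":
    counts = bot_hits_per_page(SAMPLE_LOG, known_paths=SITEMAP_PATHS)
    print("Most crawled:", most_crawled(counts, n=2))
    print("Least crawled:", least_crawled(counts, n=2))
```

In the sample, the sales page is visited only by a human browser, so it lands at the top of the least-crawled list with zero bot hits.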

Prompt ChatGPT to surface crawl errors:

“Highlight any crawl errors from these bots and flag anything that looks unusual or worth fixing.”

ChatGPT will produce a table of 404s hit by GPTBot and ClaudeBot, grouped by page and date, along with a remediation section recommending whether to restore the page or implement a 301 redirect.

ChatGPT surfaces 404s hit by ClaudeBot and GPTBot — and tells you exactly which pages to fix or redirect.
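The 404 table ChatGPT produces can be verified against the raw log. The following is a minimal sketch under the same "combined" log format assumption; `find_bot_404s` and the sample lines are hypothetical, and the grouping by bot, page, and date simply mirrors the table described above.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumption: Apache/Nginx "combined" log format.
LOG_PATTERN = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "\S+ (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
AI_BOTS = ("GPTBot", "ClaudeBot")


def find_bot_404s(lines):
    """Map (bot, path) to the sorted list of dates on which the bot got a 404."""
    errors = defaultdict(set)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None or match.group("status") != "404":
            continue
        agent = match.group("agent").lower()
        bot = next((name for name in AI_BOTS if name.lower() in agent), None)
        if bot is None:
            continue
        # Timestamps look like 10/Mar/2026:06:25:01 +0000 in this format.
        day = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z").date()
        errors[(bot, match.group("path"))].add(day.isoformat())
    return {key: sorted(days) for key, days in errors.items()}


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '203.0.113.9 - - [10/Mar/2026:06:26:10 +0000] "GET /old-catalog HTTP/1.1" 404 312 "-" "GPTBot/1.0"',
    '203.0.113.9 - - [11/Mar/2026:07:01:44 +0000] "GET /old-catalog HTTP/1.1" 404 312 "-" "GPTBot/1.0"',
    '203.0.113.8 - - [10/Mar/2026:09:12:03 +0000] "GET /old-catalog HTTP/1.1" 404 312 "-" "ClaudeBot/1.0"',
    '203.0.113.9 - - [10/Mar/2026:06:27:44 +0000] "GET / HTTP/1.1" 200 1200 "-" "GPTBot/1.0"',
    '198.51.100.7 - - [10/Mar/2026:06:30:02 +0000] "GET /typo HTTP/1.1" 404 312 "-" "Firefox/124.0"',
]

if __name__ == "__main__":
    for (bot, path), days in find_bot_404s(SAMPLE_LOG).items():
        print(f"{bot} hit 404 on {path} on: {', '.join(days)}")
```

A page that several bots keep requesting over multiple days is a stronger candidate for a 301 redirect than a one-off miss.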

Run one final log prompt for broader pattern detection:

“Provide any additional insights or patterns you observe, such as bots missing key commercial pages, pages being crawled unexpectedly often, and sudden spikes in crawl activity.”

This surfaces secondary issues like page-speed problems and content-quality signals that affect AI crawl behavior.
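One of the patterns named in that prompt, sudden spikes in crawl activity, is simple to check by hand once you have daily hit totals. The sketch below is one possible heuristic, not a standard method: it flags any day whose hits exceed a multiple of the median day. The `find_spikes` name, the `factor=3.0` threshold, and the sample numbers are all assumptions chosen for illustration.

```python
from statistics import median


def find_spikes(daily_hits, factor=3.0):
    """Return the days whose hit count exceeds factor times the median day.

    daily_hits maps an ISO date string to the number of bot hits that day.
    """
    if not daily_hits:
        return []
    baseline = median(daily_hits.values())
    return sorted(day for day, hits in daily_hits.items() if hits > factor * baseline)


# Made-up daily totals for illustration only.
SAMPLE_DAILY_HITS = {
    "2026-03-01": 40,
    "2026-03-02": 38,
    "2026-03-03": 45,
    "2026-03-04": 210,
    "2026-03-05": 41,
}

if __name__ == "__main__":
    print(find_spikes(SAMPLE_DAILY_HITS))
```

The median is used instead of the mean so that the spike itself does not inflate the baseline it is compared against.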

Technique 2: JSON-LD Schema Markup Generation

For each page you want to optimize, identify the correct schema type based on its content: Article for blog posts, HowTo for instructional pages, FAQPage for Q&A content, and so on. Schema.org lists the full taxonomy — you don’t need to memorize it.
Prompt ChatGPT to generate the markup. For an FAQ page, the prompt structure looks like this:

“Please generate JSON-LD FAQPage schema for my web page. Below is the information for each question and answer: [Question 1], [Answer 1], [Question 2], [Answer 2]…”

Supply the actual content inline — questions and answers, steps, author name, organization, publication date — and ChatGPT returns ready-to-paste JSON-LD.
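If you prefer to build the markup yourself, or want something to compare ChatGPT's output against, FAQPage JSON-LD is straightforward to generate. This is a minimal sketch using the `FAQPage`, `Question`, and `Answer` types from Schema.org; the `build_faq_schema` and `to_script_tag` helpers and the sample questions are hypothetical.

```python
import json


def build_faq_schema(qa_pairs):
    """Build a FAQPage structure from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }


def to_script_tag(schema):
    """Serialize the schema and wrap it in a JSON-LD script element."""
    payload = json.dumps(schema, indent=2, ensure_ascii=False)
    # Escape "</" so answer text cannot close the script element early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Placeholder content for illustration; use the text that is on your page.
SAMPLE_FAQ = [
    ("Do terracotta pots need drainage holes?",
     "Yes. Without drainage, roots sit in water."),
    ("Can terracotta pots stay outside in winter?",
     "Only if they are rated frost-proof."),
]

if __name__ == "__main__":
    print(to_script_tag(build_faq_schema(SAMPLE_FAQ)))
```

Generating the markup from the same source as the visible FAQ keeps the two from drifting apart when the page is edited.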

Warning: this step may differ from current official documentation — see the verified version below.

Place the generated markup inside a <script type="application/ld+json"> tag in the <head> section of the page’s HTML. Most CMS platforms support this via a plugin if you don’t have direct HTML access.
Validate the markup using Google’s Rich Results Test before publishing. A passing result shows detected schema types with zero errors and zero warnings.

Validating JSON-LD schema markup: four schema types detected with zero errors using Google’s Rich Results testing tool.
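Before running a page through the Rich Results Test, a quick local check can catch syntax slips such as a trailing comma in pasted markup. The sketch below only confirms that each JSON-LD block parses and declares an `@type`; it does not replace Google's validator. The `JsonLdExtractor` class and `check_jsonld` function are hypothetical helpers built on Python's standard `html.parser` and `json` modules.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def check_jsonld(html):
    """Return (detected types, problems) for the JSON-LD blocks in html."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    types, problems = [], []
    for index, raw in enumerate(parser.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as error:
            problems.append(f"block {index}: invalid JSON ({error.msg})")
            continue
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and "@type" in item:
                types.append(item["@type"])
            else:
                problems.append(f"block {index}: missing @type")
    return types, problems


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": []}'
        "</script></head><body></body></html>"
    )
    print(check_jsonld(page))
```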

Google’s core rule for structured data: every schema property must reflect what’s actually on the page.
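That rule can be spot-checked mechanically for FAQ markup. The sketch below is a rough illustration, assuming you already have the page's visible text as a string; it reports any question or answer in the schema that does not appear in that text. The `missing_from_page` helper and the sample content are hypothetical, and an exact-substring comparison is a deliberately strict simplification.

```python
def normalize(text):
    """Collapse whitespace and ignore case so formatting differences do not matter."""
    return " ".join(text.split()).casefold()


def missing_from_page(schema, visible_text):
    """List (kind, text) pairs from a FAQPage schema that are absent from the page."""
    page = normalize(visible_text)
    missing = []
    for question in schema.get("mainEntity", []):
        name = question.get("name", "")
        answer = question.get("acceptedAnswer", {}).get("text", "")
        for kind, value in (("question", name), ("answer", answer)):
            if normalize(value) not in page:
                missing.append((kind, value))
    return missing


# Placeholder schema and page text for illustration only.
SAMPLE_SCHEMA = {
    "@type": "FAQPage",
    "mainEntity": [
        {"@type": "Question", "name": "Do terracotta pots need drainage holes?",
         "acceptedAnswer": {"@type": "Answer", "text": "Yes. Without drainage, roots sit in water."}},
        {"@type": "Question", "name": "Are terracotta pots dishwasher safe?",
         "acceptedAnswer": {"@type": "Answer", "text": "We do not recommend it."}},
    ],
}
SAMPLE_PAGE_TEXT = """
    FAQ
    Do terracotta pots need   drainage holes?
    Yes. Without drainage, roots sit in water.
"""

if __name__ == "__main__":
    for kind, value in missing_from_page(SAMPLE_SCHEMA, SAMPLE_PAGE_TEXT):
        print(f"Not on page ({kind}): {value}")
```

Anything the check reports should either be added to the visible page or removed from the markup.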

Technique 3: Multimodal Content Optimization

Add high-quality images and video embeds to content pages. AI systems are more likely to cite pages that give them multiple signal types — text, visuals, and structured media.
Write descriptive alt text for every image. Describe what is actually in the image; don’t pad the attribute with target keywords.
Rename image files to clearly describe their subject before uploading. Generic filenames like IMG_4821.jpg provide no semantic signal.
Replace any image-based tables (screenshots of spreadsheets, infographics containing tabular data) with actual HTML tables. LLMs parse structured HTML directly; they cannot reliably extract data from rasterized images.
Add a written transcript to any page that contains an embedded video. The transcript gives AI a citable text layer tied to the video content — increasing the probability that both the video and the surrounding article appear as a source in AI Overviews.

Battersea’s multimodal content pattern: embedded video + accessible transcript = more AI-indexable signals on a single page.
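The alt-text and filename steps above lend themselves to a quick audit. The following sketch scans a page's HTML for `img` elements with no alt text or with a camera-style filename. It is a minimal example: the `ImageAudit` class, the `GENERIC_NAME` pattern, and the sample markup are assumptions, and the pattern will need tuning for your own naming habits.

```python
import posixpath
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed definition of a "generic" filename, e.g. IMG_4821.jpg or image1.png.
GENERIC_NAME = re.compile(
    r"^(img|dsc|image|photo|screenshot)[-_ ]?\d*\.(jpe?g|png|gif|webp)$",
    re.IGNORECASE,
)


class ImageAudit(HTMLParser):
    """Record img elements that lack alt text or use a generic filename."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        issues = []
        # Empty alt is valid for purely decorative images; review those by hand.
        if not (attributes.get("alt") or "").strip():
            issues.append("missing alt text")
        filename = posixpath.basename(urlparse(src).path)
        if GENERIC_NAME.match(filename):
            issues.append("generic filename")
        if issues:
            self.findings.append((src, issues))


def audit_images(html):
    """Return a list of (src, issues) for images that need attention."""
    parser = ImageAudit()
    parser.feed(html)
    parser.close()
    return parser.findings


# Made-up markup for illustration only.
SAMPLE_HTML = """
<img src="/media/IMG_4821.jpg">
<img src="/media/terracotta-pot-with-saucer.jpg" alt="Terracotta pot with matching saucer on a windowsill">
<img src="/media/glazed-blue-planter.jpg" alt="">
"""

if __name__ == "__main__":
    for src, issues in audit_images(SAMPLE_HTML):
        print(f"{src}: {', '.join(issues)}")
```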

How does this compare to the official docs?

The techniques above follow one practitioner’s implementation — but Google’s own documentation on structured data requirements and multimodal content signals tells a more precise story about what’s officially supported, what’s deprecated, and where the guardrails actually sit.

Here’s What the Official Docs Show

The video’s three-technique framework holds up well in its core recommendations, and the verified steps align closely with what the official documentation describes. What follows layers in the prerequisites, format options, and scope caveats the docs surface — filling the gaps rather than retracing ground the tutorial already covered accurately.

Technique 1: Server Log File Analysis via ChatGPT

Steps 1, 3–8 — Log file download, ChatGPT prompting sequence, crawl error analysis, and pattern detection

No official documentation was found for these steps. Proceed using the video’s approach and verify independently.

One important addition that is documented: Google Search Central explicitly recommends Google Search Console as the standard tool for crawl monitoring and performance analysis. If you want a baseline alongside your log file work — crawl coverage data, indexing alerts, traffic trends — Search Console belongs in this workflow.

📄 Google Search Central recommends Search Console for monitoring crawl coverage, clicks, and ranking data — a natural complement to manual log file analysis.

Step 2 — Uploading the log file to ChatGPT

The video’s approach here matches the current docs exactly on the mechanics: the + button in the prompt input bar is present and functions as described.

One prerequisite the tutorial skips: ChatGPT’s own interface states plainly — “Log in to get answers based on saved chats, plus create images and upload files.” File upload requires a logged-in account. The tutorial does not mention this. A free ChatGPT account is sufficient to test, but you must be signed in before the upload option becomes functional. The interface also surfaces a model selector in the header; the tutorial does not specify which model to use for log analysis.

📄 ChatGPT homepage (logged-out state) showing the ‘+’ file upload button and the sidebar notice that file upload requires a logged-in account.

Technique 2: JSON-LD Schema Markup Generation

Steps 9–11 — Schema type selection, markup generation, and <script> tag placement

The video’s approach here matches the current docs exactly. Schema.org (V30.0, 2026-03-19) confirms JSON-LD as a valid, recognized structured data encoding, and Google Search Central confirms that structured data makes pages eligible for “certain features” in Search.

One clarification worth holding onto: Schema.org officially supports three encodings — RDFa, Microdata, and JSON-LD. The tutorial presents JSON-LD as the only option. JSON-LD remains the most widely recommended format and is what Google’s own documentation examples typically show, but RDFa and Microdata are fully valid alternatives if your CMS or stack makes them a better fit.

📄 Schema.org homepage (V30.0, 2026-03-19) confirming JSON-LD alongside RDFa and Microdata as the three supported structured data encodings.

Google’s official framing of why this matters is slightly narrower than the tutorial implies. Search Central describes structured data as making pages eligible for “certain features” broadly — AI Overviews is not named as the specific target. That’s a useful distinction: the SEO case for structured data is strong and well-documented, but official guidance doesn’t draw a direct line from schema markup to AI Overviews citation probability.

📄 Google Search Central guidance confirming structured data creates eligibility for ‘certain features’, phrased broadly rather than as AI Overviews optimization specifically.

Technique 3: Multimodal Content Optimization

Steps 12–13 — High-quality images and descriptive alt text

The video’s approach here matches the current docs exactly. Google Search Central states: “Use high quality images and describe them — By adding more context around images, results can become much more useful, which can lead to higher quality traffic to your site.” Image file naming (step 13) follows naturally from this guidance even though it isn’t called out separately.

📄 Google Search Central official guidance on high-quality images with descriptive context: the clearest doc-grounded support for the tutorial’s multimodal steps.
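For the file-naming step, a small helper can turn a plain-language description into a consistent filename before upload. This is a sketch only: `descriptive_filename` is a hypothetical function, and lowercase words joined by hyphens is an assumed convention rather than something the quoted guidance prescribes.

```python
import re
import unicodedata


def descriptive_filename(description, extension):
    """Turn an image description into a lowercase, hyphen-separated filename."""
    # Strip accents so the result stays plain ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", description)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    return f"{slug}.{extension.lstrip('.').lower()}"


if __name__ == "__main__":
    print(descriptive_filename("Terracotta pot with saucer", ".JPG"))
```

Writing the description first and deriving the filename from it also gives you a starting point for the alt text.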

Step 14 — AI traffic and AI Overviews citation rationale

The tutorial’s source for AI Overviews behavior is a May 2023 Google Keyword blog post — Elizabeth Reid’s Search Labs launch announcement — not current AI Overviews product documentation. As of March 28, 2026, no official Google documentation in the verified screenshots describes how AI Overviews selects sources or what content signals influence citation selection. The directional premise (AI Search includes links to source pages) is confirmed by that blog post, but the specific optimization claims are not validated by current guidance.

📄 May 2023 Google blog confirming AI Search results include ‘links to dig deeper’: the citation mechanism the tutorial targets, sourced from a Search Labs announcement rather than current AI Overviews documentation.

Steps 15–16 — HTML tables and video transcripts

No official documentation was found for these steps. Proceed using the video’s approach and verify independently.
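For the HTML table step, data that currently lives in a screenshot can be re-entered once and rendered as markup. The sketch below shows one way to do that with Python's standard `html.escape`; the `rows_to_html_table` helper and the sample sizing data are hypothetical.

```python
from html import escape


def rows_to_html_table(headers, rows, caption=None):
    """Render headers and rows as a plain HTML table with escaped cell text."""
    lines = ["<table>"]
    if caption:
        lines.append(f"  <caption>{escape(caption)}</caption>")
    header_cells = "".join(f'<th scope="col">{escape(str(h))}</th>' for h in headers)
    lines.append("  <thead>")
    lines.append(f"    <tr>{header_cells}</tr>")
    lines.append("  </thead>")
    lines.append("  <tbody>")
    for row in rows:
        cells = "".join(f"<td>{escape(str(cell))}</td>" for cell in row)
        lines.append(f"    <tr>{cells}</tr>")
    lines.append("  </tbody>")
    lines.append("</table>")
    return "\n".join(lines)


# Made-up data standing in for a table that used to be an image.
SAMPLE_HEADERS = ["Size", "Diameter", "Best for"]
SAMPLE_ROWS = [
    ("Small", "12 cm", "Herbs & succulents"),
    ("Medium", "20 cm", "Houseplants"),
]

if __name__ == "__main__":
    print(rows_to_html_table(SAMPLE_HEADERS, SAMPLE_ROWS, caption="Terracotta pot sizes"))
```

Escaping each cell keeps characters such as `&` and `<` in the data from breaking the surrounding markup.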

Useful Links

ChatGPT: The ChatGPT web interface where file upload (requiring a logged-in account) and log file analysis prompting take place.
Google Search Central: Google’s official hub for SEO documentation, structured data guidance, and Search Console recommendations.
Schema.org: The cross-industry structured data vocabulary (V30.0) maintained by Google, Microsoft, Yahoo, and Yandex, documenting JSON-LD, RDFa, and Microdata encodings.
How Google is improving Search with Generative AI: May 2023 Google Keyword blog post by Elizabeth Reid announcing early generative AI Search experiments via Search Labs; the source behind the tutorial’s AI Overviews citation claims.
