Tutorial: Googlebot’s 15MB Crawl Limit & Page Weight

Web pages have nearly tripled in size since 2015, but 'page size' means something different to browsers, users, and Googlebot. This episode of Google Search Central's Search Off the Record breaks down three competing definitions of page weight and clarifies how Googlebot's 15MB per-URL crawl limit applies independently to every resource your HTML references. If you've ever wondered whether heavy pages hurt crawlability, this is where to start.



Page Weight, HTML Size, and Googlebot’s 15MB Crawl Limit Explained

Web pages have nearly tripled in size over the last decade, but what “size” actually means depends on who is measuring and why. In Google Search Central’s Search Off the Record Episode 107, Googlers Gary Illyes and Martin Splitt unpack the competing definitions of page weight, trace a decade of growth using Web Almanac data, and clarify how Googlebot’s 15MB per-URL crawl limit works in practice.

Google Search Central's 'Search Off the Record' Ep. 107 explores whether the web has grown too heavy to crawl efficiently.
  1. Establish what “page weight” means — and why the answer isn’t obvious. The term describes at least three different things: the raw bytes of the HTML document at a given URL, the total transferred resources a browser must fetch to render the page (HTML, CSS, JavaScript, images, fonts), and the on-disk decompressed footprint after network compression is removed. The Web Almanac defines page weight as total data a user must download to view a page. Googlebot’s crawl documentation counts raw bytes per URL. Neither is wrong; they answer different questions for different audiences.
Three competing definitions of page weight depend on whether you're measuring raw HTML bytes, total transferred resources, or on-disk decompressed size.
  2. Review a decade of growth from the Web Almanac 2025. The HTTP Archive found that the median mobile homepage weighed 845KB in 2015, just over half the capacity of a 1.44MB floppy disk. By July 2025, that median had climbed to 2.3MB, roughly 2.7 times as large. The measurement covers all resources transferred to render a homepage, not the HTML file alone.
Web Almanac 2025 data showing median mobile homepage weight grew from 845KB in 2015 to 2.3MB in 2025.
  3. Understand Googlebot’s 15MB per-URL raw-byte crawl limit. Googlebot stops reading any single URL after fetching 15MB of raw bytes. That ceiling applies independently to every resource your HTML references: each stylesheet, script, and media file gets its own 15MB budget. From a crawlability standpoint, this matters more than aggregate page weight, because it caps how much of each individual resource Googlebot reads, not how heavy the whole user-facing page load is.

  4. See the limit in context: loading the WHATWG HTML Living Standard. The WHATWG publishes a single-page version of the HTML specification. Loading it live during the episode took roughly 45 seconds on a fast connection. The downloaded file sits at approximately 14–15MB on disk, close to Googlebot’s per-URL ceiling, for a single document containing almost no images, just densely structured text.

The WHATWG HTML Living Standard single-page version takes ~45 seconds to load and weighs roughly 14–15MB on disk.
  5. Compare Chrome’s print-to-PDF output against the official WHATWG PDF. Chrome’s print-to-PDF on the same spec page produces a ~96MB file. The official WHATWG-published PDF of the same content runs approximately 15MB. The difference comes down to compression applied during the publishing pipeline, a concrete illustration of how output format and tooling choices, not just content volume, determine final file weight.

  6. Separate network transfer size from on-disk footprint. gzip and Brotli compression cut the bytes sent over the wire, but the decompressed file written to disk is larger. A page that transfers 5–6MB over the network may occupy 10MB once decompressed. Transfer size affects load time and data costs; on-disk size affects storage-constrained devices. Both metrics are real; they just describe different constraints for different users.

  7. Calibrate impact by connection type. A 2.3MB median homepage is negligible on fiber or 5G. On metered satellite connections it carries a direct dollar cost per page load. On throttled 2G or 3G connections common in parts of the developing world, that same page can mean a multi-minute wait. Page weight has no universal threshold; its real-world impact scales with the network conditions of whoever is loading it.

How does this compare to the official docs?

The episode raises questions the conversation doesn’t fully resolve — particularly around how Googlebot’s 15MB limit interacts with compression, and what the documentation actually specifies versus what practitioners assume.


Here’s What the Official Docs Show

The walkthrough above follows a rich conceptual discussion from Google Search Central’s podcast, and the definitions hold up well as framing. The documentation review adds a layer of transparency: several specific figures cited in the episode couldn’t be confirmed against the captured source pages, so the steps below flag which claims to verify independently before building strategy around them.


Step 1 — What “page weight” actually means

No official documentation was found for this step — proceed using the video’s approach and verify independently.

Google Search Central homepage — the Googlebot crawl-limit documentation resides at a sub-page not captured in this screenshot

Step 2 — A decade of growth from the Web Almanac 2025

The 2025 Web Almanac is real, active, and authoritative. Screenshots confirm the report is published by HTTP Archive, draws on a July 2025 dataset of 16.2 million websites, and processed 244 TB of data across 16 chapters — all of which supports the video’s use of it as a source.

2025 Web Almanac methodology — 16.2M websites tested from the July 2025 HTTP Archive dataset, with 244 TB processed

However, the specific figures cited in the episode — 845KB median mobile homepage in 2015, growing to 2.3MB by 2025 — do not appear in the captured screenshots. Those numbers would live in the dedicated Page Weight chapter, not on the homepage. The homepage foregrounds the CMS chapter.

2025 Web Almanac featured chapter is CMS — the Page Weight chapter with mobile homepage size data is a separate chapter not featured on the homepage

No official documentation was found confirming the specific page-weight figures for this step — navigate directly to almanac.httparchive.org/en/2025/page-weight to verify the cited numbers independently.
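
While checking the source numbers, the arithmetic behind the episode's comparisons can be reproduced directly. The sketch below uses only the two figures quoted in the episode; the unit conventions (1MB treated as 1000KB, a "1.44MB" floppy treated as 1440KB) are assumptions made for illustration.

```python
# Figures quoted from the episode; the unit conventions are assumptions.
MEDIAN_2015_KB = 845
MEDIAN_2025_KB = 2.3 * 1000  # assumes 1MB = 1000KB
FLOPPY_KB = 1440             # assumes a "1.44MB" floppy holds 1440KB


def growth_ratio(old_kb: float, new_kb: float) -> float:
    """Return how many times larger new_kb is than old_kb."""
    return new_kb / old_kb


ratio = growth_ratio(MEDIAN_2015_KB, MEDIAN_2025_KB)
floppy_share = MEDIAN_2015_KB / FLOPPY_KB

print(f"2025 median is {ratio:.2f}x the 2015 median")
print(f"2015 median fills {floppy_share:.0%} of a floppy disk")
```

Under these assumptions the ratio comes out at about 2.72, which is why "nearly tripled" is a fair summary while "tripled" slightly overstates it.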


Step 3 — Googlebot’s 15MB per-URL raw-byte crawl limit

The screenshots captured for this step show the Google Search Central portal homepage, not the dedicated Googlebot documentation page at developers.google.com/search/docs/crawling-indexing/googlebot. As of 2026-03-30, the 15MB per-URL raw-byte crawl limit stated in the episode is not visible in any captured screenshot and cannot be confirmed from the provided images — the specific claim requires verification at the Googlebot sub-page directly.

Google Search Central SEO recommendations — page speed and user experience appear as ranking factors, but no crawl-limit figures are shown

No official documentation was found confirming the 15MB crawl limit for this step — verify at developers.google.com/search/docs/crawling-indexing/googlebot before citing this figure in client work.
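
To make the per-URL framing from the episode concrete, here is a minimal sketch that judges each resource against the limit on its own rather than summing them. The function name, the example URLs, and the sizes are hypothetical, and reading "15MB" as 15 × 1024 × 1024 bytes is an assumption, since the episode does not say which convention applies.

```python
from typing import Dict, Tuple

LIMIT_BYTES = 15 * 1024 * 1024  # assumption: "15MB" read as 15 MiB


def split_by_limit(
    resource_sizes: Dict[str, int], limit: int = LIMIT_BYTES
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Return (within, over); each URL is compared to the limit on its own."""
    within = {url: size for url, size in resource_sizes.items() if size <= limit}
    over = {url: size for url, size in resource_sizes.items() if size > limit}
    return within, over


# Hypothetical uncompressed sizes, in bytes, for one page's resources.
example = {
    "https://example.com/": 400_000,
    "https://example.com/app.js": 16_000_000,
    "https://example.com/site.css": 90_000,
}

within, over = split_by_limit(example)
total = sum(example.values())
print(f"total page weight: {total} bytes")
print(f"resources over the per-URL limit: {sorted(over)}")
```

In this made-up example only the script is flagged: the HTML document and the stylesheet are unaffected even though the page's total weight exceeds 15MB, which is the distinction the episode draws between per-resource limits and aggregate page weight.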


Step 4 — Loading the WHATWG HTML Living Standard as a real-world example

The WHATWG HTML Living Standard exists and is accessible — the captured screenshots confirm the organization’s homepage links directly to “Read the HTML Living Standard,” and that WHATWG has been stewarding HTML since 2004 under governance from Apple, Google, Mozilla, and Microsoft.

WHATWG community homepage — confirms the HTML Living Standard exists and is linked from here, but document size and load characteristics are not shown

That said, the screenshots capture the whatwg.org organizational homepage, not the single-page specification document at html.spec.whatwg.org itself. The ~45-second load time and ~14–15MB on-disk size cited in the episode cannot be confirmed from the visible content.

No official documentation was found for the file-size and load-time claims in this step — proceed using the video’s approach and verify independently by loading the single-page spec directly.
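
One way to run that check yourself is to count the bytes a URL's response body actually contains. The sketch below uses Python's standard `urllib.request`; the helper name `count_body_bytes` and the 15 MiB cap are assumptions for illustration, and this is not a description of how Googlebot fetches pages. By default `urllib` does not ask for a compressed response, so the count usually reflects the uncompressed body, though a server may behave differently.

```python
import urllib.request
from typing import Tuple

LIMIT_BYTES = 15 * 1024 * 1024  # assumption: "15MB" read as 15 MiB


def count_body_bytes(
    url: str, cap: int = LIMIT_BYTES, chunk: int = 64 * 1024
) -> Tuple[int, bool]:
    """Read the response body in chunks, stopping once `cap` bytes are read.

    Returns (bytes_read, hit_cap). hit_cap is True when at least `cap`
    bytes were available, meaning the body reaches or exceeds the cap.
    """
    read = 0
    with urllib.request.urlopen(url) as resp:
        while read < cap:
            data = resp.read(min(chunk, cap - read))
            if not data:
                return read, False  # body ended before reaching the cap
            read += len(data)
    return read, True


# Example call (requires network access, so it is left commented out):
# print(count_body_bytes("https://html.spec.whatwg.org/"))
```

Stopping at the cap keeps the check cheap for very large files: the function never downloads more than `cap` bytes, which is enough to tell whether a document fits under the limit.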


Step 5 — Chrome’s print-to-PDF output vs. the official WHATWG PDF

The screenshots confirm that Google Chrome is available for this demonstration. Beyond that, they show Chrome’s marketing homepage, which contains no documentation of the print-to-PDF feature, output file sizes, or compression behavior. The ~96MB Chrome PDF output figure cited in the episode cannot be verified from the provided images.

Google Chrome download page — confirms Chrome's availability but contains no print-to-PDF documentation or file-size data

No official documentation was found for this step — proceed using the video’s approach and verify independently.


Step 6 — Network transfer size vs. on-disk footprint (gzip/Brotli)

No official documentation was found for this step — proceed using the video’s approach and verify independently.
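
The gap between transfer size and decompressed size is easy to observe locally. The following sketch compresses a synthetic HTML payload with Python's standard `gzip` module and compares the two byte counts. The sample markup is made up and highly repetitive, so its compression ratio is far better than a real page would see; Brotli is omitted only because it is not in the standard library.

```python
import gzip
from typing import Tuple


def transfer_vs_disk(body: bytes) -> Tuple[int, int]:
    """Return (gzip-compressed size, decompressed size) in bytes."""
    compressed = gzip.compress(body)
    return len(compressed), len(body)


# Synthetic, highly repetitive markup (an assumption for illustration).
sample = b"<li class='item'>Repeated markup compresses well.</li>\n" * 20_000

wire, disk = transfer_vs_disk(sample)
print(f"over the wire (gzip): {wire} bytes")
print(f"on disk (decompressed): {disk} bytes")
```

The two numbers describe the same content, which is the episode's point: which one matters depends on whether the constraint is bandwidth or storage.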


Step 7 — Calibrating impact by connection type

Google Search Central does surface page speed and user experience as ranking signals — the captured homepage references a “Provide a good user experience” recommendation that links to speed and mobile-friendliness tooling, which sits adjacent to the connection-type framing in this step.

Google Search Central SEO recommendations — page speed and user experience appear as ranking factors, contextually supporting the connection-type impact discussion

No official documentation was found directly addressing connection-type impact thresholds for this step — proceed using the video’s approach and verify independently.
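
As a rough way to calibrate this for your own audience, the sketch below estimates raw transfer time for the 2.3MB median page at a few link speeds. The speeds are illustrative assumptions, not measurements or figures from the episode, and the estimate ignores latency, connection setup, and protocol overhead, so real load times would be longer.

```python
def transfer_seconds(size_bytes: float, mbps: float) -> float:
    """Seconds to move size_bytes over a link of `mbps` megabits per second."""
    return size_bytes * 8 / (mbps * 1_000_000)


PAGE_BYTES = 2.3 * 1_000_000  # median mobile homepage, assuming 1MB = 10^6 bytes

# Illustrative link speeds in megabits per second (assumed values).
ASSUMED_LINKS_MBPS = {
    "throttled 2G": 0.1,
    "3G": 1.5,
    "fiber": 500.0,
}

for name, mbps in ASSUMED_LINKS_MBPS.items():
    print(f"{name}: {transfer_seconds(PAGE_BYTES, mbps):.1f} s")
```

With the assumed 0.1 Mbps link the transfer alone takes about three minutes, which is consistent with the multi-minute wait described for throttled connections.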


  1. Google Search Central — Web SEO Resources — Portal homepage for Google’s official SEO and crawling documentation, including the Googlebot sub-pages where the 15MB crawl limit is documented.
  2. The 2025 Web Almanac — HTTP Archive’s annual state-of-the-web report based on 16.2M websites from the July 2025 dataset; the Page Weight chapter contains the specific median mobile figures referenced in the episode.
  3. Web Hypertext Application Technology Working Group (WHATWG) — Organizational homepage for the WHATWG, which publishes and maintains the HTML Living Standard under four-company governance; links directly to the single-page spec used in the episode’s load-time demonstration.
  4. Google Chrome — Official Chrome download page, confirming the browser used in the print-to-PDF demonstration is available for macOS 12 or later.
