What is crawl budget? {#what-is-crawl-budget}
Crawl budget is the number of pages Googlebot will crawl on your site within a given timeframe. For most small sites this doesn't matter — Google will crawl everything. But for large eCommerce sites with thousands of URLs, it becomes a critical factor in which pages actually get indexed.
Google has finite resources and won't crawl every URL on a massive site in every crawl cycle. The pages it does crawl are determined by two things: crawl rate limit (how fast Googlebot can crawl without overloading your server) and crawl demand (how often Google wants to revisit your URLs based on their perceived importance and freshness).
How faceted nav creates the problem {#how-faceted-nav-creates-the-problem}
Faceted navigation — the filters on eCommerce category pages (colour, size, price range, brand) — can generate thousands or even millions of unique URLs. Most of these pages are near-duplicates with no independent SEO value. If Googlebot is spending time crawling /shoes?colour=red&size=8&brand=nike, it's not crawling your important category and product pages.
Here's why this compounds quickly:
- Even modest filter sets multiply quickly: 10 colours × 8 sizes × 15 brands = 1,200 URL variants per category when all three filters are applied
- A site with 50 categories suddenly has 60,000+ low-value crawlable URLs
- Every Googlebot request spent on these low-value URLs is a request not spent on real pages, so your important pages get crawled less often
- Near-duplicate pages can split ranking signals (such as internal and external links) that would otherwise consolidate on your real category pages
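The arithmetic in the list above can be checked with a few lines of Python. This is only a sketch using the example figures from this article (10 colours, 8 sizes, 15 brands, 50 categories); the second count is an added assumption that shoppers can also leave any filter unset, which most faceted navigation allows.

```python
from math import prod

# Example figures from the list above; swap in your own filter counts.
facets = {"colour": 10, "size": 8, "brand": 15}
categories = 50

# URLs where every filter is set to exactly one value.
full_combinations = prod(facets.values())

# Assumption: each filter can also be left unset. Subtract 1 so the
# unfiltered category page itself is not counted as a facet URL.
any_combinations = prod(n + 1 for n in facets.values()) - 1

print(full_combinations)               # 1200
print(full_combinations * categories)  # 60000
print(any_combinations)                # 1583
```

Multi-select filters or sort and pagination parameters would push these totals higher still.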
How to fix it {#how-to-fix-it}
There's no single solution — the right approach depends on your platform, URL structure, and how your filters work. The main levers are:
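Canonical tags: point facet URLs back to the clean category URL. Good for cases where the filtered page picks up links or traffic (e.g. filtering by a popular brand) and you want those signals consolidated on the main category page. Googlebot still has to crawl the facet URL to see the tag, so canonicals consolidate signals rather than save crawl budget.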
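As a rough illustration of the canonical approach, the sketch below strips filter parameters from a URL and builds the matching link tag. The parameter names in FACET_PARAMS and the example.com domain are assumptions for illustration, not a list any platform ships with.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical filter parameters; replace with the ones your platform uses.
FACET_PARAMS = {"colour", "size", "brand", "price"}

def canonical_url(url: str) -> str:
    """Return the URL with facet parameters and any fragment removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in FACET_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of the facet page."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'

print(canonical_tag("https://example.com/shoes?colour=red&size=8&brand=nike"))
# <link rel="canonical" href="https://example.com/shoes">
```

Non-facet parameters are kept, so /shoes?page=2&colour=red would canonicalise to /shoes?page=2 in this sketch. Whether paginated pages should keep their own canonical is a separate decision.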
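Robots.txt disallow: block parameterised URLs from being crawled at all. Only use this if the pages have zero SEO value and you don't need them indexed. Note that disallow stops crawling, not indexing: a blocked URL that is linked from elsewhere can still show up in the index without its content.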
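Before deploying disallow rules it helps to test them against real URLs. The sketch below is a deliberately simplified matcher, not Google's parser: it handles only Disallow lines with the `*` wildcard and the `$` end anchor, and ignores Allow rules, user-agent groups, and rule precedence. The rules in ROBOTS_TXT are hypothetical and use the filter parameters from this article.

```python
import re

# Hypothetical rules for illustration; adapt the parameter names.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*?*colour=
Disallow: /*?*size=
Disallow: /*?*brand=
"""

def disallow_patterns(robots_txt: str) -> list:
    """Collect the non-empty Disallow values, ignoring user-agent groups."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns

def pattern_to_regex(pattern: str):
    """Translate '*' (any characters) and a trailing '$' (end of URL) to a regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path_and_query: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(
        pattern_to_regex(p).match(path_and_query)
        for p in disallow_patterns(robots_txt)
    )

print(is_blocked("/shoes?colour=red&size=8&brand=nike"))  # True
print(is_blocked("/shoes"))                               # False
print(is_blocked("/shoes?page=2"))                        # False
```

Patterns this loose also catch parameters that merely end in the same text (size= would match pagesize=), so it is worth running your full URL list through a check like this first.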
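Noindex, follow: let Googlebot crawl the page (to follow internal links) but tell it not to index it. Useful middle ground. This only works if the URL is not also disallowed in robots.txt, because Googlebot has to fetch the page to see the directive.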
URL rewriting: if your platform generates parameterised URLs, consider rewriting the combinations with genuine search demand to clean URLs (e.g. /red-running-shoes/ rather than /shoes?colour=red&type=running).
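One way to keep rewriting limited to combinations with real demand is an explicit whitelist. The sketch below assumes a hand-maintained mapping (CLEAN_URLS is hypothetical) from a category path plus a set of filter values to a clean URL. Anything not in the mapping returns None and would fall through to one of the other levers.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit

# Hypothetical whitelist: (category path, filter values) -> clean URL.
# Only combinations with proven search demand belong here.
CLEAN_URLS = {
    ("/shoes", frozenset({("colour", "red"), ("type", "running")})): "/red-running-shoes/",
}

def clean_path(url: str) -> Optional[str]:
    """Return the clean URL for a whitelisted filter combination, else None."""
    parts = urlsplit(url)
    # A frozenset makes the lookup independent of parameter order.
    key = (parts.path.rstrip("/"), frozenset(parse_qsl(parts.query)))
    return CLEAN_URLS.get(key)

print(clean_path("/shoes?colour=red&type=running"))  # /red-running-shoes/
print(clean_path("/shoes?type=running&colour=red"))  # /red-running-shoes/
print(clean_path("/shoes?colour=red&size=8"))        # None
```

In practice the parameterised version would usually redirect to the clean URL so only one of the two stays crawlable, though how that is wired up depends on your platform.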
Tools to audit crawl budget {#tools-to-audit-crawl-budget}
- Screaming Frog — crawl your site and filter by parameter URLs to see the scale of the problem
- Google Search Console → Page indexing report (formerly Coverage): see which URLs are indexed and which are listed as "Crawled - currently not indexed"
- GSC URL Inspection tool — check individual pages to see last crawl date and indexation status
- Server logs — the most accurate picture of what Googlebot is actually crawling and how frequently
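For the server log option, a short script is often enough to see how Googlebot's requests split between parameterised and clean URLs. The sketch below assumes the common "combined" access log format and identifies Googlebot by user-agent string only; the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumes the "combined" log format; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<target>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def googlebot_crawl_split(lines):
    """Count Googlebot requests to parameterised vs clean URLs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match.group("agent"):
            continue
        kind = "parameterised" if "?" in match.group("target") else "clean"
        counts[kind] += 1
    return counts

# Invented sample lines, for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
sample = [
    '66.249.66.1 - - [10/Jan/2024:10:00:00 +0000] '
    '"GET /shoes?colour=red&size=8 HTTP/1.1" 200 5120 "-" "' + GOOGLEBOT_UA + '"',
    '66.249.66.1 - - [10/Jan/2024:10:00:05 +0000] '
    '"GET /shoes/ HTTP/1.1" 200 8042 "-" "' + GOOGLEBOT_UA + '"',
    '203.0.113.7 - - [10/Jan/2024:10:00:09 +0000] '
    '"GET /shoes?brand=nike HTTP/1.1" 200 4987 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

print(googlebot_crawl_split(sample))
```

User-agent strings can be spoofed, so for anything beyond a quick estimate the requesting IPs should also be verified as genuine Googlebot, for example with a reverse DNS lookup.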