Examples
A few common ways a WAF shows up in scraping:
- You get a 403 even though the page exists
- You get a fake success page, CAPTCHA, or JavaScript challenge instead of the real content
- Requests work locally, then start failing once volume increases
- One proxy pool works for a while, then suddenly gets burned
A quick header check often makes the block obvious:

```shell
$ curl -I https://target-site.com/products/123
HTTP/2 403
server: cloudflare
content-type: text/html
```
That usually means you are not dealing with a simple rate limit. You are dealing with traffic inspection, fingerprinting, or bot detection behind a WAF.
In code, it often looks like this:
```python
import requests

r = requests.get("https://target-site.com/products/123", timeout=30)
print(r.status_code)
print(r.text[:200])
```
If that returns a challenge page, CAPTCHA markup, or a block message instead of product data, the WAF is what you need to solve first, not parsing.
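A simple block-detection check can sit in front of parsing so challenge pages never reach your extractor. The marker strings and the `looks_blocked` helper below are illustrative assumptions, not vendor-documented signatures; real challenge pages vary by WAF and change over time:

```python
import requests

# Hypothetical marker strings; actual challenge markup differs per WAF vendor.
CHALLENGE_MARKERS = (
    "cf-challenge",    # Cloudflare challenge pages
    "just a moment",   # Cloudflare interstitial title
    "captcha",
    "access denied",
)

def looks_blocked(response: requests.Response) -> bool:
    """Return True if the response is likely a WAF block or challenge."""
    # Hard blocks: typical WAF status codes.
    if response.status_code in (403, 429, 503):
        return True
    # Soft blocks: 200 status but challenge/CAPTCHA markup in the body.
    body = response.text.lower()
    return any(marker in body for marker in CHALLENGE_MARKERS)
```

Running this check on every response makes soft blocks (a 200 with the wrong HTML) show up as access failures instead of silent parser errors.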
Practical tips
- Don’t assume every 403 is just a bad proxy problem: check response headers, body content, and whether you are being served a challenge page
- Watch for soft blocks: 200 status with the wrong HTML, empty data, interstitials, or fake error pages
- Test with realistic browser behavior when plain HTTP clients stop working: headers, cookies, TLS fingerprint, JavaScript execution, session flow
- Expect the problem to change over time: what works for a week can die once the site updates rules
- Separate extraction logic from access logic: if your parser and anti-block logic are tangled together, maintenance gets ugly fast
- If you are doing this at scale, route requests through infrastructure that can switch tactics per target: browser, proxy, session, retry strategy
- This is essentially what ScrapeRouter is for: you send the request once, and the router decides what level of anti-bot handling each target needs instead of hardcoding one fragile setup per site
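The separation of access logic from extraction logic can be sketched as two layers: an access layer that tries progressively heavier strategies until one returns real content, and an extraction layer that is a pure function of HTML. The names `fetch_with_escalation` and `parse_product`, the strategy list, and the "captcha" check are hypothetical placeholders, not a specific library's API:

```python
from typing import Callable, Optional

def fetch_with_escalation(
    url: str,
    strategies: list[Callable[[str], Optional[str]]],
) -> str:
    """Access layer: try each strategy (plain HTTP, proxy, headless
    browser, ...) in order until one returns usable HTML.
    A strategy returns None when it gets blocked."""
    for strategy in strategies:
        html = strategy(url)
        if html is not None and "captcha" not in html.lower():
            return html
    raise RuntimeError(f"all access strategies blocked for {url}")

def parse_product(html: str) -> dict:
    """Extraction layer: knows nothing about proxies, sessions, or WAFs.
    Placeholder parsing; real code would use a proper HTML parser."""
    title = html.split("<h1>")[1].split("</h1>")[0]
    return {"title": title}
```

Because the escalation ladder lives entirely in the access layer, swapping a burned proxy pool or adding a browser-based strategy never touches the parser.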
Use cases
- Ecommerce scraping: product pages, pricing, inventory, and search results are often protected by Cloudflare, Akamai, DataDome, or custom WAF rules
- SERP and public data collection: search engines, directories, and aggregator sites are aggressive about bot detection because volume gets abusive fast
- Monitoring jobs: compliance, pricing, and availability monitors often fail in production because they were built for happy-path HTML, not WAF challenges
- Multi-site scraping platforms: once you support dozens or hundreds of domains, WAF handling stops being a one-off workaround and becomes an infrastructure problem