Examples
A few common ways a WAF shows up in scraping:
- You get a 403 even though the page exists
- You get a fake success page, CAPTCHA, or JavaScript challenge instead of the real content
- Requests work locally, then start failing once volume increases
- One proxy pool works for a while, then suddenly gets burned
A quick header check often makes the block obvious:

```shell
$ curl -I https://target-site.com/products/123
HTTP/2 403
server: cloudflare
content-type: text/html
```
That usually means you are not dealing with a simple rate limit. You are dealing with traffic inspection, fingerprinting, or bot detection behind a WAF.
In code, it often looks like this:
```python
import requests

r = requests.get("https://target-site.com/products/123", timeout=30)
print(r.status_code)
print(r.text[:200])
```
If that returns a challenge page, CAPTCHA markup, or a block message instead of product data, the WAF is what you need to solve first, not parsing.
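A simple block-detection check can sit in front of parsing so challenge pages never reach your extractor. The marker strings and the `looks_blocked` helper below are illustrative assumptions, not vendor-documented signatures; real challenge pages vary by WAF and change over time:

```python
import requests

# Hypothetical marker strings; actual challenge markup differs per WAF vendor.
CHALLENGE_MARKERS = (
    "cf-challenge",    # Cloudflare challenge pages
    "just a moment",   # Cloudflare interstitial title
    "captcha",
    "access denied",
)

def looks_blocked(response: requests.Response) -> bool:
    """Return True if the response is likely a WAF block or challenge."""
    # Hard blocks: typical WAF status codes.
    if response.status_code in (403, 429, 503):
        return True
    # Soft blocks: 200 status but challenge/CAPTCHA markup in the body.
    body = response.text.lower()
    return any(marker in body for marker in CHALLENGE_MARKERS)
```

Running this check on every response makes soft blocks (a 200 with the wrong HTML) show up as access failures instead of silent parser errors.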
Practical tips
- Don’t assume every 403 is just a bad proxy problem: check response headers, body content, and whether you are being served a challenge page
- Watch for soft blocks: 200 status with the wrong HTML, empty data, interstitials, or fake error pages
- Test with realistic browser behavior when plain HTTP clients stop working: headers, cookies, TLS fingerprint, JavaScript execution, session flow
- Expect the problem to change over time: what works for a week can die once the site updates rules
- Separate extraction logic from access logic: if your parser and anti-block logic are tangled together, maintenance gets ugly fast
- If you are doing this at scale, route requests through infrastructure that can switch tactics per target: browser, proxy, session, retry strategy
- This is essentially what ScrapeRouter is for: you send the request once, and the router decides what level of anti-bot handling each target needs instead of hardcoding one fragile setup per site
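The separation of access logic from extraction logic can be sketched as two layers: an access layer that tries progressively heavier strategies until one returns real content, and an extraction layer that is a pure function of HTML. The names `fetch_with_escalation` and `parse_product`, the strategy list, and the "captcha" check are hypothetical placeholders, not a specific library's API:

```python
from typing import Callable, Optional

def fetch_with_escalation(
    url: str,
    strategies: list[Callable[[str], Optional[str]]],
) -> str:
    """Access layer: try each strategy (plain HTTP, proxy, headless
    browser, ...) in order until one returns usable HTML.
    A strategy returns None when it gets blocked."""
    for strategy in strategies:
        html = strategy(url)
        if html is not None and "captcha" not in html.lower():
            return html
    raise RuntimeError(f"all access strategies blocked for {url}")

def parse_product(html: str) -> dict:
    """Extraction layer: knows nothing about proxies, sessions, or WAFs.
    Placeholder parsing; real code would use a proper HTML parser."""
    title = html.split("<h1>")[1].split("</h1>")[0]
    return {"title": title}
```

Because the escalation ladder lives entirely in the access layer, swapping a burned proxy pool or adding a browser-based strategy never touches the parser.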
Use cases
- Ecommerce scraping: product pages, pricing, inventory, and search results are often protected by Cloudflare, Akamai, DataDome, or custom WAF rules
- SERP and public data collection: search engines, directories, and aggregator sites are aggressive about bot detection because volume gets abusive fast
- Monitoring jobs: compliance, pricing, and availability monitors often fail in production because they were built for happy-path HTML, not WAF challenges
- Multi-site scraping platforms: once you support dozens or hundreds of domains, WAF handling stops being a one-off workaround and becomes an infrastructure problem