WAF | ScrapeRouter

A WAF, or Web Application Firewall, sits in front of a site and filters requests before they reach the application. In scraping, it is one of the main things blocking you in production: rate limits, CAPTCHAs, bot checks, fingerprinting, and silent 403s often come from the WAF, not the site itself.

Examples

A few common signs you're dealing with a WAF instead of a normal app response:

You get a 403 on perfectly valid requests
The same URL works in a browser but fails in your scraper
Response bodies contain CAPTCHA markup, challenge pages, or JavaScript checks
Success rate drops hard when concurrency goes up
Different IPs get different behavior for the same request

curl -I https://target-site.com/products

The relevant parts of the response, summarized:

{
  "status": 403,
  "server": "cloudflare",
  "content_type": "text/html"
}

That does not always mean Cloudflare is the whole problem, but it tells you something is sitting in front of the origin and making blocking decisions.
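The signs listed above can be checked programmatically. The following is a minimal sketch, not a definitive detector: the function name `waf_signals` and the hint lists are assumptions made for illustration, and real targets will need their own markers.

```python
from typing import List, Mapping

# Illustrative hint lists (assumptions, not an exhaustive or vendor-verified set).
BLOCK_STATUSES = {403, 429}
SERVER_HINTS = ("cloudflare", "akamai", "datadome")
BODY_HINTS = ("captcha", "challenge", "verify you are human")


def waf_signals(status: int, headers: Mapping[str, str], body: str = "") -> List[str]:
    """Return a list of hints that a WAF, not the origin app, produced this response."""
    signals: List[str] = []

    # Hard blocks and rate limits usually show up as 403 or 429.
    if status in BLOCK_STATUSES:
        signals.append(f"status:{status}")

    # Header names are case-insensitive, so normalise before looking up "server".
    lowered = {name.lower(): value.lower() for name, value in headers.items()}
    server = lowered.get("server", "")
    for hint in SERVER_HINTS:
        if hint in server:
            signals.append(f"server:{hint}")

    # Challenge pages often arrive as HTML with recognisable wording.
    text = body.lower()
    for hint in BODY_HINTS:
        if hint in text:
            signals.append(f"body:{hint}")

    return signals
```

Called with the response shown earlier, `waf_signals(403, {"Server": "cloudflare"})` reports both the status and the server header. An empty list does not prove there is no WAF; it only means none of these particular hints matched.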

With a scraping API, the point is not to "beat" a WAF once. The point is to keep requests working as defenses change.

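import os

import requests

# Read the key from the environment instead of hard-coding it.
# SCRAPEROUTER_API_KEY is an assumed variable name; use whatever your setup provides.
api_key = os.environ["SCRAPEROUTER_API_KEY"]

url = "https://www.scraperouter.com/api/v1/scrape/"
headers = {
    "Authorization": f"Api-Key {api_key}",
    "Content-Type": "application/json",
}
payload = {
    "url": "https://target-site.com/products",  # page to fetch
    "render": True,
}

# A timeout keeps a stalled request from hanging the whole job.
r = requests.post(url, headers=headers, json=payload, timeout=60)
print(r.status_code)
print(r.text[:500])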

Practical tips

Treat WAF blocking as a systems problem, not a header-tweaking problem: IP reputation, TLS fingerprint, browser behavior, cookie flow, request pacing, and session consistency all matter
Watch for soft blocks, not just hard failures: empty pages, fake 200s, login walls, poisoned HTML, and challenge pages returned with success status codes
Test with realistic traffic patterns: low-volume single requests often work; production concurrency is where things break
Keep browser and non-browser paths separate: some targets are fine with plain HTTP, while others need full browser execution to get through anti-bot checks
Measure cost against engineering time: building your own WAF handling stack is possible, but it turns into ongoing maintenance fast
Do not assume one successful request means the problem is solved: what matters is stable success rate over thousands of requests and over time
Rotate carefully: random IP rotation without session logic often makes detection worse
If a target is heavily protected, using a router layer can save a lot of effort because it can switch approaches per site instead of forcing one method everywhere
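Two of the tips above, watching for soft blocks and judging by success rate over many requests, can be combined in a small tracker. This is a rough sketch under stated assumptions: the marker strings, the `MIN_BODY_CHARS` threshold, and the names `is_real_success` and `SuccessWindow` are hypothetical and would need tuning per target.

```python
from collections import deque

# Assumed soft-block markers and minimum body size; tune both per target.
SOFT_BLOCK_MARKERS = ("captcha", "access denied", "unusual traffic")
MIN_BODY_CHARS = 512


def is_real_success(status: int, body: str) -> bool:
    """Treat a response as a success only if it is a 2xx AND looks like real content."""
    if not 200 <= status < 300:
        return False
    # Empty or near-empty pages returned with a 200 count as soft blocks.
    if len(body) < MIN_BODY_CHARS:
        return False
    text = body.lower()
    return not any(marker in text for marker in SOFT_BLOCK_MARKERS)


class SuccessWindow:
    """Rolling success rate over the last `size` requests."""

    def __init__(self, size: int = 1000) -> None:
        self._results = deque(maxlen=size)  # oldest results fall off automatically

    def record(self, status: int, body: str) -> bool:
        ok = is_real_success(status, body)
        self._results.append(ok)
        return ok

    @property
    def rate(self) -> float:
        if not self._results:
            return 0.0
        return sum(self._results) / len(self._results)
```

Recording every response and alerting when `rate` falls below a chosen threshold catches the case where status codes still look healthy but the pages are challenges or empty shells.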

Use cases

Scraping ecommerce sites protected by Cloudflare, Akamai, DataDome, or AWS WAF
Running price monitoring jobs where requests need to keep working every day, not just in local tests
Collecting search result pages that trigger CAPTCHA or JavaScript challenges under load
Mixing lightweight HTTP fetches with browser-based requests depending on how aggressive the target's WAF is
Reducing time spent debugging blocks that are really infrastructure and fingerprinting issues, not parser bugs
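Mixing lightweight fetches with browser-based requests usually means trying the cheap path first and escalating only when a site blocks it. The sketch below illustrates that idea in general terms; it is not a description of how ScrapeRouter works internally. `PerSiteRouter`, the strategy names, and the `looks_ok` callback are all hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple
from urllib.parse import urlparse

# A fetcher takes a URL and returns (status_code, body).
Fetcher = Callable[[str], Tuple[int, str]]


class PerSiteRouter:
    """Try strategies from cheapest to most expensive, remembering what worked per host."""

    def __init__(
        self,
        strategies: Sequence[Tuple[str, Fetcher]],
        looks_ok: Callable[[int, str], bool],
    ) -> None:
        self._strategies = list(strategies)  # ordered cheapest first
        self._looks_ok = looks_ok
        self._preferred: Dict[str, int] = {}  # hostname -> index of last working strategy

    def fetch(self, url: str) -> Tuple[str, int, str]:
        host = urlparse(url).hostname or ""
        # Start from the strategy that last worked for this host.
        # Simplification: this sketch never de-escalates to a cheaper strategy.
        start = self._preferred.get(host, 0)
        for index in range(start, len(self._strategies)):
            name, fetcher = self._strategies[index]
            status, body = fetcher(url)
            if self._looks_ok(status, body):
                self._preferred[host] = index
                return name, status, body
        raise RuntimeError(f"all strategies blocked for {host}")
```

In practice the first strategy might be a plain HTTP request and the second a browser-rendered one, with `looks_ok` rejecting challenge pages as well as error statuses.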