Examples
A few common signs you're dealing with a WAF instead of a normal app response:
- You get a 403 on perfectly valid requests
- The same URL works in a browser but fails in your scraper
- Response bodies contain CAPTCHA markup, challenge pages, or JavaScript checks
- Success rate drops hard when concurrency goes up
- Different IPs get different behavior for the same request
A quick first check is to inspect the response headers:

```shell
curl -I https://target-site.com/products
```

A trimmed summary of what comes back:

```json
{
  "status": 403,
  "server": "cloudflare",
  "content_type": "text/html"
}
```
That does not always mean Cloudflare is the whole problem, but it tells you something is sitting in front of the origin and making blocking decisions.
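The signs above can be folded into a simple heuristic. This is an illustrative sketch, not an exhaustive detector: the server names and challenge markers are assumptions you would extend from what you actually observe.

```python
# Sketch of a WAF-block heuristic based on the signs above.
# The server values and challenge markers are illustrative, not exhaustive.
WAF_SERVERS = {"cloudflare", "akamaighost", "awselb/2.0"}
CHALLENGE_MARKERS = ("captcha", "__cf_chl", "just a moment", "challenge-platform")

def looks_like_waf_block(status: int, headers: dict, body: str) -> bool:
    server = headers.get("server", "").lower()
    if status in (403, 429, 503) and server in WAF_SERVERS:
        return True
    # Challenge pages are often served with a 200, so inspect the body too
    lowered = body[:5000].lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)
```

Wiring a check like this into your pipeline means soft blocks get counted as failures instead of silently polluting your data.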
With a scraping API, the point is not to "beat" a WAF once. The point is to keep requests working as defenses change.
```python
import requests

url = "https://www.scraperouter.com/api/v1/scrape/"
headers = {
    "Authorization": "Api-Key $api_key",  # substitute your own key
    "Content-Type": "application/json",
}
payload = {
    "url": "https://target-site.com/products",
    "render": True,  # execute JavaScript so challenge pages can resolve
}

r = requests.post(url, headers=headers, json=payload)
print(r.status_code)
print(r.text[:500])
```
Practical tips
- Treat WAF blocking as a systems problem, not a header-tweaking problem: IP reputation, TLS fingerprint, browser behavior, cookie flow, request pacing, and session consistency all matter
- Watch for soft blocks, not just hard failures: empty pages, fake 200s, login walls, poisoned HTML, and challenge pages returned with success status codes
- Test with realistic traffic patterns: low-volume single requests often work, production concurrency is where things break
- Keep browser and non-browser paths separate: some targets are fine with plain HTTP, others need full browser execution to get through anti-bot checks
- Measure cost against engineering time: building your own WAF handling stack is possible, but it turns into ongoing maintenance fast
- Do not assume one successful request means the problem is solved: what matters is stable success rate over thousands of requests and over time
- Rotate carefully: random IP rotation without session logic often makes detection worse
- If a target is heavily protected, using a router layer can save a lot of wasted effort because it can switch approach per site instead of forcing one method everywhere
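The rotation caveat above can be sketched as code: pin each logical session to one exit proxy so its cookies and fingerprint stay consistent, instead of rotating randomly on every request. The proxy URLs here are placeholders.

```python
import itertools

# Sticky-session rotation sketch (proxy URLs are placeholders):
# each session keeps one exit IP for its whole cookie/challenge flow.
PROXY_POOL = ["http://proxy-a:8080", "http://proxy-b:8080", "http://proxy-c:8080"]

_cycle = itertools.cycle(PROXY_POOL)
_assigned: dict = {}

def proxy_for_session(session_id: str) -> str:
    """Assign a proxy the first time a session appears, then reuse it."""
    if session_id not in _assigned:
        _assigned[session_id] = next(_cycle)
    return _assigned[session_id]
```

Per-request random rotation breaks exactly the consistency that anti-bot systems check for: a cookie issued to one IP that suddenly arrives from another is an easy signal.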
Use cases
- Scraping ecommerce sites protected by Cloudflare, Akamai, DataDome, or AWS WAF
- Running price monitoring jobs where requests need to keep working every day, not just in local tests
- Collecting search result pages that trigger CAPTCHA or JavaScript challenges under load
- Mixing lightweight HTTP fetches with browser-based requests depending on how aggressive the target's WAF is
- Reducing time spent debugging blocks that are really infrastructure and fingerprinting issues, not parser bugs
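The mixed-mode use case above can be sketched as a small routing table. Which hosts need full browser rendering is an assumption here; in practice you would refine the list from observed block rates per target.

```python
from urllib.parse import urlparse

# Hypothetical routing table: hosts assumed (from observed blocks) to need
# full browser execution. Everything else gets a cheap plain-HTTP fetch.
RENDER_REQUIRED = {"heavily-protected.example", "js-challenge.example"}

def build_payload(target_url: str) -> dict:
    host = urlparse(target_url).hostname or ""
    return {
        "url": target_url,
        "render": host in RENDER_REQUIRED,
    }
```

Deciding render-vs-plain per host, rather than globally, keeps costs down on easy targets while still getting through on the hard ones.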