Examples
A greylisted IP is annoying because nothing is fully broken. Some requests still work, which makes debugging slower than a clean ban.
- Search pages load, detail pages fail: category URLs return 200, product pages start returning 403
- Responses get slower over time: first 20 requests are fine, then latency jumps and timeouts start
- Soft blocks: the server returns 200, but the page is a CAPTCHA or a stripped-down version with missing data
- Geo or reputation downgrade: the same request works from one IP pool and quietly fails from another
A quick check with curl often exposes the mismatch between status code and body:

```shell
curl -I https://targetsite.com/products/123
# HTTP/2 200

curl https://targetsite.com/products/123 | head
# actually returns a challenge page or empty shell HTML
```
```python
import requests

url = "https://targetsite.com/search?q=laptop"
for i in range(5):
    r = requests.get(url, timeout=30)
    print(i, r.status_code, len(r.text), r.elapsed.total_seconds())
```
If the status stays at 200 but body length drops hard, or latency keeps climbing, that is often greylisting rather than a normal site issue.
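Those two signals can be checked programmatically. This is a minimal sketch, not a production detector: the function name, the 50% shrink threshold, and the "strictly climbing latency" rule are all assumptions you should tune for your target.

```python
def degradation_signals(sizes, latencies, shrink_ratio=0.5):
    """Flag greylisting-style degradation from per-request measurements:
    body length dropping hard relative to the first response, or latency
    climbing monotonically across requests, even while the status stays 200."""
    if len(sizes) < 2:
        return {"body_shrunk": False, "latency_climbing": False}
    baseline = sizes[0]
    shrunk = any(s < baseline * shrink_ratio for s in sizes[1:])
    climbing = len(latencies) >= 3 and all(
        later > earlier for earlier, later in zip(latencies, latencies[1:])
    )
    return {"body_shrunk": shrunk, "latency_climbing": climbing}

# Feed it the numbers printed by the loop above, e.g.:
# degradation_signals([48213, 48120, 9100], [0.4, 1.9, 8.7])
# -> {"body_shrunk": True, "latency_climbing": True}
```

Keeping the detection logic separate from the fetching loop also makes it easy to run against logged measurements from past jobs.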
Practical tips
- Watch for soft-failure signals: body length changes, challenge keywords, redirect loops, unusual latency, empty JSON payloads
- Do not treat only 403 and 429 as blocking signals: greylisting often hides inside 200 responses
- Rotate earlier, not later: once an IP starts looking degraded, it usually gets worse
- Reduce request burstiness: randomize pacing, limit concurrency per domain, avoid hitting the same path pattern too hard
- Vary the full request fingerprint: headers, TLS/client profile, cookies, session behavior; IP rotation alone is often not enough
- Separate healthy and degraded sessions: if one session starts getting weird responses, quarantine it instead of reusing it
- Measure by pool: compare success rate, latency, and challenge rate across datacenter, residential, and mobile IPs
- If you use ScrapeRouter: detecting degraded IPs is the kind of thing a router layer should handle for you, because the real problem is not getting an IP, it is knowing when an IP is technically alive but operationally bad
A minimal heuristic detector for soft blocks:

```python
blocked_markers = ["captcha", "access denied", "verify you are human"]

def looks_greylisted(response):
    # Flags 200 responses that contain challenge keywords, or any
    # response that is suspiciously slow.
    text = response.text.lower()
    return (
        response.status_code == 200 and any(m in text for m in blocked_markers)
    ) or response.elapsed.total_seconds() > 10
```
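Once you have a detector like that, rotating on its signal is straightforward. This sketch assumes the caller supplies the session list, a `fetch` callable, and a predicate such as `looks_greylisted`; none of these names come from a specific library:

```python
def fetch_with_rotation(fetch, sessions, is_bad, max_tries=None):
    """Try each session in turn, skipping past any whose response looks
    greylisted according to is_bad. Returns the first clean response,
    or None if every session tried looks degraded."""
    for session in sessions[:max_tries]:
        response = fetch(session)
        if not is_bad(response):
            return response
        # A degraded session is abandoned here rather than retried:
        # once an IP starts looking greylisted, it usually gets worse.
    return None
```

In a real scraper the abandoned sessions would also be reported to a pool-health tracker so the same IPs are not handed out again.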
Use cases
- High-volume product scraping: a retailer starts slowing one proxy subnet instead of banning it outright, so jobs drag for hours before failing
- SERP collection: search result pages return partial or challenge-filled HTML from some IPs, while others still look normal
- Account/session workflows: login works, but post-login pages get intermittently blocked because the IP has a poor reputation
- API scraping behind web defenses: endpoints keep returning 200 with empty datasets for certain IPs, which looks like a parser bug until you compare across pools