Glossary

Greylisting IP

An IP is greylisted when a site does not fully ban it, but quietly degrades or limits it because it looks suspicious. In scraping, this usually shows up as intermittent 403s, slower responses, CAPTCHA pages, empty results, or requests that work in a browser but fail from your scraper.

Examples

A greylisted IP is annoying because nothing is fully broken. Some requests still work, which makes debugging slower than a clean ban.

  • Search pages load, detail pages fail: category URLs return 200, product pages start returning 403
  • Responses get slower over time: first 20 requests are fine, then latency jumps and timeouts start
  • Soft blocks: the server returns 200, but the page is a CAPTCHA or a stripped-down version with missing data
  • Geo or reputation downgrade: the same request works from one IP pool and quietly fails from another
A quick way to spot a soft block from the command line is to compare the status line with the actual body:

curl -I https://targetsite.com/products/123
# HTTP/2 200  (headers look healthy)

curl https://targetsite.com/products/123 | head
# but the body is a challenge page or an empty shell of HTML
From Python, logging status, body size, and latency across repeated requests makes the pattern visible:

import requests

url = "https://targetsite.com/search?q=laptop"
for i in range(5):
    r = requests.get(url, timeout=30)
    # status alone is not enough; body length and latency tell the real story
    print(i, r.status_code, len(r.text), r.elapsed.total_seconds())

If the status stays at 200 but body length drops hard, or latency keeps climbing, that is often greylisting rather than a normal site issue.
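That heuristic can be turned into a concrete check by comparing each response against the first healthy one. This is a sketch, not a tuned detector: the `shrink_ratio` and `slow_factor` thresholds are illustrative, and the function works on recorded `(status, body_len, latency)` samples so you can feed it whatever your scraper already logs.

```python
def degraded_samples(samples, shrink_ratio=0.5, slow_factor=3.0):
    """Flag samples that look degraded relative to the first 200 response.

    samples: list of (status_code, body_len, latency_seconds) tuples.
    Thresholds are illustrative starting points, not tuned values.
    """
    baseline = next(((l, t) for s, l, t in samples if s == 200), None)
    if baseline is None:
        # no healthy response at all: everything is suspect
        return [True] * len(samples)
    base_len, base_lat = baseline
    return [
        s != 200                         # hard failure
        or l < base_len * shrink_ratio   # body shrank hard
        or t > base_lat * slow_factor    # latency climbed
        for s, l, t in samples
    ]
```

A run of `[False, False, True, True, ...]` against a stable URL is a stronger greylisting signal than any single bad response.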

Practical tips

  • Watch for soft-failure signals: body length changes, challenge keywords, redirect loops, unusual latency, empty JSON payloads
  • Do not treat only 403 and 429 as blocking: greylisting often hides inside 200 responses
  • Rotate earlier, not later: once an IP starts looking degraded, it usually gets worse
  • Reduce request burstiness: randomize pacing, limit concurrency per domain, avoid hitting the same path pattern too hard
  • Vary the full request fingerprint: headers, TLS/client profile, cookies, session behavior; IP rotation alone is often not enough
  • Separate healthy and degraded sessions: if one session starts getting weird responses, quarantine it instead of reusing it
  • Measure by pool: compare success rate, latency, and challenge rate across datacenter, residential, and mobile IPs
  • If you use ScrapeRouter: this is the kind of thing a router layer should handle for you, because the real problem is not just getting an IP; it is knowing when an IP is technically alive but operationally bad
A minimal check that catches some of these soft-failure signals:

blocked_markers = ["captcha", "access denied", "verify you are human"]

def looks_greylisted(response):
    # Flag 200 responses that contain challenge markers, or any response
    # that is suspiciously slow (the 10 s cutoff is an arbitrary example).
    text = response.text.lower()
    return (
        response.status_code == 200 and any(m in text for m in blocked_markers)
    ) or response.elapsed.total_seconds() > 10
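The "quarantine degraded sessions" tip can be sketched as a small health tracker. This is illustrative, not any particular library's API: `SessionPool` and its strike threshold are made up for the example, and the session objects can be anything hashable (e.g. `requests.Session` instances).

```python
import random

class SessionPool:
    """Track per-session health and quarantine sessions that start
    returning suspicious responses (sketch; names are illustrative)."""

    def __init__(self, sessions, max_strikes=3):
        self.healthy = list(sessions)
        self.quarantined = []
        self.strikes = {s: 0 for s in sessions}
        self.max_strikes = max_strikes

    def pick(self):
        # Spread load across healthy sessions instead of hammering one.
        return random.choice(self.healthy) if self.healthy else None

    def report(self, session, suspicious):
        """Call after each request with the result of your own
        soft-block check; quarantine after repeated strikes."""
        if suspicious:
            self.strikes[session] += 1
            if self.strikes[session] >= self.max_strikes and session in self.healthy:
                self.healthy.remove(session)
                self.quarantined.append(session)
        else:
            self.strikes[session] = 0  # a clean response resets the counter
```

Quarantining early matters because a degraded session rarely recovers on its own; reusing it just burns requests and muddies your metrics.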

Use cases

  • High-volume product scraping: a retailer starts slowing one proxy subnet instead of banning it outright, so jobs drag for hours before failing
  • SERP collection: search result pages return partial or challenge-filled HTML from some IPs, while others still look normal
  • Account/session workflows: login works, but post-login pages get intermittently blocked because the IP has a poor reputation
  • API scraping behind web defenses: endpoints keep returning 200 with empty datasets for certain IPs, which looks like a parser bug until you compare across pools
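Comparing across pools, as in the last use case, only works if you record results in a comparable way. A rough sketch of that bookkeeping, assuming you already log one `(pool_name, status_code, challenged, latency_seconds)` tuple per request (the field names are illustrative):

```python
from collections import defaultdict
from statistics import mean

def summarize_by_pool(results):
    """Aggregate per-pool success rate, challenge rate, and mean latency
    so a quietly degraded pool stands out next to a healthy one.

    results: list of (pool_name, status_code, challenged, latency_seconds).
    """
    grouped = defaultdict(list)
    for pool, status, challenged, latency in results:
        grouped[pool].append((status, challenged, latency))
    summary = {}
    for pool, rows in grouped.items():
        # "success" means a clean 200: right status AND no challenge page
        ok = [s == 200 and not c for s, c, _ in rows]
        summary[pool] = {
            "success_rate": sum(ok) / len(rows),
            "challenge_rate": sum(c for _, c, _ in rows) / len(rows),
            "mean_latency": mean(l for _, _, l in rows),
        }
    return summary
```

A pool whose success rate is fine but whose challenge rate or latency is drifting upward is often the first visible sign of greylisting.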

Related terms

  • IP Ban
  • Proxy Rotation
  • Rate Limiting
  • CAPTCHA
  • Fingerprinting
  • Residential Proxy
  • Datacenter Proxy
  • Soft Block