Glossary

Whitelisting

Whitelisting means explicitly allowing a specific IP, API key, domain, or account to access something that would otherwise be blocked or rate-limited. In scraping, it usually comes up when a target, proxy provider, or internal system only accepts traffic from approved sources, which is fine until your IPs change and things quietly break.

Examples

A few places whitelisting shows up in real scraping setups:

  • Proxy provider access: your account can use the proxy network only from approved server IPs
  • Target-side allowlist: a site partner allows your scraper through their firewall from known IPs
  • Internal APIs: your scraping pipeline can call storage, queues, or admin endpoints only from approved runners
# Example: calling an API from a server whose IP has been whitelisted
curl https://api.example.com/data \
  -H "Authorization: Bearer $API_TOKEN"

# The same request from Python, reading the token from the same environment variable
import os
import requests

url = "https://api.example.com/data"
headers = {"Authorization": f"Bearer {os.environ['API_TOKEN']}"}
resp = requests.get(url, headers=headers, timeout=30)
print(resp.status_code)
print(resp.text[:200])

If that server IP is not on the allowlist, the request may fail with a 401, 403, connection reset, or just a silent timeout. That's one of the annoying parts: whitelisting failures often look like random network issues until you check the actual policy.
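
One way to make those failures visible is to triage the response before treating it as a generic scrape error. A minimal sketch, assuming a placeholder endpoint and a token in an API_TOKEN environment variable:

# Rough triage: separate likely allowlist/policy rejections from ordinary scrape failures
# The URL is a placeholder, not a real endpoint
import os
import requests

url = "https://api.example.com/data"
headers = {"Authorization": f"Bearer {os.environ.get('API_TOKEN', '')}"}

try:
    resp = requests.get(url, headers=headers, timeout=30)
    if resp.status_code in (401, 403):
        print("Likely allowlist/policy rejection:", resp.status_code)
    else:
        print("Got status", resp.status_code, "- probably not an access-policy problem")
except requests.exceptions.ConnectTimeout:
    print("Connect timeout - could be a firewall silently dropping an unapproved IP")
except requests.exceptions.ConnectionError as err:
    print("Connection reset/refused - check whether this worker's IP is on the allowlist:", err)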

Practical tips

  • Do not rely on whitelisting alone: pair it with API keys, auth tokens, or signed requests (see the sketch after this list)
  • Expect IP changes to break things: this matters if you're running on autoscaling instances, serverless, CI runners, or rotating proxy infrastructure
  • Keep the allowlist small and documented: who added it, why it exists, what depends on it
  • Have a revocation path: stale whitelisted IPs tend to stick around forever unless someone owns cleanup
  • Monitor for policy failures: track 403s, handshake failures, and unexplained connection errors separately from normal scrape failures
  • Be careful with residential or highly dynamic egress: whitelisting works best when the source IP is stable
  • With ScrapeRouter: this is part of the point of putting a router layer in front of scraping traffic. You don't want every target integration coupled directly to whichever provider or IP range changed this week.
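
For the first tip above, a minimal sketch of what pairing an allowlist with signed requests can look like. The header names and signing scheme are illustrative, not any specific provider's API:

# Illustrative request signing on top of IP allowlisting: even if an allowlisted IP
# leaks or is shared, requests still need the shared secret
# Header names and the signing scheme are made up for this example
import hashlib
import hmac
import os
import time

import requests

SECRET = os.environ.get("SIGNING_SECRET", "change-me").encode()

def signed_get(url):
    timestamp = str(int(time.time()))
    # Sign the method, URL, and timestamp so a replayed or off-box request stands out
    message = f"GET\n{url}\n{timestamp}".encode()
    signature = hmac.new(SECRET, message, hashlib.sha256).hexdigest()
    return requests.get(
        url,
        headers={"X-Timestamp": timestamp, "X-Signature": signature},
        timeout=30,
    )

resp = signed_get("https://api.example.com/data")
print(resp.status_code)
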
# Useful check when debugging from a scrape worker
curl https://api.ipify.org

# Same check from Python
import requests

print(requests.get("https://api.ipify.org", timeout=10).text)

That simple check saves time. A lot of "the scraper is broken" incidents are really just "the traffic is coming from a new IP nobody whitelisted."
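
Taking that a step further: log the worker's egress IP at startup and compare it against the IPs you believe are allowlisted, so the mismatch shows up before the 403s do. A rough sketch; the known-IP set is hardcoded purely for illustration:

# Warn at startup if this worker's egress IP is not one we expect to be allowlisted
# KNOWN_ALLOWLISTED is illustrative; in practice it should come from the same source
# of truth the allowlist owner maintains
import requests

KNOWN_ALLOWLISTED = {"203.0.113.10", "203.0.113.11"}

egress_ip = requests.get("https://api.ipify.org", timeout=10).text.strip()
if egress_ip in KNOWN_ALLOWLISTED:
    print(f"Egress IP {egress_ip} looks allowlisted")
else:
    print(f"WARNING: egress IP {egress_ip} is not in the known allowlist - expect 401s/403s")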

Use cases

  • B2B data access: a partner gives you access to inventory, pricing, or catalog endpoints only from approved company IPs
  • Proxy account protection: a proxy vendor lets your credentials work only when requests come from your whitelisted servers
  • Internal platform security: only approved workers can send jobs to your crawl scheduler or fetch from internal APIs
  • Controlled scraping environments: legal, compliance, or client requirements force traffic through known egress IPs instead of open rotation

Whitelisting is useful when you need predictable access control. It becomes a maintenance problem when the rest of your system is dynamic and the allowlist is static.

Related terms

  • Proxy Authentication
  • Rate Limiting
  • IP Rotation
  • Residential Proxies
  • Datacenter Proxies
  • 403 Forbidden
  • User-Agent Rotation