Glossary

Backoff

Backoff is the practice of waiting before retrying a request after a failure, block, or rate limit. In scraping, it helps you avoid hammering a site when it is already telling you to slow down, which improves stability and lowers the chance of getting banned.

Examples

A basic scraper should not retry failed requests immediately. Instant retries work in toy scripts, then fall apart in production once a target starts returning 429s or intermittent 5xx errors.

import time
import random
import requests

url = "https://example.com/products"
base_delay = 2
max_retries = 5

for attempt in range(max_retries):
    response = requests.get(url, timeout=30)

    if response.status_code == 200:
        print("success")
        break

    if response.status_code in (429, 500, 502, 503, 504):
        # exponential backoff plus a little jitter so workers desynchronize
        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        print(f"retrying in {delay:.1f}s after {response.status_code}")
        time.sleep(delay)
        continue

    # non-retryable status (e.g. 404): fail immediately
    response.raise_for_status()
else:
    raise RuntimeError(f"giving up after {max_retries} attempts")

If the server sends a Retry-After header, use that instead of guessing:

import time
import requests

response = requests.get("https://example.com/search", timeout=30)

if response.status_code == 429:
    retry_after = response.headers.get("Retry-After")
    if retry_after and retry_after.isdigit():
        # Retry-After given as a number of seconds
        time.sleep(int(retry_after))
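Retry-After can also arrive as an HTTP date rather than a number of seconds. A small helper to normalize both forms into a delay might look like this (a sketch; `parse_retry_after` is a hypothetical name, not a standard API):

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_retry_after(value):
    """Return a delay in seconds for either form of Retry-After.

    The header is either an integer number of seconds ("120")
    or an HTTP date ("Wed, 21 Oct 2026 07:28:00 GMT").
    Returns None if the value is missing or unparseable.
    """
    if value is None:
        return None
    if value.isdigit():
        return int(value)
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    delay = (target - datetime.now(timezone.utc)).total_seconds()
    # clamp dates in the past to "retry now"
    return max(delay, 0)
```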

With ScrapeRouter, backoff still matters if your own job queue is retrying failed scrape tasks. The router can handle a lot of the anti-bot and proxy churn, but blindly retrying from your side can still waste money and burn through concurrency.

Practical tips

  • Use exponential backoff for transient failures: 429, 500, 502, 503, 504.
  • Add jitter: a small random delay so all your workers do not retry at the same moment.
  • Respect Retry-After when it exists.
  • Cap the maximum delay: for example 30 to 120 seconds, so retries do not disappear into a black hole.
  • Stop retrying permanent failures: 400, 401, 403, malformed requests, dead URLs.
  • Treat backoff as part of rate control, not a substitute for it: if you are constantly backing off, your request volume is still wrong.
  • Log retry reason, delay, status code, and final outcome. If you do not log it, you will not know whether you have a temporary block or a broken scraper.
  • For distributed scrapers, coordinate retries across workers: otherwise each worker backs off correctly and you still overload the target in aggregate.
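The first few tips combine into a small delay calculator. This is a sketch using "full jitter" (the delay is drawn uniformly between zero and the capped exponential value); the defaults are illustrative, not prescriptive:

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0):
    """Capped exponential backoff with full jitter.

    attempt: 0-based retry count.
    Returns a random delay in [0, min(cap, base * 2**attempt)].
    """
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

With base=2 and cap=60 the ceiling grows 2, 4, 8, 16, 32, 60, 60, ... so retries spread out quickly but never stretch into multi-minute waits.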

A simple retry policy looks like this:

{
  "retry_on_status": [429, 500, 502, 503, 504],
  "max_retries": 5,
  "strategy": "exponential",
  "jitter": true,
  "max_delay_seconds": 60
}
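A sketch of how a client might enforce a policy like this. The field names follow the JSON above; `execute_with_policy` and the `fetch` callable are hypothetical, and the sleep function is injectable so the loop is testable:

```python
import random
import time

POLICY = {
    "retry_on_status": [429, 500, 502, 503, 504],
    "max_retries": 5,
    "strategy": "exponential",
    "jitter": True,
    "max_delay_seconds": 60,
}

def execute_with_policy(fetch, policy, base_delay=2.0, sleep=time.sleep):
    """Run fetch() until success or the policy's retry budget is spent.

    fetch: callable returning an object with a .status_code attribute.
    Returns the last response, retryable or not.
    """
    for attempt in range(policy["max_retries"] + 1):
        response = fetch()
        if response.status_code not in policy["retry_on_status"]:
            return response
        delay = base_delay * (2 ** attempt)  # "exponential" strategy
        delay = min(delay, policy["max_delay_seconds"])
        if policy["jitter"]:
            delay = random.uniform(0, delay)
        sleep(delay)
    return response
```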

Use cases

  • A retailer starts returning 429 Too Many Requests during peak hours: backing off lets your scraper recover instead of getting fully blocked.
  • A search page intermittently throws 503 because the origin is overloaded: retries with spacing often succeed without needing a full job restart.
  • You run hundreds of workers against the same domain: backoff plus jitter prevents retry storms, which are a very real way to DDoS a site by accident.
  • Your queue retries failed scrape jobs automatically: backoff keeps those retries from turning a temporary issue into wasted proxy spend and noisy failures.
  • You scrape through a router layer like ScrapeRouter: the router handles request routing and unblocking, while your application still needs sane retry timing around task execution and downstream failures.

Related terms

Rate Limiting · Retry · Throttling · 429 Too Many Requests · Jitter · Proxy Rotation