Examples
A basic throttle in Python can be as simple as spacing requests out so you do not hammer the same host:
```python
import time

import requests

urls = [
    "https://example.com/page/1",
    "https://example.com/page/2",
    "https://example.com/page/3",
]

for url in urls:
    response = requests.get(url, timeout=30)
    print(url, response.status_code)
    time.sleep(2)  # 1 request every 2 seconds
```
In async scrapers, throttling is often about limiting concurrency, not just adding sleep:
```python
import asyncio

import aiohttp

semaphore = asyncio.Semaphore(3)  # at most 3 requests in flight

async def fetch(session, url):
    async with semaphore:
        async with session.get(url, timeout=30) as response:
            print(url, response.status)
            await asyncio.sleep(1)  # hold the slot briefly to pace requests
            return await response.text()

async def main():
    urls = [
        "https://example.com/page/1",
        "https://example.com/page/2",
        "https://example.com/page/3",
        "https://example.com/page/4",
    ]
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session, url) for url in urls))

asyncio.run(main())
```
If you are routing requests through ScrapeRouter, throttling still matters. A router can handle proxy and anti-bot complexity, but you still want sane request pacing when hitting the same target repeatedly:
```bash
curl "https://www.scraperouter.com/api/v1/scrape/?url=https://example.com/products" \
  -H "Authorization: Api-Key $api_key"
```
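The same pacing idea applies when driving the router from Python. Here is a minimal sketch that spaces calls to the endpoint shown above; the `scrape_via_router` helper name, the `min_interval` default, and how you store your API key are assumptions, not part of the ScrapeRouter API:

```python
import time

import requests

ENDPOINT = "https://www.scraperouter.com/api/v1/scrape/"

_last_call = 0.0  # monotonic timestamp of the previous request

def scrape_via_router(target_url, api_key, min_interval=2.0):
    """Fetch target_url through the router, keeping calls at least min_interval apart."""
    global _last_call
    wait = min_interval - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    return requests.get(
        ENDPOINT,
        params={"url": target_url},
        headers={"Authorization": f"Api-Key {api_key}"},
        timeout=60,
    )

# Usage (api_key is your ScrapeRouter key):
# response = scrape_via_router("https://example.com/products", api_key)
# print(response.status_code)
```

The router absorbs proxies and anti-bot handling, but the client-side spacing keeps repeated hits on the same target from arriving in a burst.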
Practical tips
- Throttle by domain, not globally: amazon.com and a tiny Shopify store should not get the same request budget.
- Control both request rate and concurrency: a scraper doing 50 concurrent requests with a 1-second delay is still aggressive.
- Back off when you see warning signs: 429 responses, sudden CAPTCHA pages, connection resets, slower response times.
- Add jitter to delays so traffic does not look machine-perfect:
  ```python
  import random
  import time

  time.sleep(random.uniform(1.5, 3.5))  # random delay between 1.5 and 3.5 seconds
  ```
- Treat throttling as a cost control tool too: fewer wasted retries, fewer burned proxies, less time debugging avoidable blocks.
- Do not hardcode one number and forget it: safe limits change by site, route, time of day, and whether you are logged in.
- For serious jobs, keep per-target rules: requests per second, max concurrency, retry budget, cooldown after 429.
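The last few tips can be combined into a small per-domain limiter. This is a sketch, not a full rate limiter: the `DomainThrottle` class and its `min_interval` and `cooldown` parameters are illustrative names, and real jobs would persist per-target rules rather than hardcode them:

```python
import time
from urllib.parse import urlparse

class DomainThrottle:
    """Per-domain pacing with a cooldown after a 429 response."""

    def __init__(self, min_interval=2.0, cooldown=30.0):
        self.min_interval = min_interval  # seconds between requests to one domain
        self.cooldown = cooldown          # extra pause after that domain returns 429
        self._next_allowed = {}           # domain -> earliest allowed monotonic time

    def wait(self, url):
        """Block until this URL's domain is allowed another request."""
        domain = urlparse(url).netloc
        pause = self._next_allowed.get(domain, 0.0) - time.monotonic()
        if pause > 0:
            time.sleep(pause)
        self._next_allowed[domain] = time.monotonic() + self.min_interval

    def record(self, url, status_code):
        """Push the domain's next slot out when the site signals rate limiting."""
        if status_code == 429:
            domain = urlparse(url).netloc
            self._next_allowed[domain] = time.monotonic() + self.cooldown
```

Call `wait(url)` before each request and `record(url, response.status_code)` after. Because the budget is keyed by domain, amazon.com and a tiny Shopify store are throttled independently, and a 429 from one target slows only that target.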
Use cases
- Large catalog scraping: keep throughput steady across thousands of pages without getting rate-limited halfway through the run.
- Multi-tenant scraping systems: stop one noisy customer job from burning all available proxy capacity on a single domain.
- Fragile targets: throttle aggressively on sites that start blocking after small traffic spikes.
- Cost-sensitive pipelines: reduce retries, bans, and dead requests that chew through proxy spend.
- Scheduled refresh jobs: spread requests over time so daily updates keep working instead of creating one big traffic burst that gets noticed.