Examples
Headers are where a lot of scrapers give themselves away. A request with no realistic browser User-Agent, no Accept-Language, and no Referer pattern often works in testing, then falls apart in production.
A basic request with explicit headers:
import requests
url = "https://example.com/products"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/",
    "Cache-Control": "no-cache"
}
response = requests.get(url, headers=headers, timeout=30)
print(response.status_code)
print(response.text[:300])
Using ScrapeRouter with headers passed through:
curl -X POST "https://www.scraperouter.com/api/v1/scrape/" \
  -H "Authorization: Api-Key $api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "headers": {
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
      "Accept-Language": "en-US,en;q=0.9",
      "Referer": "https://example.com/"
    }
  }'
Common request headers scrapers care about:
- User-Agent: identifies the client
- Accept: tells the server what content types you can handle
- Accept-Language: helps make the request look regionally consistent
- Referer: shows where the navigation came from
- Cookie: carries session state
- Authorization: sends API or auth credentials
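Those six headers together look like this in a single request dict. This is a sketch: every value below is an illustrative placeholder, not a working credential or a canonical browser fingerprint.

```python
# All values are illustrative placeholders, not real credentials.
headers = {
    "User-Agent": "Mozilla/5.0 ...",                  # client identity (placeholder)
    "Accept": "text/html,application/xhtml+xml",      # content types we can handle
    "Accept-Language": "en-US,en;q=0.9",              # regional consistency
    "Referer": "https://example.com/",                # where the navigation came from
    "Cookie": "sessionid=abc123",                     # session state (placeholder cookie)
    "Authorization": "Bearer <token>",                # API/auth credential (placeholder)
}
```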
Practical tips
- Don’t treat headers as cosmetic. On a lot of sites, they are part of basic bot detection.
- Keep headers internally consistent: browser family, language, platform, and fetch pattern should make sense together.
- Don’t copy a giant header blob from DevTools unless you know why each field is there. Some headers are dynamic and stale values can make things worse.
- If a site uses sessions, headers and cookies need to match the same flow. Randomizing one without the other is sloppy and often breaks.
- Start simple, then add what the target actually checks: User-Agent, Accept, Accept-Language, Referer, Cookie.
- Watch response behavior, not just status codes: soft blocks, empty pages, CAPTCHA HTML, and login redirects often mean your header profile is wrong.
- If you’re scraping at scale, version your header sets. A change in one header can quietly tank success rates.
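One way to act on the versioning tip above is to keep named header profiles in code, so a change to any field produces a new, traceable profile name. A minimal sketch; the profile name and values are illustrative, not canonical:

```python
# Versioned header profiles: bump the name when any field changes,
# so success-rate swings can be traced to a specific profile version.
HEADER_PROFILES = {
    "chrome-win-v1": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    },
}

def get_profile(name: str) -> dict:
    """Return a copy so callers cannot mutate the shared profile."""
    return dict(HEADER_PROFILES[name])
```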
A minimal realistic Python session:
import requests
session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9"
})
resp = session.get("https://example.com", timeout=30)
print(resp.status_code)
A practical debugging trick:
for k, v in response.request.headers.items():
    print(f"{k}: {v}")
That helps when your code says one thing but the HTTP client library sends another.
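You can see the same discrepancy without making any network call by preparing the request through a session. A sketch using requests; the URL and User-Agent value are placeholders:

```python
import requests

session = requests.Session()
req = requests.Request("GET", "https://example.com", headers={"User-Agent": "MyUA"})

# prepare_request merges the session's default headers into yours,
# so the prepared request shows what would actually go on the wire.
prepared = session.prepare_request(req)

for k, v in prepared.headers.items():
    print(f"{k}: {v}")
# requests adds defaults like Accept-Encoding and Connection on top of what you set
```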
Use cases
- Avoiding basic blocks: adding realistic request headers so a target doesn’t instantly classify you as a script.
- Maintaining sessions: sending cookies, CSRF-related values, and navigation context across login or cart flows.
- Matching geolocation or language: using headers like Accept-Language so the returned content is consistent with the market you want.
- Calling authenticated APIs: passing Authorization headers when scraping isn’t really scraping, it’s just making API requests someone forgot to document.
- Reducing breakage in production: keeping header profiles stable across retries, proxies, and browser-like requests so success rates don’t swing for dumb reasons.
- Working with ScrapeRouter: passing custom headers when a target needs specific request context, while letting the router deal with the uglier infrastructure problems around delivery and reliability.
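The session-maintenance case can be sketched with requests.Session, which keeps headers and cookies on the same flow automatically. The URL, cookie name, and values below are placeholders:

```python
import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 ...",      # one consistent browser identity (placeholder)
    "Accept-Language": "en-US,en;q=0.9",
})

# Cookies set by earlier responses (e.g. a login) ride along on later requests.
# Here we set one manually to illustrate; "sessionid" is a placeholder name.
session.cookies.set("sessionid", "abc123", domain="example.com")

# Every request made through this session now carries the same header profile
# and cookie jar, so login/cart flows stay internally consistent.
```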