Glossary

headers

Headers are the key-value fields sent with an HTTP request or response that tell the other side what you want, who you are, and how to handle the connection. In scraping, they matter because bad or missing headers are one of the fastest ways to look fake and get blocked, even if your parser is fine.

Examples

Headers are where a lot of scrapers give themselves away. A request with no realistic browser headers, no accepted languages, and no referer pattern often works in testing, then falls apart in production.

A basic request with explicit headers:

import requests

url = "https://example.com/products"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://example.com/",
    "Cache-Control": "no-cache"
}

response = requests.get(url, headers=headers, timeout=30)
print(response.status_code)
print(response.text[:300])

Using ScrapeRouter with headers passed through:

curl -X POST "https://www.scraperouter.com/api/v1/scrape/" \
  -H "Authorization: Api-Key $api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/products",
    "headers": {
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
      "Accept-Language": "en-US,en;q=0.9",
      "Referer": "https://example.com/"
    }
  }'

Common request headers scrapers care about:

  • User-Agent: identifies the client
  • Accept: tells the server what content types you can handle
  • Accept-Language: helps make the request look regionally consistent
  • Referer: shows where the navigation came from
  • Cookie: carries session state
  • Authorization: sends API or auth credentials
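Two of those headers, Cookie and Authorization, are easiest to get right by letting the session object assemble them. A minimal sketch using requests' prepare_request to inspect the outgoing headers without actually sending anything; the token and cookie values are placeholders:

```python
import requests

session = requests.Session()
# Placeholder credentials: real values come from a login flow or API dashboard.
session.headers.update({"Authorization": "Bearer example-token"})
session.cookies.set("sessionid", "abc123")

# Prepare the request without sending it, so you can see exactly what goes out.
req = requests.Request("GET", "https://example.com/account")
prepared = session.prepare_request(req)

print(prepared.headers.get("Cookie"))         # session cookie, merged in for you
print(prepared.headers.get("Authorization"))  # carried from the session defaults
```

Letting the session merge cookies into the Cookie header avoids hand-building a string that drifts out of sync with the jar.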

Practical tips

  • Don’t treat headers as cosmetic. On a lot of sites, they are part of basic bot detection.
  • Keep headers internally consistent: browser family, language, platform, and fetch pattern should make sense together.
  • Don’t copy a giant header blob from DevTools unless you know why each field is there. Some headers are dynamic and stale values can make things worse.
  • If a site uses sessions, headers and cookies need to match the same flow. Randomizing one without the other is sloppy and often breaks.
  • Start simple, then add what the target actually checks: User-Agent, Accept, Accept-Language, Referer, Cookie.
  • Watch response behavior, not just status codes: soft blocks, empty pages, CAPTCHA HTML, and login redirects often mean your header profile is wrong.
  • If you’re scraping at scale, version your header sets. A change in one header can quietly tank success rates.
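The last two tips, keeping header sets internally consistent and versioning them, can be as simple as named profiles in one place. A sketch with hypothetical profile names; the point is that each set travels together and a bad change can be rolled back by name:

```python
# Hypothetical versioned header profiles: each named set is internally
# consistent (browser family, platform, language) and changes get a new name.
HEADER_PROFILES = {
    "chrome-win-v1": {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    },
    "chrome-mac-v1": {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.9",
    },
}

def headers_for(profile_name: str) -> dict:
    # Return a copy so callers can't mutate the shared profile in place.
    return dict(HEADER_PROFILES[profile_name])
```

When success rates drop after a deploy, a diff between profile versions is a much shorter search than rereading every call site.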

A minimal realistic Python session:

import requests

session = requests.Session()
session.headers.update({
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9"
})

resp = session.get("https://example.com", timeout=30)
print(resp.status_code)

A practical debugging trick:

# Inspect what was actually sent, not just what you configured:
for key, value in response.request.headers.items():
    print(f"{key}: {value}")

That helps when your code says one thing but the HTTP client library sends another.
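The "watch response behavior" tip can be automated with a rough check. A sketch, assuming the markers below match what a given target actually serves to suspected bots; in practice you tune them per site:

```python
# Hypothetical soft-block markers: phrases a target might serve instead of
# real content. These vary per site and need tuning.
SOFT_BLOCK_MARKERS = ("captcha", "access denied", "verify you are human")

def looks_blocked(status_code: int, body: str) -> bool:
    # Hard signals first: explicit denial or rate limiting.
    if status_code in (403, 429):
        return True
    # Soft signals: a 200 whose body is a challenge page, not content.
    lowered = body.lower()
    return any(marker in lowered for marker in SOFT_BLOCK_MARKERS)
```

Wiring a check like this into retries lets you rotate header profiles or back off on soft blocks instead of happily parsing CAPTCHA HTML.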

Use cases

  • Avoiding basic blocks: adding realistic request headers so a target doesn’t instantly classify you as a script.
  • Maintaining sessions: sending cookies, CSRF-related values, and navigation context across login or cart flows.
  • Matching geolocation or language: using headers like Accept-Language so the returned content is consistent with the market you want.
  • Calling authenticated APIs: passing Authorization headers when scraping isn’t really scraping so much as making API requests someone forgot to document.
  • Reducing breakage in production: keeping header profiles stable across retries, proxies, and browser-like requests so success rates don’t swing for dumb reasons.
  • Working with ScrapeRouter: passing custom headers when a target needs specific request context, while letting the router deal with the uglier infrastructure problems around delivery and reliability.
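The ScrapeRouter use case can be driven from Python as well as curl. A minimal sketch assuming the same /api/v1/scrape/ endpoint and Api-Key scheme shown in the curl example; the key is a placeholder, and the request is only prepared here, not sent:

```python
import requests

api_key = "example-key"  # placeholder: substitute your real ScrapeRouter key

payload = {
    "url": "https://example.com/products",
    "headers": {
        "Accept-Language": "en-US,en;q=0.9",
        "Referer": "https://example.com/",
    },
}

# Prepare without sending so the outgoing request can be inspected;
# requests.Session().send(prepared) would actually dispatch it.
prepared = requests.Request(
    "POST",
    "https://www.scraperouter.com/api/v1/scrape/",
    headers={"Authorization": f"Api-Key {api_key}"},
    json=payload,
).prepare()

print(prepared.headers["Authorization"])
print(prepared.headers["Content-Type"])  # set to application/json automatically
```

Note the two header layers: the Authorization header authenticates you to the router, while the headers inside the payload are the ones passed through to the target.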

Related terms

user-agent, cookies, http request, proxy, session, rate limiting, captcha