Glossary

Cookies

Cookies are small pieces of data a website sets on the client; the client stores them and sends them back on later requests so the site can keep track of sessions, logins, preferences, and basic state. In scraping, they matter because a lot of sites stop working the moment you ignore them, reuse them badly, or lose them between requests.

Examples

A simple session cookie flow looks like this:

  • First request: site sets a cookie
  • Later requests: client sends that cookie back
  • Result: the server treats those requests as part of the same session
First, hit the login endpoint (-i prints the response headers):

curl -i https://example.com/login

You might get a response header like:

Set-Cookie: session_id=abc123; Path=/; HttpOnly; Secure

Then your next request needs to send it back:

curl https://example.com/account \
  -H "Cookie: session_id=abc123"

In Python with requests:

import requests

# A Session keeps a cookie jar and replays stored cookies automatically
session = requests.Session()

# Any Set-Cookie headers in this response are captured in the jar
session.get("https://example.com/login")

# This request sends the stored cookies back, no manual handling needed
response = session.get("https://example.com/account")
print(response.status_code)

With a scraping API, the point is usually not "can I send one cookie" but "can I keep the whole session stable long enough for the scrape to finish".

Practical tips

  • Use a session jar, not hand-built cookie strings: manually pasting Cookie: headers works for quick tests, then turns into a mess in production.
  • Keep cookies tied to the same identity: if you rotate IPs, headers, or browser fingerprints while reusing the same cookies, some sites will flag it immediately.
  • Expect expiry: login cookies die, CSRF cookies rotate, bot-defense cookies get reissued.
  • Don't share cookies across unrelated jobs: one poisoned or expired session can break a whole batch.
  • Be careful with authenticated cookies: they are effectively credentials.
  • Store only what you need: if you're passing cookies through your own systems, treat them like secrets.
  • Watch for cookie + JavaScript coupling: some sites set cookies only after JS runs, so plain HTTP requests won't reproduce the real flow.
  • If a scrape works once and then starts failing, check cookies first: that's one of the most common failure modes.
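
The first and third tips above can be sketched roughly like this. Everything here is a placeholder: the login endpoint, the credentials payload, and the account URL are hypothetical, and real sites will differ in how they signal an expired session.

```python
import requests

def logged_in_session():
    # The Session's cookie jar captures Set-Cookie headers from responses
    # and replays the cookies on later requests, so there are no
    # hand-built Cookie: strings to keep in sync.
    session = requests.Session()
    # Hypothetical login endpoint and credentials.
    session.post("https://example.com/login",
                 data={"user": "me", "password": "secret"})
    return session

def fetch_account(session):
    # Hypothetical account page behind login.
    response = session.get("https://example.com/account")
    if response.status_code in (401, 403):
        # Expect expiry: rebuild the session once and retry instead of
        # letting the whole batch fail on a dead cookie.
        session = logged_in_session()
        response = session.get("https://example.com/account")
    return response
```

The retry-on-401/403 check is deliberately naive; many sites signal an expired session with a redirect to the login page instead, so adapt the check to what the target actually does.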

A basic authenticated request might look like this:

curl https://www.scraperouter.com/api/v1/scrape/ \
  -H "Authorization: Api-Key $api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/account",
    "headers": {
      "Cookie": "session_id=abc123; cf_clearance=token_here"
    }
  }'

That said, if the target site expects a real browser session, just injecting cookies may not be enough. That's usually where simple scripts start wasting engineering time.
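
The same request can be built from Python. This sketch only repackages the curl example above into keyword arguments for requests.post, whose json= parameter serializes the payload and sets the Content-Type header for you; the endpoint and payload shape are taken from that example.

```python
import requests

def build_scrape_request(api_key, target_url, cookie_header):
    # Mirrors the curl example above: same endpoint, same payload shape.
    # cookie_header is whatever session string you captured earlier.
    return {
        "url": "https://www.scraperouter.com/api/v1/scrape/",
        "headers": {"Authorization": f"Api-Key {api_key}"},
        "json": {
            "url": target_url,
            "headers": {"Cookie": cookie_header},
        },
    }

# Sending it is one call: requests.post(**build_scrape_request(...))
```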

Use cases

  • Authenticated scraping: dashboards, account pages, internal tools behind login.
  • Multi-step flows: search, pagination, cart state, location selection.
  • Session continuity: keeping requests tied to one browsing session so the site doesn't reset or challenge you.
  • Preference persistence: locale, currency, market, language, consent banners.
  • Bot-defense tokens: some anti-bot systems use cookies as part of the challenge and verification flow.

The practical reality is pretty simple: lots of scrapers don't fail because parsing is hard. They fail because session state is fragile, and cookies are most of that state.

Related terms

  • Session
  • HTTP Headers
  • Authentication
  • CSRF Token
  • Proxy Rotation
  • Browser Fingerprinting
  • CAPTCHA