Glossary

ETag

An ETag is an HTTP response header that identifies a specific version of a resource. Browsers, CDNs, and bots use it for conditional requests, so the server can return 304 Not Modified instead of sending the full response again when nothing changed.

Examples

A server might return an ETag like this:

curl -I https://example.com/products
HTTP/1.1 200 OK
ETag: "686897696a7c876b7e"
Cache-Control: max-age=0

On the next request, the client can send that value back:

curl -H 'If-None-Match: "686897696a7c876b7e"' -I https://example.com/products

If the page did not change, the server can reply:

HTTP/1.1 304 Not Modified
ETag: "686897696a7c876b7e"

In scraping, this matters when you want the body on every request. A 304 response carries no body, so replaying headers captured from an earlier browser session (including If-None-Match) can trigger 304s and leave you wondering why the response is empty.
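One way to avoid accidental 304s is to drop conditional headers before replaying a captured header set. The snippet below is a minimal sketch, not a definitive implementation: `strip_conditional_headers` and the `captured` dictionary are hypothetical names made up for illustration, and the header list assumes the conditional headers defined by the HTTP specification.

```python
# Header names that make a request conditional, stored in lowercase.
CONDITIONAL_HEADERS = {
    "if-none-match",
    "if-modified-since",
    "if-match",
    "if-unmodified-since",
    "if-range",
}


def strip_conditional_headers(headers):
    """Return a copy of `headers` without conditional request headers.

    Header names are compared case-insensitively; the input is not modified.
    """
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in CONDITIONAL_HEADERS
    }


# Hypothetical headers copied from an earlier browser session.
captured = {
    "User-Agent": "Mozilla/5.0",
    "Accept": "text/html",
    "If-None-Match": '"686897696a7c876b7e"',
    "if-modified-since": "Wed, 21 Oct 2015 07:28:00 GMT",
}

clean = strip_conditional_headers(captured)
print(clean)
```

Passing `clean` instead of `captured` to your HTTP client should make the request unconditional, so a server that honors validators has no reason to answer with 304.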

Practical tips

  • If you want fresh content, check whether your client is sending If-None-Match from a previous response.
  • If you're debugging missing bodies, inspect both request and response headers first: ETag, If-None-Match, Cache-Control, Last-Modified.
  • Don’t treat 304 as a scraping failure by default: it means the server believes your cached copy is still valid.
  • For scrapers, be deliberate about caching: if you are emulating a browser flow, keep validators like ETag; if you are collecting raw data, you may want to drop conditional headers.
  • Be careful when rotating between clients or proxy layers that share headers: stale validators can create confusing behavior.
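If you do keep validators, a 304 is only useful when you still hold the body it refers to. The sketch below shows one possible way to pair each ETag with its cached body so a 304 can be resolved locally. `ValidatorCache` and its methods are hypothetical names, the store is in memory only, and the status codes and bodies are passed in by hand so the example runs without a network.

```python
class ValidatorCache:
    """In-memory store of the last ETag and body seen for each URL."""

    def __init__(self):
        self._entries = {}  # url -> (etag, body)

    def conditional_headers(self, url):
        """Return If-None-Match for `url` only if a body is cached for it."""
        entry = self._entries.get(url)
        return {"If-None-Match": entry[0]} if entry else {}

    def resolve(self, url, status_code, etag, body):
        """Return the usable body for a response and update the cache."""
        if status_code == 304:
            entry = self._entries.get(url)
            if entry is None:
                # A 304 without a cached copy means a stale validator was sent.
                raise LookupError(f"304 for {url} but no cached body")
            return entry[1]
        if status_code == 200 and etag:
            self._entries[url] = (etag, body)
        return body


cache = ValidatorCache()
url = "https://example.com/products"

# Simulated first response: 200 with an ETag and a body.
first = cache.resolve(url, 200, '"686897696a7c876b7e"', "<html>v1</html>")

# Headers the next request would carry.
headers = cache.conditional_headers(url)

# Simulated second response: 304 with an empty body.
second = cache.resolve(url, 304, '"686897696a7c876b7e"', "")
print(headers, second == first)
```

Keeping the validator and the body in the same entry is the design point here: a client that sends If-None-Match without holding the matching body is the usual source of unexplained empty responses.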

Example in Python:

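import requests

url = "https://example.com/page"

# First request: fetch the full response and remember its validator
r1 = requests.get(url, timeout=10)
etag = r1.headers.get("ETag")
print(r1.status_code, etag)

# Conditional request: send the validator back if the server provided one
headers = {"If-None-Match": etag} if etag else {}
r2 = requests.get(url, headers=headers, timeout=10)

# On 304 the copy from r1 is still valid and r2 carries no body
print(r2.status_code, len(r2.content))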

If you're using ScrapeRouter and need the actual page content, keep an eye on forwarded request headers from your own client. A lot of weird cache behavior is self-inflicted.

Use cases

  • Browser caching: save bandwidth and speed up repeat page loads.
  • API polling: check whether a resource changed without downloading the full payload again.
  • Scraping at scale: reduce transfer costs when monitoring pages for changes.
  • Change detection: compare ETag values over time as a cheap signal that content or generated output changed.
  • Concurrency control: some APIs use ETags with conditional writes to avoid overwriting newer updates.
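For change detection and conditional writes, it helps to know that ETags come in two forms: strong ones such as "abc" and weak ones prefixed with W/, such as W/"abc". HTTP defines a weak comparison (used for If-None-Match) and a strong comparison (used for If-Match). The functions below are a minimal sketch of both; the function names are hypothetical, and the code assumes each value is a single, well-formed ETag rather than a comma-separated list.

```python
def parse_etag(value):
    """Split an ETag value into (is_weak, opaque_tag), quotes included."""
    value = value.strip()
    is_weak = value.startswith("W/")
    opaque = value[2:] if is_weak else value
    return is_weak, opaque


def weak_match(a, b):
    """Weak comparison: opaque tags match, weakness flags are ignored."""
    return parse_etag(a)[1] == parse_etag(b)[1]


def strong_match(a, b):
    """Strong comparison: neither tag is weak and the opaque tags match."""
    weak_a, tag_a = parse_etag(a)
    weak_b, tag_b = parse_etag(b)
    return not weak_a and not weak_b and tag_a == tag_b


previous = 'W/"686897696a7c876b7e"'
current = '"686897696a7c876b7e"'

print(weak_match(previous, current))
print(strong_match(previous, current))
```

For monitoring, a weak match is usually enough to treat a page as unchanged. For conditional writes, a strong match is the safer check, since a weak ETag only promises that two versions are semantically equivalent, not byte-for-byte identical.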

Related terms

  • Cache-Control
  • Last-Modified
  • 304 Not Modified
  • HTTP Headers
  • Conditional Requests