Glossary

network tab

The network tab is the panel in the browser developer tools that records every request a page makes (HTML documents, XHR and fetch calls, images, scripts) together with each request's headers, cookies, and response body. For scraping, it is often the fastest way to find the real data source instead of guessing from messy front-end HTML or clicking around in Selenium.

Examples

A common workflow is: open the page, open DevTools, reload, then filter for XHR or Fetch requests.
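The same filtering can be done offline: the network tab lets you export captured traffic as a HAR file, and a short script can pull out just the requests that returned JSON. A minimal sketch using only the standard library (the HAR layout below follows the standard export format; the `page.har` path in the usage comment is hypothetical):

```python
import json

def json_requests(har: dict) -> list[dict]:
    """Return method/URL/status for HAR entries whose response is JSON."""
    out = []
    for entry in har["log"]["entries"]:
        mime = entry["response"]["content"].get("mimeType", "")
        if "json" in mime:
            out.append({
                "method": entry["request"]["method"],
                "url": entry["request"]["url"],
                "status": entry["response"]["status"],
            })
    return out

# Typical usage with a file exported from the network tab:
# with open("page.har") as f:
#     for req in json_requests(json.load(f)):
#         print(req["method"], req["url"], req["status"])
```

This gives you the same shortlist as the Fetch/XHR filter, but in a form you can grep, diff, or commit next to your scraper.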

You might see a request like this:

curl 'https://example.com/api/products?page=2' \
  -H 'accept: application/json' \
  -H 'user-agent: Mozilla/5.0' \
  -H 'cookie: session=abc123'

And the response is the thing you actually want:

{
  "products": [
    {"id": 101, "name": "Widget", "price": 19.99},
    {"id": 102, "name": "Cable", "price": 4.99}
  ],
  "next_page": 3
}

At that point, scraping the rendered DOM is often pointless. You can just request the same endpoint directly:

import requests

url = "https://example.com/api/products?page=2"
headers = {
    "accept": "application/json",
    "user-agent": "Mozilla/5.0",
}

r = requests.get(url, headers=headers, timeout=30)
print(r.json())
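The next_page field in the example response also suggests a simple pagination loop. A sketch that follows it until it runs out; the fetch function is injected so the paging logic stays separate from the HTTP details, and the "products"/"next_page" field names are taken from the example response above, so treat them as assumptions for any real endpoint:

```python
def iter_products(fetch, start_page=1):
    """Yield products page by page, following the next_page field.

    `fetch` takes a page number and returns the decoded JSON dict,
    e.g. lambda p: requests.get(url, params={"page": p}, timeout=30).json()
    """
    page = start_page
    while page is not None:
        data = fetch(page)
        yield from data["products"]
        page = data.get("next_page")  # None or missing ends the loop
```

Keeping the loop pure like this also makes it trivial to unit-test with canned responses before pointing it at the live endpoint.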

If the site is protected, the network tab still tells you what the browser is trying to call. That helps you decide whether a direct request is enough or whether you need a browser, session handling, proxies, or a router layer like ScrapeRouter.

Practical tips

  • Filter by Fetch/XHR first: this cuts out fonts, images, and other junk.
  • Reload the page with DevTools open: many useful requests only appear during initial page load.
  • Look at headers, query params, cookies, and payloads: the URL alone is often not enough.
  • Check the response tab before building a scraper: if the data is already in JSON, don't waste time parsing HTML.
  • Watch for pagination and cursors: page number, offset, cursor token, next URL.
  • Compare repeated actions: click "next", apply a filter, open a product, then see what request changed.
  • Be careful with one-off success: a request that works once in your browser may depend on session cookies, CSRF tokens, or fingerprinting.
  • If direct replay keeps failing in production, that's the point where a simple requests script stops being cheap.
  • For protected targets, send the page through ScrapeRouter instead of rebuilding anti-bot handling yourself:
curl 'https://www.scraperouter.com/api/v1/scrape/' \
  -H "Authorization: Api-Key $api_key" \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://example.com/products"
  }'
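The "compare repeated actions" tip is mechanical enough to script: copy the request URL before and after clicking "next", then diff the query parameters. A stdlib-only sketch (the example URLs in the usage comment are hypothetical):

```python
from urllib.parse import urlsplit, parse_qs

def changed_params(url_a: str, url_b: str) -> dict:
    """Return the query params whose values differ between two captured URLs."""
    a = parse_qs(urlsplit(url_a).query)
    b = parse_qs(urlsplit(url_b).query)
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}

# changed_params(
#     "https://example.com/api/products?page=2&sort=price",
#     "https://example.com/api/products?page=3&sort=price",
# )  # only "page" differs, so that is the pagination knob
```

The parameters that change between two clicks are usually the only ones your scraper needs to vary; everything else can be copied verbatim.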

Use cases

  • Finding hidden JSON endpoints behind a JavaScript-heavy page.
  • Figuring out whether a site needs browser automation at all, or just a direct API-style request.
  • Reverse-engineering search, pagination, filters, and lazy loading.
  • Debugging why scraped HTML does not match what you see in the browser: the page may hydrate from a separate request after load.
  • Checking what authentication state matters: cookies, bearer token, CSRF token, request headers.
  • Reducing cost and fragility: if the network tab shows a clean data endpoint, you can often skip full browser rendering.
  • Understanding where simple scraping stops working in production: some requests are easy to copy locally and annoying to keep alive at scale.
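The hydration case above can be checked quickly: fetch the raw HTML and test whether the values you see in the browser actually appear in it. A rough heuristic sketch (fetching is left to the caller; `requests.get(url).text` would be the usual source, and the function name here is just illustrative):

```python
def hydrates_client_side(raw_html: str, expected_values: list[str]) -> bool:
    """Heuristic: if none of the values visible in the browser appear in
    the raw HTML, the page is probably filled in by a separate request
    after load, and the network tab will show where that data comes from."""
    return not any(value in raw_html for value in expected_values)
```

If this returns True, go back to the network tab and look for the request that delivers the missing values instead of parsing the empty shell.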

Related terms

XHR, fetch, developer tools, API endpoint, headless browser, session cookies, CSRF token, browser automation