Examples
A common scraping workflow is: load the page once, open DevTools, look at the Network tab, filter by XHR, and find the request that returns JSON instead of rendered markup.
```python
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()

    def log_response(response):
        if "api" in response.url or response.request.resource_type == "xhr":
            print(response.status, response.url)

    page.on("response", log_response)
    page.goto("https://example.com")
    page.wait_for_timeout(5000)
    browser.close()
```
If you can reproduce the same request directly, you often do not need to parse the page at all.
```bash
curl 'https://example.com/api/products?page=1' \
  -H 'accept: application/json' \
  -H 'x-requested-with: XMLHttpRequest'
```
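The same replay works from Python without curl. The sketch below mirrors the curl example using only the standard library; the endpoint is the same illustrative `example.com` URL, so the actual network call is left commented out as a template.

```python
import json
import urllib.parse
import urllib.request

# Rebuild the XHR exactly as the browser sent it: same URL,
# query params, and headers copied from the DevTools Network tab.
params = urllib.parse.urlencode({"page": 1})
req = urllib.request.Request(
    f"https://example.com/api/products?{params}",
    headers={
        "accept": "application/json",
        "x-requested-with": "XMLHttpRequest",
    },
)

# example.com does not really serve this endpoint; uncomment once
# you have swapped in the request you captured:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     products = json.load(resp)
```

If the replayed request comes back with a 403 or an HTML error page while the browser succeeds, diff the two requests header by header, including cookies.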
And if you need the browser to do the hard part first, you can still use ScrapeRouter to render the page and inspect the traffic your scraper cares about.
```bash
curl 'https://www.scraperouter.com/api/v1/scrape/' \
  -H 'Authorization: Api-Key $api_key' \
  -H 'Content-Type: application/json' \
  --data-raw '{
    "url": "https://example.com/products",
    "browser": true
  }'
```
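From Python, the same call is a plain JSON POST. This is a minimal stdlib sketch of the request shown in the curl example above; the API key is a placeholder, and the response handling is left generic because the response shape depends on your plan and parameters.

```python
import json
import urllib.request

API_KEY = "your-api-key"  # placeholder: use your real key

# Same body and headers as the curl example above.
payload = json.dumps({
    "url": "https://example.com/products",
    "browser": True,
}).encode()

req = urllib.request.Request(
    "https://www.scraperouter.com/api/v1/scrape/",
    data=payload,  # presence of a body makes this a POST
    headers={
        "Authorization": f"Api-Key {API_KEY}",
        "Content-Type": "application/json",
    },
)

# Uncomment with a valid key to actually render the page:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     body = resp.read().decode()
```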
Practical tips
- XHR is a signal, not magic: many sites now use Fetch instead of XMLHttpRequest, and DevTools often groups them together as "Fetch/XHR".
- Look for JSON first: if an XHR response gives you clean JSON, use that instead of parsing deeply nested HTML.
- Copy the full request: method, headers, query params, cookies, and body all matter. Missing one weird header is a classic reason a replayed request fails.
- Check pagination and cursors: the first XHR is rarely the whole dataset. Look for `page`, `offset`, `limit`, `cursor`, or `next` tokens.
- Watch for auth state: some XHR endpoints only work after the page sets cookies, local storage, CSRF tokens, or signed request params.
- Do not assume stability: XHR endpoints are often more stable than HTML selectors, but not stable enough to trust blindly in production.
- Log responses in your scraper: when something breaks, network-level logs save a lot of time.

```python
if response.headers.get("content-type", "").startswith("application/json"):
    data = response.json()
    print(data)
```
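The pagination tip above boils down to a cursor loop. In this sketch, `fetch_page` is a hypothetical stand-in for whatever client replays the XHR; here it returns canned pages so the loop's shape is clear, with `None` as the "no more pages" signal.

```python
def fetch_page(cursor):
    # Stand-in for the real XHR replay. Returns (items, next_cursor);
    # a real version would pass the cursor as a query param or body field.
    pages = {
        None: ([1, 2], "c1"),
        "c1": ([3, 4], "c2"),
        "c2": ([5], None),  # no next cursor: last page
    }
    return pages[cursor]

def collect_all():
    items, cursor = [], None
    while True:
        batch, cursor = fetch_page(cursor)
        items.extend(batch)
        if cursor is None:
            return items

print(collect_all())  # [1, 2, 3, 4, 5]
```

The same loop works for `page`/`offset` schemes: increment the counter instead of reading a cursor, and stop when a page comes back empty.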
Use cases
- Scraping product listings from ecommerce pages where the HTML is mostly shell and the real catalog data comes from an XHR JSON endpoint.
- Extracting search results, filters, prices, and inventory without reverse-engineering frontend markup.
- Capturing background API calls from single-page apps built with React, Vue, or similar frameworks.
- Replaying internal requests at scale after confirming the endpoint does not require a full browser for every page.
- Debugging scraping failures by comparing what the browser fetched versus what your direct HTTP client sent.