Glossary

Headless Browser

A headless browser is a real browser running without a visible UI, usually controlled by code. In scraping, you use it when a site needs JavaScript execution, real rendering, or browser-like behavior that plain HTTP requests won’t handle reliably.

Examples

A few common cases where people reach for a headless browser:

  • The page renders data only after JavaScript runs
  • Content appears after clicking, scrolling, or waiting for XHR calls
  • You need cookies, local storage, or browser execution context to look like a normal user session
A minimal Playwright (Python) example of that pattern:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # real Chromium, no visible window
    page = browser.new_page()
    # wait until network activity settles so client-rendered content is present
    page.goto("https://example.com/products", wait_until="networkidle")
    print(page.title())
    print(page.locator(".product-card").count())
    browser.close()
The same headless pattern carries over to Node-based tooling such as Puppeteer, typically run as a standalone script (e.g. node scrape.js).

Practical tips

  • Don’t default to headless browsers for everything: they are slower, heavier, and more expensive than plain HTTP scraping
  • Use them when the site actually needs rendering: client-side apps, interaction-heavy flows, bot checks tied to browser behavior
  • Expect more operational mess in production: memory usage, browser crashes, timeouts, fingerprinting issues, proxy coordination
  • Wait for the right thing, not just a fixed sleep: network idle, a selector, a specific API response
  • If you only need the underlying API calls, inspect the network first: sometimes the browser is just an expensive way to discover a JSON endpoint
  • At scale, the hard part is not launching Playwright once: it is keeping browser sessions stable, unblocked, and affordable over time
  • If you're using ScrapeRouter, this is the kind of thing you route only when needed: simple pages through cheaper request-based scraping, browser-required pages through a headless path
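The routing idea in the last tip can be sketched in plain Python. This is a rough illustration, not ScrapeRouter's actual logic: needs_browser is a hypothetical heuristic that guesses from the raw HTML whether the page is an empty client-side app shell that needs real rendering.

```python
import re

def needs_browser(html: str) -> bool:
    """Crude heuristic: route to a headless browser when the raw HTML
    looks like an empty client-side app shell rather than real content."""
    # A bare root div with no children is a common SPA signature.
    spa_shell = re.search(r'<div id="(root|app)">\s*</div>', html) is not None
    # Very little visible text also suggests client-side rendering.
    text = re.sub(r"<[^>]+>", " ", html)
    sparse = len(text.split()) < 50
    return spa_shell or sparse

def route(html: str) -> str:
    """Pick the cheap path by default, the browser path only when needed."""
    return "headless-browser" if needs_browser(html) else "plain-http"

shell = '<html><body><div id="root"></div></body></html>'
full = "<html><body>" + " ".join(["word"] * 200) + "</body></html>"
print(route(shell))  # empty app shell -> browser path
print(route(full))   # content already in the HTML -> cheap HTTP path
```

In practice the heuristic would be tuned per site, but the shape is the point: try the cheap request first, and escalate to a headless browser only when the response shows the page needs rendering.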

Use cases

  • Scraping JavaScript-heavy storefronts where product data is injected after page load
  • Logging into sites that rely on browser state: cookies, tokens, redirects, local storage
  • Interacting with filters, pagination, infinite scroll, and click-to-reveal content
  • Capturing rendered HTML or screenshots for monitoring and QA-style checks
  • Handling anti-bot flows where a plain HTTP client gets blocked but a browser session has a better chance
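The pagination and infinite-scroll case above boils down to a loop that stops once no new items appear. A minimal sketch of that loop, with a stubbed fetch_page standing in for the real "scroll and collect" browser step:

```python
def fetch_page(page_num: int) -> list[str]:
    """Stub for a real 'scroll and collect new items' step; returns the
    item IDs revealed by that scroll, empty once the list is exhausted."""
    catalogue = [f"item-{i}" for i in range(1, 8)]  # 7 items, 3 per scroll
    start = (page_num - 1) * 3
    return catalogue[start:start + 3]

def collect_all() -> list[str]:
    items, page_num = [], 1
    while True:
        batch = fetch_page(page_num)
        if not batch:       # nothing new appeared: we've reached the end
            break
        items.extend(batch)
        page_num += 1
    return items

print(collect_all())  # all 7 items, in order
```

With a headless browser, fetch_page would scroll the page (or click "load more") and diff the rendered item list; the termination logic is the same.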

Related terms

  • Browser Automation
  • JavaScript Rendering
  • Playwright
  • Puppeteer
  • Proxy Rotation
  • Request Blocking
  • Web Scraping API