Glossary

Headless Browser

A headless browser is a real browser running without a visible UI, usually controlled by code. In scraping, you use it when a site needs JavaScript execution, real rendering, or browser-like behavior that plain HTTP requests won’t handle reliably.

Examples

A few common cases where people reach for a headless browser:

  • The page renders data only after JavaScript runs
  • Content appears after clicking, scrolling, or waiting for XHR calls
  • You need cookies, local storage, or browser execution context to look like a normal user session
A minimal Playwright (Python) example of that pattern:

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)  # real Chromium, no visible window
    page = browser.new_page()
    # wait until network activity settles so client-rendered content is present
    page.goto("https://example.com/products", wait_until="networkidle")
    print(page.title())
    print(page.locator(".product-card").count())
    browser.close()
The same headless pattern carries over to Node-based tooling such as Puppeteer, typically run as a standalone script (e.g. node scrape.js).

Practical tips

  • Don’t default to headless browsers for everything: they are slower, heavier, and more expensive than plain HTTP scraping
  • Use them when the site actually needs rendering: client-side apps, interaction-heavy flows, bot checks tied to browser behavior
  • Expect more operational mess in production: memory usage, browser crashes, timeouts, fingerprinting issues, proxy coordination
  • Wait for the right thing, not just a fixed sleep: network idle, a selector, a specific API response
  • If you only need the underlying API calls, inspect the network first: sometimes the browser is just an expensive way to discover a JSON endpoint
  • At scale, the hard part is not launching Playwright once: it is keeping browser sessions stable, unblocked, and affordable over time
  • If you're using ScrapeRouter, this is the kind of thing you route only when needed: simple pages through cheaper request-based scraping, browser-required pages through a headless path
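The routing idea in the last tip can be sketched in plain Python. This is a rough illustration, not ScrapeRouter's actual logic: needs_browser is a hypothetical heuristic that guesses from the raw HTML whether the page is an empty client-side app shell that needs real rendering.

```python
import re

def needs_browser(html: str) -> bool:
    """Crude heuristic: route to a headless browser when the raw HTML
    looks like an empty client-side app shell rather than real content."""
    # A bare root div with no children is a common SPA signature.
    spa_shell = re.search(r'<div id="(root|app)">\s*</div>', html) is not None
    # Very little visible text also suggests client-side rendering.
    text = re.sub(r"<[^>]+>", " ", html)
    sparse = len(text.split()) < 50
    return spa_shell or sparse

def route(html: str) -> str:
    """Pick the cheap path by default, the browser path only when needed."""
    return "headless-browser" if needs_browser(html) else "plain-http"

shell = '<html><body><div id="root"></div></body></html>'
full = "<html><body>" + " ".join(["word"] * 200) + "</body></html>"
print(route(shell))  # empty app shell -> browser path
print(route(full))   # content already in the HTML -> cheap HTTP path
```

In practice the heuristic would be tuned per site, but the shape is the point: try the cheap request first, and escalate to a headless browser only when the response shows the page needs rendering.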

Use cases

  • Scraping JavaScript-heavy storefronts where product data is injected after page load
  • Logging into sites that rely on browser state: cookies, tokens, redirects, local storage
  • Interacting with filters, pagination, infinite scroll, and click-to-reveal content
  • Capturing rendered HTML or screenshots for monitoring and QA-style checks
  • Handling anti-bot flows where a plain HTTP client gets blocked but a browser session has a better chance
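The pagination and infinite-scroll case above boils down to a loop that stops once no new items appear. A minimal sketch of that loop, with a stubbed fetch_page standing in for the real "scroll and collect" browser step:

```python
def fetch_page(page_num: int) -> list[str]:
    """Stub for a real 'scroll and collect new items' step; returns the
    item IDs revealed by that scroll, empty once the list is exhausted."""
    catalogue = [f"item-{i}" for i in range(1, 8)]  # 7 items, 3 per scroll
    start = (page_num - 1) * 3
    return catalogue[start:start + 3]

def collect_all() -> list[str]:
    items, page_num = [], 1
    while True:
        batch = fetch_page(page_num)
        if not batch:       # nothing new appeared: we've reached the end
            break
        items.extend(batch)
        page_num += 1
    return items

print(collect_all())  # all 7 items, in order
```

With a headless browser, fetch_page would scroll the page (or click "load more") and diff the rendered item list; the termination logic is the same.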

Related terms

  • Browser Automation
  • JavaScript Rendering
  • Playwright
  • Puppeteer
  • Proxy Rotation
  • Request Blocking
  • Web Scraping API