Scrapers

Choose the right scraper, compare JS and non-JS routes, and understand scraper-specific options.

ScrapeRouter supports multiple scraper backends with different capabilities. This guide explains how to choose a scraper, use common request/response fields, and handle scraper-specific options.

For the current list of available scrapers with descriptions and capabilities, see the Scrapers page.

Selecting a scraper

Set the scraper field on your request to either a scraper identifier or "auto":

  • auto — The system selects a scraper based on the target URL and options (e.g. whether JavaScript is required). If a scraper fails, the system automatically tries the next more advanced scraper.
  • Scraper identifier — Use a specific scraper, e.g. apiritif/requests:2.32 or apiritif/playwright:1.58. Valid identifiers are listed on the Scrapers page.
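As a minimal sketch, the two selection modes differ only in the value of the scraper field. The helper name scrape_payload and the target URL are illustrative and not part of the API; the identifier is the one quoted above.

```python
import json


def scrape_payload(url, scraper="auto"):
    """Build a request body; `scraper` is "auto" or a scraper identifier."""
    return {"url": url, "scraper": scraper}


# Let the system pick a scraper for this target.
auto_request = scrape_payload("https://example.com/")

# Pin a specific backend by its identifier.
pinned_request = scrape_payload(
    "https://example.com/", scraper="apiritif/requests:2.32"
)

print(json.dumps(auto_request))
print(json.dumps(pinned_request))
```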

JavaScript vs non-JavaScript scrapers

Some scrapers handle JavaScript (they run a browser or browser-like environment); others do plain HTTP only.

  • Non-JavaScript: e.g. requests, curl-cffi. Fast and lightweight; use when the page does not rely on client-side rendering.
  • JavaScript-capable: e.g. Playwright, Scrapling with a browser fetcher. Required for SPAs and other pages that need JavaScript execution. Where supported, these scrapers accept browser options such as page_actions (click, scroll, wait, evaluate, etc.), wait_for_selector, and screenshot.
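In practice the split decides which fields are worth sending. The sketch below is a client-side convenience, not an API feature: BROWSER_FIELDS collects the request fields this guide describes as browser-related, and browser_fields_in is a hypothetical helper that reports which of them a payload sets, which makes it easy to spot browser options attached to a plain-HTTP scraper. The URLs and the #content selector are made up.

```python
# Request fields that this guide describes as applying to browser-capable scrapers.
BROWSER_FIELDS = frozenset({
    "browser_type", "headless", "wait_for_selector", "wait_for_timeout_ms",
    "wait_for_load_state", "page_actions", "screenshot",
})


def browser_fields_in(payload):
    """Return the sorted browser-related fields present in a payload."""
    return sorted(BROWSER_FIELDS.intersection(payload))


# A static page: plain HTTP is enough, so no browser fields are set.
static_page = {"url": "https://example.com/", "scraper": "apiritif/requests:2.32"}

# A client-rendered page: wait for the content and capture a screenshot.
spa_page = {
    "url": "https://example.com/app",
    "scraper": "apiritif/playwright:1.58",
    "wait_for_selector": "#content",
    "screenshot": True,
}

print(browser_fields_in(static_page))
print(browser_fields_in(spa_page))
```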

Request fields

Every scrape request supports these fields:

| Field | Type | Description | Default |
|---|---|---|---|
| url (required) | string (URL) | Absolute http/https URL to scrape. | |
| method | string | HTTP method used for the target request. | GET |
| headers | object | Headers sent to the target URL. | |
| query | array or object | Query parameters appended to the target URL. | |
| data | any | Request body; body type is auto-detected (JSON, form, or raw). | |
| cookies | array or object | Cookies for the target request. | |
| timeout_ms | integer or number | Target request timeout in milliseconds. | |
| allow_redirects | boolean | Whether the target request should follow redirects. | |
| browser_type | string | Browser engine for browser-capable scrapers when supported. | |
| headless | boolean | Run browser-capable scrapers in headless mode when supported. | |
| wait_for_selector | string | CSS selector to wait for when supported. | |
| wait_for_timeout_ms | integer or number | Additional wait timeout in milliseconds when supported. | |
| wait_for_load_state | string | Browser load state to wait for when supported. | |
| page_actions | array | Browser actions such as click, scroll, wait, or evaluate when supported. | |
| screenshot | boolean | Request screenshot capture when the selected scraper supports it. | |
| scraper | string | Scraper identifier, or auto for route selection. | auto |
| scraper_options | object | Advanced scraper-native options; units and names are scraper-specific. | {} |
| proxy | any | Proxy config object or proxy type string: datacenter, residential, mobile. | datacenter |
| scraperouter | any | Optional client metadata reserved for ScrapeRouter routing and diagnostics. | |
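The following sketch assembles a request body from these fields and checks the one hard requirement, an absolute http/https url, before anything is sent. Several things here are assumptions rather than documented behavior: the helper name build_scrape_request, the placeholder endpoint, and sending the body as a JSON POST. Authentication is left out because this section does not describe it.

```python
import json
import urllib.request
from urllib.parse import urlsplit


def build_scrape_request(url, **fields):
    """Validate the target URL and merge optional request fields."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"url must be an absolute http/https URL: {url!r}")
    return {"url": url, **fields}


payload = build_scrape_request(
    "https://example.com/search",
    method="GET",
    query={"q": "shoes"},
    headers={"Accept-Language": "en"},
    timeout_ms=15000,        # milliseconds
    proxy="residential",     # proxy type string
)

# Hypothetical endpoint. Constructing the Request object does not send it;
# pass it to urllib.request.urlopen() once the real endpoint is known.
API_ENDPOINT = "https://api.example.invalid/scrape"
http_request = urllib.request.Request(
    API_ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

Fields that are not set are simply omitted, so the defaults in the table apply.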

Response fields

The scrape response includes these fields. An API call that completes successfully can still report a target-level failure, so inspect status_code and errors in the JSON body.

| Field | Type | Description | Default |
|---|---|---|---|
| id | uuid | Scrape.id | |
| status_code | integer | Target response status code or scraper status. Inspect this field and errors even when HTTP is 200. | |
| final_url | string | Final target URL after redirects, when available. | |
| headers | object | Target response headers. | {} |
| content | string | Target response body as text or base64. | |
| content_encoding | string | Encoding of content: text for decoded text, base64 for binary bodies. | |
| cookies | array or object | Cookies returned by the target response, when available. | |
| errors | array | Scraper or target-level errors for this attempt. | |
| screenshot_url | string | First saved screenshot artifact URL, when available. | |
| scraper_data | object | Scraper-specific response data, when available. | |
| scraperouter | any | ScrapeRouter routing metadata such as selected scraper, proxy type, and request cost. | |
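A response handler therefore has two jobs: decode content according to content_encoding, and look at status_code and errors rather than trusting the HTTP status of the API call. The sketch below shows one way to do both. The helper names are illustrative, the sample response is made up, and treating a target status of 400 or above as a failure is an assumption about what counts as an error for your use case.

```python
import base64


def decode_content(response):
    """Return the body as str for "text" or as bytes for "base64"."""
    content = response.get("content") or ""
    if response.get("content_encoding") == "base64":
        return base64.b64decode(content)
    return content


def scrape_problems(response):
    """Collect reasons to treat a scrape as failed, even on HTTP 200."""
    problems = list(response.get("errors") or [])
    status = response.get("status_code")
    # Assumption: a target status of 400 or above counts as a failure.
    if isinstance(status, int) and status >= 400:
        problems.append(f"target returned status {status}")
    return problems


# Made-up response: the API call succeeded, but the target refused the request.
sample = {
    "status_code": 403,
    "content": base64.b64encode(b"<h1>Forbidden</h1>").decode("ascii"),
    "content_encoding": "base64",
    "errors": [],
}

print(scrape_problems(sample))
print(decode_content(sample))
```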

Scraper options (scraper_options)

Pass advanced options that only apply to a given scraper via the scraper_options object. Prefer normalized top-level fields when they exist. For example, use top-level timeout_ms for a target timeout in milliseconds instead of a native timeout option.

Values in scraper_options are passed as scraper-native overrides, so names and units depend on the selected scraper. For Playwright, the native scraper_options.timeout is in milliseconds. Unsupported native keys may be ignored or rejected, depending on the scraper adapter.

For the exact options supported by each scraper, check the scraper’s documentation or the Scrapers page.
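To make the preference concrete, the sketch below sets the same 20-second limit twice: once through the normalized timeout_ms field, and once through the native Playwright timeout option. The helper with_timeout is a hypothetical client-side convenience; only the field names and the Playwright unit come from this guide.

```python
def with_timeout(payload, seconds):
    """Return a copy of payload with the normalized top-level timeout set."""
    updated = dict(payload)
    updated["timeout_ms"] = int(seconds * 1000)  # normalized field is in milliseconds
    return updated


base = {"url": "https://example.com/", "scraper": "apiritif/playwright:1.58"}

# Preferred: the normalized field keeps its meaning if the scraper changes.
portable = with_timeout(base, 20)

# Native override: valid only while the selected scraper is Playwright,
# where scraper_options.timeout is in milliseconds.
native = {**base, "scraper_options": {"timeout": 20000}}
```

The portable form also stays valid with scraper set to auto, where the backend is not known in advance.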

Meta under scraperouter

Extended metadata for requests and responses may be provided under a scraperouter key. Use it for:

  • Request — Optional hints or tracking (e.g. idempotency keys, trace IDs) that your client sends; the API may preserve or echo them.
  • Response — Extra data the API adds (e.g. timing, selected scraper details, or scraper-specific results like javascript_result) when available. The exact keys depend on the scraper and API version.

Rely only on the unified fields above for stable behavior; treat scraperouter as optional and extensible.
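One way to follow that advice is to write to scraperouter without overwriting what is already there, and to read from it without assuming any key exists. In the sketch below both helper names and the trace_id key are hypothetical; javascript_result is the example key mentioned above, and the sample response is made up.

```python
def with_trace_id(payload, trace_id):
    """Return a copy of payload carrying a client trace ID under scraperouter."""
    meta = dict(payload.get("scraperouter") or {})
    meta["trace_id"] = trace_id  # hypothetical key name
    return {**payload, "scraperouter": meta}


def router_meta(response, key, default=None):
    """Read one optional scraperouter key without assuming it exists."""
    meta = response.get("scraperouter")
    if not isinstance(meta, dict):
        return default
    return meta.get(key, default)


request = with_trace_id({"url": "https://example.com/"}, "job-123")

# Made-up response carrying one scraper-specific result.
response = {"status_code": 200, "scraperouter": {"javascript_result": 42}}

print(router_meta(response, "javascript_result"))
print(router_meta(response, "timing", default={}))
```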