Scrapers

Choose the right scraper, compare JS and non-JS routes, and understand scraper-specific options.

ScrapeRouter supports multiple scraper backends with different capabilities. This guide explains how to choose a scraper, use common request/response fields, and handle scraper-specific options.

For the current list of available scrapers with descriptions and capabilities, see the Scrapers page.

Selecting a scraper

Set the scraper field on your request to either a scraper identifier or "auto":

auto — The system selects a scraper based on the target URL and options (e.g. whether JavaScript is required). If a scraper fails, the system automatically tries the next more advanced scraper.
Scraper identifier — Use a specific scraper, e.g. apiritif/requests:2.32 or apiritif/playwright:1.58. Valid identifiers are listed on the Scrapers page.
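
As a minimal sketch of the two choices above, the snippet below builds a request body with the scraper field set either to "auto" or to a pinned identifier. The helper name build_request is hypothetical, and the code only constructs the JSON body; how you send it depends on your client and is not shown here.

```python
import json


def build_request(url: str, scraper: str = "auto") -> dict:
    """Return a scrape request body with the scraper field set."""
    return {"url": url, "scraper": scraper}


# Let the system pick a scraper for the target URL.
auto_request = build_request("https://example.com/")

# Pin a specific scraper identifier taken from the Scrapers page.
pinned_request = build_request("https://example.com/", scraper="apiritif/requests:2.32")

print(json.dumps(auto_request))
print(json.dumps(pinned_request))
```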

JavaScript vs non-JavaScript scrapers

Some scrapers handle JavaScript (they run a browser or browser-like environment); others do plain HTTP only.

Non-JavaScript: e.g. requests, curl-cffi. Fast and lightweight; use when the page does not rely on client-side rendering.
JavaScript-capable: e.g. Playwright, Scrapling with a browser fetcher. Required for SPAs and pages that need JavaScript execution. These scrapers can accept browser options such as page_actions (click, scroll, wait, evaluate, etc.), wait_for_selector, and screenshot when supported.
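
The sketch below shows one way to keep a single request template and derive a plain HTTP variant from it. It assumes you would rather not send browser-only options to a non-JavaScript scraper; the helper strip_browser_fields and the selector "#content" are illustrative, and the field set covers only the three browser options named above.

```python
# Browser options named in this guide; other wait_for_* fields could be added.
BROWSER_ONLY_FIELDS = {"page_actions", "wait_for_selector", "screenshot"}


def strip_browser_fields(request: dict) -> dict:
    """Return a copy of the request without browser-only options."""
    return {k: v for k, v in request.items() if k not in BROWSER_ONLY_FIELDS}


# Request for a JavaScript-capable scraper.
js_request = {
    "url": "https://example.com/app",
    "scraper": "apiritif/playwright:1.58",
    "wait_for_selector": "#content",
    "screenshot": True,
}

# Same target through a plain HTTP scraper, with browser options removed.
http_request = {**strip_browser_fields(js_request), "scraper": "apiritif/requests:2.32"}
```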

Request fields

Every scrape request supports these fields:

Field	Type	Description	Default
`url` *required	string (URL)	Absolute http/https URL to scrape.	—
`method`	string	HTTP method used for the target request.	`GET`
`headers`	object	Headers sent to the target URL.	—
`query`	array | object	Query parameters appended to the target URL.	—
`data`	any	Request body; body type is auto-detected (JSON, form, or raw).	—
`cookies`	array | object	Cookies for the target request.	—
`timeout_ms`	integer | number	Target request timeout in milliseconds.	—
`allow_redirects`	boolean	Whether the target request should follow redirects.	—
`browser_type`	string	Browser engine for browser-capable scrapers when supported.	—
`headless`	boolean	Run browser-capable scrapers in headless mode when supported.	—
`wait_for_selector`	string	CSS selector to wait for when supported.	—
`wait_for_timeout_ms`	integer | number	Additional wait timeout in milliseconds when supported.	—
`wait_for_load_state`	string	Browser load state to wait for when supported.	—
`page_actions`	array	Browser actions such as click, scroll, wait, or evaluate when supported.	—
`screenshot`	boolean	Request screenshot capture when the selected scraper supports it.	—
`scraper`	string	Scraper identifier, or auto for route selection.	`auto`
`scraper_options`	object	Advanced scraper-native options; units and names are scraper-specific.	`{}`
`proxy`	any	Proxy config object, or a proxy type string: datacenter, residential, or mobile.	`datacenter`
`scraperouter`	any	Optional client metadata reserved for ScrapeRouter routing and diagnostics.	—
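
A client-side pre-check can catch obvious mistakes before a request is sent. The sketch below is an assumption about what is worth checking, not the API's own validation: it covers the absolute http/https URL requirement and the three proxy type strings from the table, plus a positive-number check on timeout_ms that the table does not state explicitly. The function name validate_request is hypothetical.

```python
from urllib.parse import urlparse

PROXY_TYPES = {"datacenter", "residential", "mobile"}


def validate_request(request: dict) -> list:
    """Return a list of problems found in a scrape request body."""
    problems = []

    # url is required and must be an absolute http/https URL.
    url = request.get("url")
    parsed = urlparse(url) if isinstance(url, str) else None
    if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("url must be an absolute http/https URL")

    # proxy may be a config object or one of the known type strings.
    proxy = request.get("proxy", "datacenter")
    if isinstance(proxy, str) and proxy not in PROXY_TYPES:
        problems.append("unknown proxy type: " + proxy)

    # Assumption: a usable timeout is a positive number of milliseconds.
    timeout = request.get("timeout_ms")
    if timeout is not None:
        is_number = isinstance(timeout, (int, float)) and not isinstance(timeout, bool)
        if not is_number or timeout <= 0:
            problems.append("timeout_ms must be a positive number")

    return problems
```

An empty list means the request passed these local checks; the API may still reject it for other reasons.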

Response fields

The scrape response includes these fields. An API call can complete successfully and still report a target-level failure, so always inspect status_code and errors in the JSON body.

Field	Type	Description	Default
`id`	uuid	Unique identifier of the scrape (Scrape.id).	—
`status_code`	integer	Target response status code or scraper status. Inspect this field and errors even when the API call itself returns HTTP 200.	—
`final_url`	string	Final target URL after redirects, when available.	—
`headers`	object	Target response headers.	`{}`
`content`	string	Target response body as text or base64.	—
`content_encoding`	string	Encoding of content: text for decoded text, base64 for binary bodies.	—
`cookies`	array | object	Cookies returned by the target response, when available.	—
`errors`	array	Scraper or target-level errors for this attempt.	—
`screenshot_url`	string	First saved screenshot artifact URL, when available.	—
`scraper_data`	object	Scraper-specific response data, when available.	—
`scraperouter`	any	ScrapeRouter routing metadata such as selected scraper, proxy type, and request cost.	—
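
The following sketch shows one way to handle a parsed response: decode content according to content_encoding, and check for a target-level failure using status_code and errors. Treating any status of 400 or above as a failure is an assumption made for illustration, and the helper names and sample response are hypothetical.

```python
import base64
from typing import Union


def decode_content(response: dict) -> Union[str, bytes]:
    """Return text as-is, or decode base64 content into raw bytes."""
    content = response.get("content") or ""
    if response.get("content_encoding") == "base64":
        return base64.b64decode(content)
    return content


def target_failed(response: dict) -> bool:
    """Report a failure if errors are present or the status looks bad."""
    status = response.get("status_code")
    # Assumption: a missing status or any status >= 400 counts as a failure.
    return bool(response.get("errors")) or not isinstance(status, int) or status >= 400


# Hypothetical response carrying a binary body.
sample = {
    "status_code": 200,
    "content": base64.b64encode(b"\x89PNG").decode("ascii"),
    "content_encoding": "base64",
    "errors": [],
}

if not target_failed(sample):
    body = decode_content(sample)
```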

Scraper options `scraper_options`

Pass advanced options that only apply to a given scraper via the scraper_options object. Prefer normalized top-level fields when they exist. For example, use top-level timeout_ms for a target timeout in milliseconds instead of a native timeout option.

Values in scraper_options are passed as scraper-native overrides, so names and units depend on the selected scraper. For Playwright, the native scraper_options.timeout is in milliseconds. Unsupported native keys can be ignored or rejected depending on the scraper adapter.

For the exact options supported by each scraper, check the scraper’s documentation or the Scrapers page.
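
To illustrate preferring the normalized field, the sketch below moves a native timeout out of scraper_options and into top-level timeout_ms. It assumes the Playwright case described above, where the native value is already in milliseconds; for a scraper with different units the value would need converting first. The helper name normalize_timeout is hypothetical.

```python
def normalize_timeout(request: dict) -> dict:
    """Return a copy that uses top-level timeout_ms instead of a native timeout."""
    options = dict(request.get("scraper_options") or {})
    native_timeout = options.pop("timeout", None)

    result = {**request, "scraper_options": options}
    # Keep an explicit timeout_ms if the caller already set one.
    if "timeout_ms" not in request and native_timeout is not None:
        result["timeout_ms"] = native_timeout  # assumed to be milliseconds
    return result


request = {
    "url": "https://example.com/",
    "scraper": "apiritif/playwright:1.58",
    "scraper_options": {"timeout": 30000},
}
normalized = normalize_timeout(request)
```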

Meta under `scraperouter`

Extended metadata for requests and responses may be provided under a scraperouter key. Use it for:

Request — Optional hints or tracking (e.g. idempotency keys, trace IDs) that your client sends; the API may preserve or echo them.
Response — Extra data the API adds (e.g. timing, selected scraper details, or scraper-specific results like javascript_result) when available. The exact keys depend on the scraper and API version.

Rely only on the unified fields above for stable behavior; treat scraperouter as optional and extensible.
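
Because the keys under scraperouter are optional and may change, it helps to read them defensively. The sketch below is one way to do that; the key names trace_id and timing_ms are hypothetical, and javascript_result is used only because it is mentioned above as a possible scraper-specific result.

```python
def read_meta(response: dict, key: str, default=None):
    """Read an optional key from response["scraperouter"] without raising."""
    meta = response.get("scraperouter")
    if not isinstance(meta, dict):
        return default
    return meta.get(key, default)


# Request side: optional tracking data (trace_id is a hypothetical key).
request = {"url": "https://example.com/", "scraperouter": {"trace_id": "job-123"}}

# Response side: hypothetical metadata returned by the API.
response = {"status_code": 200, "scraperouter": {"javascript_result": 42}}

js_result = read_meta(response, "javascript_result")
timing = read_meta(response, "timing_ms", default="n/a")
```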