Scrapers
Choose the right scraper, compare JS and non-JS routes, and understand scraper-specific options.
ScrapeRouter supports multiple scraper backends with different capabilities. This guide explains how to choose a scraper, use common request/response fields, and handle scraper-specific options.
For the current list of available scrapers with descriptions and capabilities, see the Scrapers page.
Selecting a scraper
Set the `scraper` field on your request to either a scraper identifier or `"auto"`:
- `auto`: The system selects a scraper based on the target URL and options (e.g. whether JavaScript is required). If a scraper fails, the system automatically tries the next more advanced scraper.
- Scraper identifier: Use a specific scraper, e.g. `apiritif/requests:2.32` or `apiritif/playwright:1.58`. Valid identifiers are listed on the Scrapers page.
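As a minimal sketch, the two selection modes differ only in the value of `scraper`. The helper name `build_request` and the example URL are hypothetical, and how the payload is sent (endpoint, authentication) is deliberately left out.

```python
import json


def build_request(url, scraper="auto"):
    """Build a scrape request body; `scraper` is "auto" or a scraper identifier."""
    return {"url": url, "scraper": scraper}


# Let the system pick a scraper for the target.
auto_request = build_request("https://example.com/")

# Pin a specific scraper by identifier (valid values are on the Scrapers page).
pinned_request = build_request("https://example.com/", scraper="apiritif/requests:2.32")

print(json.dumps(pinned_request, indent=2))
```

Pinning an identifier is useful when you already know which backend a target needs; `auto` is the simpler default when you do not.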
JavaScript vs non-JavaScript scrapers
Some scrapers handle JavaScript (they run a browser or browser-like environment); others do plain HTTP only.
- Non-JavaScript: e.g. requests, curl-cffi. Fast and lightweight; use when the page does not rely on client-side rendering.
- JavaScript-capable: e.g. Playwright, Scrapling with a browser fetcher. Required for SPAs and pages that need JavaScript execution. These scrapers can accept browser options such as `page_actions` (click, scroll, wait, evaluate, etc.), `wait_for_selector`, and `screenshot` when supported.
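The difference between the two routes can be sketched as two request bodies. The helper `browser_request`, the URLs, and the selector are hypothetical. `page_actions` is omitted because the shape of an individual action is not described in this guide.

```python
def browser_request(url, selector=None, screenshot=False):
    """Build a request for a JavaScript-capable scraper with optional browser fields."""
    request = {"url": url, "scraper": "apiritif/playwright:1.58"}
    if selector is not None:
        # Only meaningful for browser-capable scrapers.
        request["wait_for_selector"] = selector
    if screenshot:
        request["screenshot"] = True
    return request


# Static page: a plain HTTP scraper is enough.
plain = {"url": "https://example.com/static.html", "scraper": "apiritif/requests:2.32"}

# Client-rendered page: wait for the rendered element and capture a screenshot.
spa = browser_request("https://example.com/app", selector="#content", screenshot=True)
```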
Request fields
Every scrape request supports these fields:
| Field | Type | Description | Default |
|---|---|---|---|
| `url` (required) | string (URL) | Absolute http/https URL to scrape. | - |
| `method` | string | HTTP method used for the target request. | `GET` |
| `headers` | object | Headers sent to the target URL. | - |
| `query` | array or object | Query parameters appended to the target URL. | - |
| `data` | any | Request body; body type is auto-detected (JSON, form, or raw). | - |
| `cookies` | array or object | Cookies for the target request. | - |
| `timeout_ms` | integer or number | Target request timeout in milliseconds. | - |
| `allow_redirects` | boolean | Whether the target request should follow redirects. | - |
| `browser_type` | string | Browser engine for browser-capable scrapers when supported. | - |
| `headless` | boolean | Run browser-capable scrapers in headless mode when supported. | - |
| `wait_for_selector` | string | CSS selector to wait for when supported. | - |
| `wait_for_timeout_ms` | integer or number | Additional wait timeout in milliseconds when supported. | - |
| `wait_for_load_state` | string | Browser load state to wait for when supported. | - |
| `page_actions` | array | Browser actions such as click, scroll, wait, or evaluate when supported. | - |
| `screenshot` | boolean | Request screenshot capture when the selected scraper supports it. | - |
| `scraper` | string | Scraper identifier, or `auto` for route selection. | `auto` |
| `scraper_options` | object | Advanced scraper-native options; units and names are scraper-specific. | `{}` |
| `proxy` | any | Proxy config object or proxy type string: `datacenter`, `residential`, `mobile`. | `datacenter` |
| `scraperouter` | any | Optional client metadata reserved for ScrapeRouter routing and diagnostics. | - |
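A request body combining several of these fields might look like the following sketch. The helper `make_request` is hypothetical, as are the URL and the field values. The client-side URL check only mirrors the "absolute http/https" requirement and does not describe how the API itself validates input.

```python
from urllib.parse import urlsplit


def make_request(url, **fields):
    """Build a request body, rejecting URLs that are not absolute http/https."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"url must be an absolute http/https URL: {url!r}")
    request = {"url": url}
    # Skip unset fields so the documented defaults apply.
    request.update({name: value for name, value in fields.items() if value is not None})
    return request


request = make_request(
    "https://example.com/search",
    method="GET",
    headers={"Accept-Language": "en"},
    query={"q": "laptops", "page": "2"},
    timeout_ms=15000,        # milliseconds
    allow_redirects=True,
    proxy="residential",     # proxy type string; a config object is also accepted
    screenshot=None,         # left unset, so it is omitted from the body
)
```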
Response fields
The scrape response includes these fields. A completed API call can still contain a target-level failure, so inspect the `status_code` and `errors` fields in the JSON body even when the HTTP status is 200.
| Field | Type | Description | Default |
|---|---|---|---|
| `id` | uuid | Identifier of the scrape (`Scrape.id`). | - |
| `status_code` | integer | Target response status code or scraper status. Inspect this field and `errors` even when HTTP is 200. | - |
| `final_url` | string | Final target URL after redirects, when available. | - |
| `headers` | object | Target response headers. | `{}` |
| `content` | string | Target response body as text or base64. | - |
| `content_encoding` | string | Encoding of `content`: `text` for decoded text, `base64` for binary bodies. | - |
| `cookies` | array or object | Cookies returned by the target response, when available. | - |
| `errors` | array | Scraper or target-level errors for this attempt. | - |
| `screenshot_url` | string | First saved screenshot artifact URL, when available. | - |
| `scraper_data` | object | Scraper-specific response data, when available. | - |
| `scraperouter` | any | ScrapeRouter routing metadata such as selected scraper, proxy type, and request cost. | - |
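One way to handle a parsed response is sketched below: check `status_code` and `errors` first, then decode `content` according to `content_encoding`. The helper `read_body` and both sample responses are hypothetical, and treating only 2xx codes as success is an assumption of this sketch rather than documented behavior.

```python
import base64


def read_body(response):
    """Return (ok, body, errors) for a parsed scrape response dict."""
    errors = response.get("errors") or []
    status = response.get("status_code")
    # Assumption: success means no errors and a 2xx target status.
    ok = not errors and isinstance(status, int) and 200 <= status < 300
    content = response.get("content")
    if content is not None and response.get("content_encoding") == "base64":
        body = base64.b64decode(content)  # binary body -> bytes
    else:
        body = content  # already decoded text, or missing
    return ok, body, errors


# Hypothetical responses, shaped after the fields in the table above.
binary_response = {
    "status_code": 200,
    "content": base64.b64encode(b"\x89PNG").decode("ascii"),
    "content_encoding": "base64",
    "errors": [],
}
blocked_response = {
    "status_code": 403,
    "content": "Forbidden",
    "content_encoding": "text",
    "errors": ["target returned 403"],
}

ok, body, errors = read_body(binary_response)
blocked_ok, blocked_body, blocked_errors = read_body(blocked_response)
```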
Scraper options (scraper_options)
Pass advanced options that only apply to a given scraper via the scraper_options object. Prefer normalized top-level fields when they exist. For example, use top-level timeout_ms for a target timeout in milliseconds instead of a native timeout option.
Values in scraper_options are passed as scraper-native overrides, so names and units depend on the selected scraper. For Playwright, native scraper_options.timeout is milliseconds. Unsupported native keys can be ignored or rejected depending on the scraper adapter.
For the exact options supported by each scraper, check the scraper’s documentation or the Scrapers page.
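The preference for normalized fields can be sketched as a small client-side rewrite that lifts a Playwright-native `timeout` into top-level `timeout_ms`. The helper name is hypothetical, and detecting Playwright by looking for "playwright" in the identifier is an assumption. The rewrite is limited to Playwright because that is the only scraper whose native timeout unit is stated here.

```python
def prefer_normalized_timeout(request):
    """Return a copy with a Playwright-native timeout moved to top-level timeout_ms."""
    result = dict(request)
    options = dict(result.get("scraper_options") or {})
    # Assumption: Playwright scrapers are recognizable by their identifier.
    is_playwright = "playwright" in str(result.get("scraper", ""))
    if is_playwright and "timeout" in options and "timeout_ms" not in result:
        # Both values are milliseconds for Playwright, so no conversion is needed.
        result["timeout_ms"] = options.pop("timeout")
    if options:
        result["scraper_options"] = options
    else:
        result.pop("scraper_options", None)
    return result


native = {
    "url": "https://example.com/app",
    "scraper": "apiritif/playwright:1.58",
    "scraper_options": {"timeout": 20000},
}
normalized = prefer_normalized_timeout(native)
```

Options for other scrapers pass through unchanged, since their names and units are scraper-specific.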
Meta under scraperouter
Extended metadata for requests and responses may be provided under a scraperouter key. Use it for:
- Request: Optional hints or tracking data (e.g. idempotency keys, trace IDs) that your client sends; the API may preserve or echo them.
- Response: Extra data the API adds (e.g. timing, selected scraper details, or scraper-specific results like `javascript_result`) when available. The exact keys depend on the scraper and API version.
Rely only on the unified fields above for stable behavior; treat scraperouter as optional and extensible.
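Because `scraperouter` is optional and its keys can change, client code is safer reading it defensively. The sketch below assumes a hypothetical request key `trace_id` and hypothetical response values; only `javascript_result` is a key named in this guide.

```python
def meta_value(response, key, default=None):
    """Read an optional key from the scraperouter metadata, tolerating any shape."""
    meta = response.get("scraperouter")
    if isinstance(meta, dict):
        return meta.get(key, default)
    # The field is typed as "any", so it may be absent or not an object.
    return default


# Request side: attach client tracking data (the key name is hypothetical).
request = {
    "url": "https://example.com/",
    "scraperouter": {"trace_id": "job-1234"},
}

# Response side: hypothetical parsed responses with and without metadata.
with_meta = {"status_code": 200, "scraperouter": {"javascript_result": 42}}
without_meta = {"status_code": 200}

result = meta_value(with_meta, "javascript_result")
missing = meta_value(without_meta, "javascript_result", default="n/a")
```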