Scrape

Reference for the /api/v1/scrape/ request and response schema.

The scrape endpoint submits scraping jobs to ScrapeRouter. Each request targets a URL using automatic route selection by default, or a specific scraper and optional proxy configuration when provided. This page covers the request and response models and how to create a synchronous scrape request.

Request Schema

Attributes of the scrape request object (ScrapeRequestSchema).

| Attribute | Type | Description | Default |
| url (required) | string (URL) | Absolute http/https URL to scrape. | |
| method | string | HTTP method used for the target request. | GET |
| headers | object | Headers sent to the target URL. | |
| query | array or object | Query parameters appended to the target URL. | |
| data | any | Request body; body type is auto-detected (JSON, form, or raw). | |
| cookies | array or object | Cookies for the target request. | |
| timeout_ms | integer or number | Target request timeout in milliseconds. | |
| allow_redirects | boolean | Whether the target request should follow redirects. | |
| browser_type | string | Browser engine for browser-capable scrapers when supported. | |
| headless | boolean | Run browser-capable scrapers in headless mode when supported. | |
| wait_for_selector | string | CSS selector to wait for when supported. | |
| wait_for_timeout_ms | integer or number | Additional wait timeout in milliseconds when supported. | |
| wait_for_load_state | string | Browser load state to wait for when supported. | |
| page_actions | array | Browser actions such as click, scroll, wait, or evaluate when supported. | |
| screenshot | boolean | Request screenshot capture when the selected scraper supports it. | |
| scraper | string | Scraper identifier, or auto for route selection. | auto |
| scraper_options | object | Advanced scraper-native options; units and names are scraper-specific. | {} |
| proxy | any | Proxy config object or proxy type string: datacenter, residential, or mobile. | datacenter |
| scraperouter | any | Optional client metadata reserved for ScrapeRouter routing and diagnostics. | |
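
As a rough illustration of the request model above, the sketch below assembles a request body in Python. The helper name build_scrape_payload is hypothetical and not part of any ScrapeRouter client. It checks only the documented url constraint (absolute http/https) and leaves out any option set to None, on the assumption that omitted attributes fall back to the defaults listed in the table.

```python
from urllib.parse import urlparse


def build_scrape_payload(url, **options):
    """Return a JSON-serializable body for POST /api/v1/scrape/.

    Hypothetical helper: validates only the documented url constraint and
    drops options whose value is None so they are not sent at all.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be an absolute http/https URL: {url!r}")
    payload = {"url": url}
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload


payload = build_scrape_payload(
    "https://example.com/products",
    method="GET",
    headers={"Accept-Language": "en-US"},
    query={"page": 2},          # object form; the schema also allows an array
    timeout_ms=15000,
    wait_for_selector=None,     # None values are left out of the payload
)
```

The resulting dictionary can be passed as the json argument of requests.post, as in the request example later on this page.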

Response Schema

Attributes of the scrape response object (ScrapeResponseSchema). Optional fields are omitted when no value is available.

| Attribute | Type | Description | Default |
| id | uuid | Scrape.id | |
| status_code | integer | Target response status code or scraper status. Inspect this field and errors even when HTTP is 200. | |
| final_url | string | Final target URL after redirects, when available. | |
| headers | object | Target response headers. | {} |
| content | string | Target response body as text or base64. | |
| content_encoding | string | Encoding of content: text for decoded text, base64 for binary bodies. | |
| cookies | array or object | Cookies returned by the target response, when available. | |
| errors | array | Scraper or target-level errors for this attempt. | |
| screenshot_url | string | First saved screenshot artifact URL, when available. | |
| scraper_data | object | Scraper-specific response data, when available. | |
| scraperouter | any | ScrapeRouter routing metadata such as selected scraper, proxy type, and request cost. | |
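
Because content may arrive as text or as base64, a client usually needs to branch on content_encoding before using the body. The following is a minimal sketch, not an official client: decode_content is a hypothetical name, and treating a missing content_encoding as text is an assumption made here for illustration.

```python
import base64


def decode_content(result):
    """Return the target body as str (text) or bytes (base64).

    `result` is the parsed JSON response. Optional fields may be omitted,
    so a missing content yields None.
    """
    content = result.get("content")
    if content is None:
        return None
    # Assumption: a missing content_encoding is treated as "text".
    encoding = result.get("content_encoding", "text")
    if encoding == "base64":
        return base64.b64decode(content)
    if encoding == "text":
        return content
    raise ValueError(f"unexpected content_encoding: {encoding!r}")


html = decode_content({"content": "<!doctype html>...", "content_encoding": "text"})
image = decode_content(
    {"content": base64.b64encode(b"\x89PNG").decode("ascii"), "content_encoding": "base64"}
)
```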

Create a scrape request

POST /api/v1/scrape/

Creates a new scraping request and returns the result synchronously. HTTP 200 means the API request completed; check JSON status_code and errors for the target scrape result.

Required attributes

| Parameter | Description |
| url | The URL to scrape |

Optional attributes

| Parameter | Description |
| method | HTTP method. Default: "GET" |
| headers | Custom request headers |
| proxy | Proxy type or config. Default: "datacenter" |
| scraper | Scraper identifier to force, or "auto". Default: "auto" |
| timeout_ms | Target request timeout in milliseconds |
| screenshot | Requests screenshot capture from supported browser-capable scrapers. When an artifact is produced, the response includes screenshot_url; otherwise the field is omitted. |

Result status

ScrapeRouter separates API transport status from the target scrape result. Validation, authentication, credit, concurrency, and platform-timeout failures return HTTP 4xx/5xx. A completed scrape attempt returns HTTP 200, even when the target result is a block, timeout, or scraper error. Treat a JSON status_code in the 200-399 range with no errors as a successful target scrape.
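
The two-level check described above can be sketched as follows. The function names and the three labels ("api_error", "target_failed", "ok") are hypothetical and chosen only for this example. The sketch treats any HTTP status other than 200 as an API-level failure, and an empty or missing errors array as "no errors".

```python
def scrape_succeeded(result):
    """True when the JSON status_code is in 200-399 and errors is empty or absent."""
    status = result.get("status_code")
    errors = result.get("errors") or []
    return isinstance(status, int) and 200 <= status <= 399 and not errors


def classify(http_status, body):
    """Separate API transport failures from target scrape results."""
    if http_status != 200:
        # Validation, authentication, credit, concurrency, or platform timeout.
        return "api_error"
    return "ok" if scrape_succeeded(body) else "target_failed"
```

With the requests example on this page, a call would look like classify(response.status_code, response.json()) once the HTTP status is known to carry a JSON body.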

Advanced options

Prefer top-level normalized fields such as timeout_ms, wait_for_timeout_ms, and screenshot. Values inside scraper_options are scraper-native overrides; their names and units are defined by the selected scraper. For Playwright, the native scraper_options.timeout is in milliseconds.
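
To make the distinction concrete, here is an illustrative request body that uses the normalized top-level fields and adds one scraper-native override. The scraper identifier "playwright" is a placeholder, since the real identifier is not given on this page. How a native timeout interacts with timeout_ms is also not specified here, so treat the combination as an assumption.

```python
import json

payload = {
    "url": "https://example.com",
    "scraper": "playwright",       # placeholder identifier (assumption)
    # Normalized fields: names and units are the same for every scraper.
    "timeout_ms": 30000,
    "wait_for_timeout_ms": 2000,
    "screenshot": True,
    # Scraper-native override: for Playwright, timeout is in milliseconds.
    "scraper_options": {"timeout": 45000},
}

body = json.dumps(payload)  # string suitable for the POST request body
```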

Request

cURL

#!/usr/bin/env bash
curl -X POST https://www.scraperouter.com/api/v1/scrape/ \
  -H "Authorization: Api-Key {your_api_key}" \
  -H "Content-Type: application/json" \
  -d '{
  "url": "https://example.com",
  "scraper": "auto",
  "proxy": "datacenter"
}'

Python

import requests

# json= serializes the body and sets the Content-Type header.
response = requests.post(
    "https://www.scraperouter.com/api/v1/scrape/",
    headers={"Authorization": "Api-Key {your_api_key}"},
    json={
        "url": "https://example.com",
        "scraper": "auto",
        "proxy": "datacenter",
    },
)
data = response.json()

JavaScript

const response = await fetch("https://www.scraperouter.com/api/v1/scrape/", {
  method: "POST",
  headers: {
    "Authorization": "Api-Key {your_api_key}",
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com",
    scraper: "auto",
    proxy: "datacenter",
  }),
});

const data = await response.json();

Response

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "status_code": 200,
  "final_url": "https://example.com",
  "content": "<!doctype html>...",
  "content_encoding": "text",
  "headers": {
    "content-type": "text/html; charset=UTF-8"
  },
  "scraperouter": {
    "scraper": "apiritif/curl-cffi:0.14",
    "request_cost": null,
    "proxy_type": "datacenter"
  }
}