Glossary

Bandwidth

Bandwidth is the amount of data your scraper sends and receives over the network. In scraping, it directly affects cost, speed, and how noisy your crawler looks to the target site, especially when you're pulling full pages, images, scripts, and retries you didn't actually need.

Examples

A lot of scraping bandwidth gets wasted on things that have nothing to do with the data you want.

  • Bad case: loading the full page, images, fonts, analytics scripts, and retrying the same request three times
  • Better case: fetch only the HTML or API response you need, cache stable pages, and avoid re-downloading unchanged content
A lean fetch can be a single HTTP call, for example through a scraping API:
curl "https://www.scraperouter.com/api/v1/scrape/?url=https://example.com/products" \
  -H "Authorization: Api-Key $api_key"

If that page is 2 MB because it drags in heavy assets through a browser workflow, and you scrape 100,000 pages, you're moving roughly 200 GB. If the useful data was available from a 100 KB JSON endpoint instead, that's closer to 10 GB. That's the difference between a job that feels fine in a test and one that gets expensive fast in production.
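That arithmetic is worth keeping as a habit. A quick back-of-envelope sketch, using the figures from this example:

```python
# Back-of-envelope bandwidth for the 100,000-page job described above
PAGES = 100_000
FULL_PAGE_BYTES = 2_000_000   # ~2 MB rendered page with all assets
JSON_BYTES = 100_000          # ~100 KB JSON endpoint response

full_total_gb = FULL_PAGE_BYTES * PAGES / 1e9
json_total_gb = JSON_BYTES * PAGES / 1e9

print(f"full pages: {full_total_gb:.0f} GB")  # 200 GB
print(f"JSON only:  {json_total_gb:.0f} GB")  # 10 GB
```

Running this for your own page sizes before a big crawl takes a minute and can change the whole approach.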

Practical tips

  • Block what you don't need: images, fonts, video, ads, analytics, and other third-party junk
  • Prefer direct data endpoints: if the site has a clean JSON or XHR endpoint, use that instead of rendering the full page
  • Cache aggressively where it makes sense: category pages, reference data, pagination URLs that don't change often
  • Watch retry behavior: bad retry logic quietly doubles or triples bandwidth usage
  • Use browser automation only when needed: headless browsers are useful, but they burn more bandwidth than plain HTTP fetches
  • Measure bytes per successful record: that's a better production metric than just requests per minute
  • Be careful with concurrency: more parallelism can mean more wasted traffic if your requests are getting blocked or challenged
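The "bytes per successful record" tip is easy to wire up. Here's a minimal sketch; the tracker class and its names are illustrative, not part of any library:

```python
class BandwidthTracker:
    """Track bytes downloaded versus records actually extracted."""

    def __init__(self):
        self.total_bytes = 0
        self.records = 0

    def record_response(self, body: bytes, records_extracted: int) -> None:
        self.total_bytes += len(body)
        self.records += records_extracted

    @property
    def bytes_per_record(self) -> float:
        # Before any successful record, the metric is undefined: report infinity
        return self.total_bytes / self.records if self.records else float("inf")

tracker = BandwidthTracker()
tracker.record_response(b"x" * 50_000, records_extracted=25)  # one page, 25 products
tracker.record_response(b"x" * 50_000, records_extracted=0)   # a blocked/challenge page
print(f"{tracker.bytes_per_record:.0f} bytes per record")     # 4000
```

Note how the blocked page doubles the metric: that's exactly the waste that requests-per-minute hides.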
The same request in Python, with the response size printed so you can track it:

import os
import requests

url = "https://www.scraperouter.com/api/v1/scrape/"
params = {"url": "https://example.com/products"}
# Read the key from the environment; a literal "$api_key" string
# would not be expanded in Python the way it is in a shell
headers = {"Authorization": f"Api-Key {os.environ['api_key']}"}

resp = requests.get(url, params=params, headers=headers, timeout=60)
print("status:", resp.status_code)
print("bytes:", len(resp.content))

That last line is boring, but useful. If you're not tracking response size, it's easy to miss where the money and time are going.

Use cases

  • Large catalog scraping: bandwidth becomes a real cost when you're crawling hundreds of thousands of product pages every day
  • Browser-based scraping: rendering JavaScript-heavy pages can multiply traffic compared to plain HTTP requests
  • Proxy-heavy workloads: bandwidth matters more when you're paying for residential or mobile traffic
  • Incremental sync jobs: efficient crawlers check what changed instead of re-downloading everything every run
  • ScrapeRouter setups: a router layer helps you avoid wasting bandwidth on the wrong method for the job, for example using a lighter fetch path when a full browser session isn't needed

Related terms

  • Proxy
  • Headless Browser
  • Rate Limiting
  • Caching
  • Retry Logic
  • Web Scraping API
  • Anti-Bot