Glossary

Bandwidth

Bandwidth is the amount of data your scraper sends and receives over the network. In scraping, it directly affects cost, speed, and how noisy your crawler looks to the target site, especially when you're pulling full pages, images, scripts, and retries you didn't actually need.

Examples

A lot of scraping bandwidth gets wasted on things that have nothing to do with the data you want.

  • Bad case: loading the full page, images, fonts, analytics scripts, and retrying the same request three times
  • Better case: fetch only the HTML or API response you need, cache stable pages, and avoid re-downloading unchanged content
A lean fetch can be a single HTTP call, for example through a scraping API:
curl "https://www.scraperouter.com/api/v1/scrape/?url=https://example.com/products" \
  -H "Authorization: Api-Key $api_key"

If that page is 2 MB because it drags in heavy assets through a browser workflow, and you scrape 100,000 pages, you're moving roughly 200 GB. If the useful data was available from a 100 KB JSON endpoint instead, that's closer to 10 GB. That's the difference between a job that feels fine in a test and one that gets expensive fast in production.
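That arithmetic is worth keeping as a habit. A quick back-of-envelope sketch, using the figures from this example:

```python
# Back-of-envelope bandwidth for the 100,000-page job described above
PAGES = 100_000
FULL_PAGE_BYTES = 2_000_000   # ~2 MB rendered page with all assets
JSON_BYTES = 100_000          # ~100 KB JSON endpoint response

full_total_gb = FULL_PAGE_BYTES * PAGES / 1e9
json_total_gb = JSON_BYTES * PAGES / 1e9

print(f"full pages: {full_total_gb:.0f} GB")  # 200 GB
print(f"JSON only:  {json_total_gb:.0f} GB")  # 10 GB
```

Running this for your own page sizes before a big crawl takes a minute and can change the whole approach.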

Practical tips

  • Block what you don't need: images, fonts, video, ads, analytics, and other third-party junk
  • Prefer direct data endpoints: if the site has a clean JSON or XHR endpoint, use that instead of rendering the full page
  • Cache aggressively where it makes sense: category pages, reference data, pagination URLs that don't change often
  • Watch retry behavior: bad retry logic quietly doubles or triples bandwidth usage
  • Use browser automation only when needed: headless browsers are useful, but they burn more bandwidth than plain HTTP fetches
  • Measure bytes per successful record: that's a better production metric than just requests per minute
  • Be careful with concurrency: more parallelism can mean more wasted traffic if your requests are getting blocked or challenged
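The "bytes per successful record" tip is easy to wire up. Here's a minimal sketch; the tracker class and its names are illustrative, not part of any library:

```python
class BandwidthTracker:
    """Track bytes downloaded versus records actually extracted."""

    def __init__(self):
        self.total_bytes = 0
        self.records = 0

    def record_response(self, body: bytes, records_extracted: int) -> None:
        self.total_bytes += len(body)
        self.records += records_extracted

    @property
    def bytes_per_record(self) -> float:
        # Before any successful record, the metric is undefined: report infinity
        return self.total_bytes / self.records if self.records else float("inf")

tracker = BandwidthTracker()
tracker.record_response(b"x" * 50_000, records_extracted=25)  # one page, 25 products
tracker.record_response(b"x" * 50_000, records_extracted=0)   # a blocked/challenge page
print(f"{tracker.bytes_per_record:.0f} bytes per record")     # 4000
```

Note how the blocked page doubles the metric: that's exactly the waste that requests-per-minute hides.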
The same request in Python, with the response size printed so you can track it:

import os
import requests

url = "https://www.scraperouter.com/api/v1/scrape/"
params = {"url": "https://example.com/products"}
# Read the key from the environment; a literal "$api_key" string
# would not be expanded in Python the way it is in a shell
headers = {"Authorization": f"Api-Key {os.environ['api_key']}"}

resp = requests.get(url, params=params, headers=headers, timeout=60)
print("status:", resp.status_code)
print("bytes:", len(resp.content))

That last line is boring, but useful. If you're not tracking response size, it's easy to miss where the money and time are going.

Use cases

  • Large catalog scraping: bandwidth becomes a real cost when you're crawling hundreds of thousands of product pages every day
  • Browser-based scraping: rendering JavaScript-heavy pages can multiply traffic compared to plain HTTP requests
  • Proxy-heavy workloads: bandwidth matters more when you're paying for residential or mobile traffic
  • Incremental sync jobs: efficient crawlers check what changed instead of re-downloading everything every run
  • ScrapeRouter setups: a router layer helps you avoid wasting bandwidth on the wrong method for the job, for example using a lighter fetch path when a full browser session isn't needed

Related terms

  • Proxy
  • Headless Browser
  • Rate Limiting
  • Caching
  • Retry Logic
  • Web Scraping API
  • Anti-Bot