Examples
A lot of scraping bandwidth gets wasted on things that have nothing to do with the data you want.
- Bad case: loading the full page, images, fonts, analytics scripts, and retrying the same request three times
- Better case: fetch only the HTML or API response you need, cache stable pages, and avoid re-downloading unchanged content
curl "https://www.scraperouter.com/api/v1/scrape/?url=https://example.com/products" \
  -H "Authorization: Api-Key $api_key"
If that page is 2 MB because it drags in heavy assets through a browser workflow, and you scrape 100,000 pages, you're moving roughly 200 GB. If the useful data was available from a 100 KB JSON endpoint instead, that's closer to 10 GB. That's the difference between a job that feels fine in a test and one that gets expensive fast in production.
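Those totals are easy to sanity-check in a couple of lines. A quick back-of-the-envelope sketch, using the page sizes and counts from the example above:

```python
def total_transfer_gb(page_bytes: int, page_count: int) -> float:
    """Rough total transfer for a crawl, in decimal GB."""
    return page_bytes * page_count / 1_000_000_000

pages = 100_000
heavy = total_transfer_gb(2_000_000, pages)  # full 2 MB rendered page
lean = total_transfer_gb(100_000, pages)     # 100 KB JSON endpoint

print(f"browser workflow: ~{heavy:.0f} GB")  # ~200 GB
print(f"JSON endpoint: ~{lean:.0f} GB")      # ~10 GB
```

Running numbers like these before a big crawl is cheap; finding out from the bandwidth bill afterwards is not.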
Practical tips
- Block what you don't need: images, fonts, video, ads, analytics, and other third-party junk
- Prefer direct data endpoints: if the site has a clean JSON or XHR endpoint, use that instead of rendering the full page
- Cache aggressively where it makes sense: category pages, reference data, pagination URLs that don't change often
- Watch retry behavior: bad retry logic quietly doubles or triples bandwidth usage
- Use browser automation only when needed: headless browsers are useful, but they burn more bandwidth than plain HTTP fetches
- Measure bytes per successful record: that's a better production metric than just requests per minute
- Be careful with concurrency: more parallelism can mean more wasted traffic if your requests are getting blocked or challenged
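One concrete way to act on the caching tips is conditional requests: if the server supports ETags, a revalidation that comes back 304 Not Modified costs a few hundred bytes instead of the full body. A minimal sketch, assuming the target server returns ETag headers; the in-memory cache and helper name here are made up for illustration:

```python
import requests

# naive in-memory cache: url -> (etag, body)
cache: dict[str, tuple[str, bytes]] = {}

def fetch_cached(url: str, session: requests.Session) -> bytes:
    """Fetch url, sending If-None-Match when we have a cached copy."""
    headers = {}
    if url in cache:
        headers["If-None-Match"] = cache[url][0]
    resp = session.get(url, headers=headers, timeout=60)
    if resp.status_code == 304:
        return cache[url][1]  # unchanged: reuse the cached body
    resp.raise_for_status()
    etag = resp.headers.get("ETag")
    if etag:
        cache[url] = (etag, resp.content)
    return resp.content
```

In production you'd back the cache with disk or a key-value store rather than a dict, but the shape is the same: store the validator alongside the body, and let the server tell you when a re-download is pointless.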
import os
import requests

url = "https://www.scraperouter.com/api/v1/scrape/"
params = {"url": "https://example.com/products"}
# read the key from the environment instead of hard-coding it
headers = {"Authorization": f"Api-Key {os.environ['API_KEY']}"}

resp = requests.get(url, params=params, headers=headers, timeout=60)
print("status:", resp.status_code)
print("bytes:", len(resp.content))
That last line is boring, but useful. If you're not tracking response size, it's easy to miss where the money and time are going.
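To make that metric stick, fold it into whatever stats your crawler already keeps. A minimal sketch of a bytes-per-record counter; the class and method names are invented for illustration:

```python
class CrawlStats:
    """Track bytes downloaded against records actually extracted."""

    def __init__(self) -> None:
        self.bytes_downloaded = 0
        self.records_extracted = 0

    def record_response(self, body: bytes, records: int) -> None:
        self.bytes_downloaded += len(body)
        self.records_extracted += records

    def bytes_per_record(self) -> float:
        if self.records_extracted == 0:
            return float("inf")  # downloading plenty, extracting nothing
        return self.bytes_downloaded / self.records_extracted

stats = CrawlStats()
stats.record_response(b"x" * 50_000, records=20)  # one 50 KB page, 20 products
print(f"{stats.bytes_per_record():.0f} bytes/record")  # 2500 bytes/record
```

A rising bytes-per-record number is often the first visible symptom of blocks, challenges, or bloated pages, long before request counts look wrong.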
Use cases
- Large catalog scraping: bandwidth becomes a real cost when you're crawling hundreds of thousands of product pages every day
- Browser-based scraping: rendering JavaScript-heavy pages can multiply traffic compared to plain HTTP requests
- Proxy-heavy workloads: bandwidth matters more when you're paying for residential or mobile traffic
- Incremental sync jobs: efficient crawlers check what changed instead of re-downloading everything every run
- ScrapeRouter setups: a router layer helps you avoid wasting bandwidth on the wrong method for the job, for example, by using a lighter fetch path when a full browser session isn't needed
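For the incremental-sync case, the cheapest client-side check is a content hash: compare what you just fetched against what you stored last run, and skip the downstream parsing and writes when nothing moved. A minimal sketch with invented names; pair it with conditional requests if you want to skip the download itself:

```python
import hashlib

seen_hashes: dict[str, str] = {}  # url -> sha256 of the last stored body

def has_changed(url: str, body: bytes) -> bool:
    """True if this body differs from what we saw on the previous run."""
    digest = hashlib.sha256(body).hexdigest()
    if seen_hashes.get(url) == digest:
        return False  # identical content: nothing to re-process
    seen_hashes[url] = digest
    return True
```

Persist `seen_hashes` between runs (a small table or file is enough) and the sync job only pays full processing cost for pages that actually changed.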