Glossary

HTTP

HTTP, short for Hypertext Transfer Protocol, is the basic request-response protocol that browsers, API clients, and scrapers use to talk to web servers. In practice, it’s the layer where you send a request like GET or POST and get back a response with a status code, headers, and a body, which is why scraping usually starts here before the real mess begins.

Examples

A simple HTTP request in scraping usually looks like this:

curl -i "https://example.com/products"

That response comes back with the parts you end up debugging in production:

  • Status code: 200, 403, 429
  • Headers: content-type, set-cookie, cache-control
  • Body: HTML, JSON, or sometimes a block page pretending to be HTML

In Python:

import requests

resp = requests.get("https://example.com/products", timeout=30)
print(resp.status_code)
print(resp.headers.get("content-type"))
print(resp.text[:200])

For scraping, the annoying part is that a valid HTTP response does not mean you got the data you wanted. You might get a 200 with an empty page, a captcha, or geo-specific content.
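One way to catch those false positives is a small validation check that looks past the status code. This is a minimal sketch: the marker string, minimum length, and "captcha" heuristic are assumptions you would tune to the page you actually expect.

```python
# Sketch: decide whether a response plausibly contains the data we want,
# not just whether the server answered. Marker and threshold are assumptions.

def looks_like_real_page(status_code: int, body: str,
                         expected_marker: str = "product-list",
                         min_length: int = 500) -> bool:
    """Return True only if the response looks like the real page."""
    if status_code != 200:
        return False
    if len(body) < min_length:          # block pages are often tiny
        return False
    if expected_marker not in body:     # expected selector/field missing
        return False
    if "captcha" in body.lower():       # crude block-page heuristic
        return False
    return True
```

In a scraper you would call this right after the fetch and treat a False as a failure worth logging, even though the HTTP layer reported success.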

Practical tips

  • Learn the basics of methods, headers, cookies, status codes, and redirects. A lot of scraping problems are just HTTP problems wearing a different hat.
  • Don’t assume 200 = success: check the actual response body, expected selectors, JSON fields, or content length.
  • Watch for 403, 429, and redirect loops: those usually mean blocking, rate limits, or session issues.
  • Reuse sessions when it matters: some sites expect cookie continuity.
  • Set sane timeouts and retries, but don’t blindly retry everything: retrying a bad fingerprint or blocked request just burns money.
  • If the site is simple, plain HTTP requests are enough. If the site depends on JavaScript, browser rendering is usually the next step.
  • In production, log more than the URL: status code, final URL, headers, proxy used, and a small response sample. That’s the difference between fixing issues in 10 minutes and guessing for 3 hours.
  • With ScrapeRouter, the point is not that HTTP goes away. The point is you stop hand-stitching proxy rotation, retries, browser fallback, and anti-bot workarounds around every request.

Use cases

  • Fetching HTML pages: product listings, blogs, docs, category pages
  • Calling JSON endpoints: many modern sites load data through background HTTP requests that are easier to scrape than the rendered page
  • Submitting forms: search pages, login flows, pagination, filters
  • Inspecting site behavior: seeing which headers, cookies, or API calls the frontend uses before you decide whether simple requests are enough
  • Debugging scraper failures: figuring out whether the issue is transport, blocking, missing headers, expired sessions, or the site just changed
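For the JSON-endpoint case, scraping usually means replicating the background request the frontend makes and then pulling fields out of the payload. The endpoint URL and the items/name/price structure below are hypothetical; inspect the real calls in your browser’s network tab first.

```python
import json

def extract_products(payload: str):
    """Pull the fields we care about out of a JSON API response body.

    The "items"/"name"/"price" structure is an assumption about a
    hypothetical endpoint, not a real API.
    """
    data = json.loads(payload)
    return [(item["name"], item["price"]) for item in data.get("items", [])]

# In practice the payload would come from something like:
#   resp = requests.get("https://example.com/api/products?page=1",
#                       headers={"accept": "application/json"}, timeout=30)
#   products = extract_products(resp.text)
```

These endpoints are often easier to scrape than the rendered page because the structure is explicit and you skip HTML parsing entirely.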

Related terms

  • HTTPS
  • Request Header
  • Response Header
  • Status Code
  • Cookie
  • User-Agent
  • GET Request
  • POST Request
  • API
  • Proxy