Glossary

Content-Type

Content-Type is an HTTP header that tells you what kind of data is in the request or response body, like HTML, JSON, XML, or an image. In scraping, it matters because the body might not be what you expected, and treating JSON like HTML or a PDF like text is how parsers break in production.

Examples

A few common response headers:

  • Content-Type: text/html; charset=utf-8
  • Content-Type: application/json
  • Content-Type: application/xml
  • Content-Type: image/webp
  • Content-Type: text/csv
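To use these values in code, split off the parameters first. The sketch below is minimal and assumes only the Python standard library: `email.message.Message` already parses the `type/subtype; name=value` syntax, lowercases the media type, and unquotes parameter values. `parse_content_type` is a hypothetical helper name, not part of any library.

```python
from email.message import Message


def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into (media_type, params).

    Hypothetical helper that reuses the standard-library MIME header parser.
    It lowercases the media type and parameter names and unquotes values.
    """
    msg = Message()
    msg["Content-Type"] = value
    media_type = msg.get_content_type()  # e.g. "text/html"
    # get_params() returns [(media_type, ""), (name, value), ...]
    params = dict(msg.get_params()[1:])
    return media_type, params


# Parse the example header values listed above
for header in [
    "text/html; charset=utf-8",
    "application/json",
    "application/xml",
    "image/webp",
    "text/csv",
]:
    print(parse_content_type(header))
```

Mixed-case input such as `Text/HTML; Charset=UTF-8` comes back as `("text/html", {"charset": "UTF-8"})`, so later comparisons can use plain lowercase strings.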

Checking Content-Type before parsing a response in Python:

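import requests

response = requests.get("https://example.com/api/products", timeout=30)
# Header name lookup is case-insensitive; media types are too, so normalize the value
content_type = response.headers.get("Content-Type", "").lower()

if "application/json" in content_type:
    data = response.json()  # parse the body as JSON
    print(data)
elif "text/html" in content_type:
    html = response.text  # decoded HTML for an HTML parser
    print(html[:200])
else:
    # Anything else (images, PDFs, missing header) needs its own handling
    print(f"Unexpected content type: {content_type}")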

Sending JSON with the correct request header:

curl -X POST "https://api.example.com/items" \
  -H "Content-Type: application/json" \
  -d '{"name":"widget","price":19.99}'
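Here is the equivalent in Python with requests, as a sketch. Passing `json=` serializes the payload and sets `Content-Type: application/json` for you, while `data=` with a dict form-encodes it instead. Preparing the requests without sending them shows which header each option produces. The URL is the placeholder from the curl example.

```python
import requests

url = "https://api.example.com/items"  # placeholder endpoint from the curl example
payload = {"name": "widget", "price": 19.99}

# json= serializes the payload and sets Content-Type: application/json
json_req = requests.Request("POST", url, json=payload).prepare()
print(json_req.headers["Content-Type"])  # application/json
print(json_req.body)

# data= with a dict form-encodes the payload instead
form_req = requests.Request("POST", url, data=payload).prepare()
print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)  # name=widget&price=19.99

# A pre-serialized string body gets no automatic Content-Type, so set it explicitly
raw_req = requests.Request(
    "POST",
    url,
    data='{"name":"widget","price":19.99}',
    headers={"Content-Type": "application/json"},
).prepare()
print(raw_req.headers["Content-Type"])  # application/json
```

Nothing is sent over the network here. To actually send one of these requests, use `requests.post(url, json=payload)` or pass the prepared request to a `requests.Session`.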

With ScrapeRouter, you still want to inspect the response type before deciding how to handle it:

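import os
import requests

# Read the API key from an environment variable (the variable name is a placeholder)
api_key = os.environ["SCRAPEROUTER_API_KEY"]

response = requests.post(
    "https://www.scraperouter.com/api/v1/scrape/",
    headers={"Authorization": f"Api-Key {api_key}"},
    json={"url": "https://example.com/feed.xml"},
    timeout=60,
)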

content_type = response.headers.get("Content-Type", "")
print(content_type)

Practical tips

  • Do not trust the URL alone: a page ending in .json can return HTML, and an endpoint that looked like HTML yesterday can start returning a bot check page today.
  • Check the header before parsing: especially if your scraper handles multiple targets or retries through different proxy/browser paths.
  • Expect parameters: values often include extras like charset=utf-8, so test whether the media type is application/json instead of comparing the whole header string for equality.
  • Treat mismatches as signals: if you expected JSON and got text/html, that often means rate limiting, a login wall, a WAF page, or a broken upstream.
  • Set it correctly on requests with a body: if you send JSON, use Content-Type: application/json; if you submit form data, use the type the server expects, usually application/x-www-form-urlencoded, or multipart/form-data for file uploads.
  • Do not confuse it with Accept: Content-Type describes the body you sent or received; Accept lists the response formats you are willing to receive.
  • Log it in production: when scrapers fail, status code + content type + first 200 bytes saves a lot of guessing.
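The header check and the production log line from the tips above can be sketched as follows. The sketch assumes you already have the status code, header value, and raw body in hand. `media_type` and `check_content_type` are hypothetical helper names. Parameter stripping here is a simple split on `;`, which covers common values but is not a full RFC parser.

```python
import logging

logger = logging.getLogger("scraper")


def media_type(content_type: str) -> str:
    """Lowercased type/subtype with parameters such as charset removed."""
    return content_type.split(";", 1)[0].strip().lower()


def check_content_type(status: int, content_type: str, body: bytes, expected: str) -> bool:
    """Return True if the response matches `expected`; otherwise log and return False.

    Hypothetical helper: the warning carries status code, Content-Type and
    the first 200 bytes of the body.
    """
    if media_type(content_type) == expected:
        return True
    logger.warning(
        "expected %s, got status=%s content_type=%r body[:200]=%r",
        expected, status, content_type, body[:200],
    )
    return False


logging.basicConfig(level=logging.INFO)
# A JSON API that adds a charset parameter still matches
print(check_content_type(200, "application/json; charset=utf-8", b'{"ok": true}', "application/json"))
# The same endpoint answering with an HTML block page is flagged and logged
print(check_content_type(403, "text/html; charset=UTF-8", b"<html><title>Access denied</title></html>", "application/json"))
```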

Use cases

  • Choosing the right parser: send HTML to BeautifulSoup, JSON to response.json(), XML to an XML parser, CSV to a CSV reader.
  • Detecting bot blocks: an API that should return application/json suddenly returning text/html is a common sign you got a challenge page instead of data.
  • Validating uploads and POST requests: when the request body format and the Content-Type header do not match, some endpoints reject the request (often with 415 Unsupported Media Type), while others silently ignore or misparse the body.
  • Routing scraped content downstream: if you're storing pages, screenshots, feeds, and files in one pipeline, Content-Type helps decide where each response should go.
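The parser-selection use case can be sketched as a small dispatch on the media type. The sketch uses only the standard library (`json`, `xml.etree.ElementTree`, `csv`). It returns HTML as decoded text where you would normally hand it to BeautifulSoup, and it falls back to UTF-8 when no charset parameter is present. `parse_body` is a hypothetical helper name.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def _charset(content_type: str, default: str = "utf-8") -> str:
    """Read the charset parameter, falling back to an assumed UTF-8 default."""
    for part in content_type.split(";")[1:]:
        name, _, value = part.partition("=")
        if name.strip().lower() == "charset" and value.strip():
            return value.strip().strip('"')
    return default


def parse_body(content_type: str, body: bytes):
    """Choose a parser from the media type (hypothetical helper, stdlib only)."""
    media = content_type.split(";", 1)[0].strip().lower()
    if media == "application/json":
        return json.loads(body)
    if media in ("application/xml", "text/xml"):
        return ET.fromstring(body)  # root Element of the XML document
    text = body.decode(_charset(content_type), errors="replace")
    if media == "text/csv":
        return list(csv.reader(io.StringIO(text)))
    if media == "text/html":
        # Normally passed to an HTML parser such as BeautifulSoup; kept as text here
        return text
    raise ValueError(f"no parser for {media or 'missing Content-Type'}")


print(parse_body("application/json; charset=utf-8", b'{"price": 19.99}'))  # {'price': 19.99}
print(parse_body("text/csv", b"name,price\nwidget,19.99\n"))  # [['name', 'price'], ['widget', '19.99']]
print(parse_body("application/xml", b"<items><item>widget</item></items>").tag)  # items
```

Raising on unknown types, rather than guessing, keeps an unexpected image or block page from being fed silently into the wrong parser.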

Related terms

HTTP Headers, Accept, Status Code, Response Body, JSON, HTML, XML