Glossary

Base64

Base64 is a way to encode binary data as plain text using a limited alphabet of 64 ASCII characters. You see it all over the web in things like image blobs, tokens, API payloads, and sometimes scraped responses where the useful data sits behind an extra decoding step.
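The mechanics are simple: every 3 input bytes become 4 output characters drawn from A-Z, a-z, 0-9, + and /, with = used as padding. A quick round trip in Python:

```python
import base64

raw = b"Hi!"                      # 3 bytes in...
encoded = base64.b64encode(raw)   # ...4 Base64 characters out
print(encoded.decode("ascii"))
# SGkh
print(base64.b64decode(encoded))
# b'Hi!'
```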

Examples

A lot of scraping annoyances are not really anti-bot problems. Sometimes the site just stuffed the payload into Base64, so you have to decode it before the data is usable.

import base64

encoded = "SGVsbG8sIHNjcmFwaW5n"
decoded = base64.b64decode(encoded).decode("utf-8")

print(decoded)
# Hello, scraping

You will also see Base64 inside data URLs:

import base64

value = "data:text/plain;base64,SGVsbG8="
encoded_part = value.split(",", 1)[1]
print(base64.b64decode(encoded_part).decode("utf-8"))
# Hello

Sometimes an API returns JSON where one field is Base64-encoded and the real work starts after decoding:

{
  "status": "ok",
  "payload": "eyJuYW1lIjoiYWNtZSIsInBsYW4iOiJwcm8ifQ=="
}

import base64
import json

payload = "eyJuYW1lIjoiYWNtZSIsInBsYW4iOiJwcm8ifQ=="
decoded = json.loads(base64.b64decode(payload))
print(decoded)
# {'name': 'acme', 'plan': 'pro'}

Practical tips

  • Base64 is encoding, not encryption: if a site sends useful data in Base64, decoding it is straightforward; the data is not protected.
  • Check for common signals: long strings ending in = or ==, a data:*;base64, prefix, and unusually opaque JSON fields.
  • Decode first, then inspect the result: it may be plain text, JSON, HTML, protobuf, or compressed bytes.
  • Be careful with bytes vs strings in Python: b64decode() returns bytes, so you may need .decode("utf-8") after it.
  • If decoding fails, the string may be URL-safe Base64, missing padding, or not Base64 at all. Missing padding is easy to repair:

import base64

value = "SGVsbG8"
padding = "=" * (-len(value) % 4)  # pad length up to a multiple of 4
decoded = base64.b64decode(value + padding).decode("utf-8")
print(decoded)
# Hello

  • In scraping pipelines, log both the encoded field name and a safe preview of the decoded output. This saves time when a site quietly changes formats.
  • If you are scraping through ScrapeRouter, Base64 is still your problem at the parsing layer. ScrapeRouter gets you the response reliably, but if the site buried the real payload in an encoded field, you still need to decode it in your extractor.

Use cases

  • Decoding API fields where the response contains an encoded JSON blob instead of readable data.
  • Extracting images or files embedded as data: URLs in HTML.
  • Handling tokens, session values, or client-side state blobs exposed in page scripts.
  • Unwrapping encoded payloads before the next step: JSON parsing, HTML parsing, protobuf decoding, decompression.
  • Debugging scraped responses that look meaningless until you decode one suspicious field and realize the site just hid the real data behind one extra layer.
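The data: URL use case can be sketched end to end (the sample markup and the deliberately simple regex are ours; a real pipeline would use an HTML parser):

```python
import base64
import re

html = '<img src="data:image/png;base64,iVBORw0KGgo=" alt="pixel">'

# Capture the MIME type and the encoded payload of each data: URL.
pattern = r'data:([\w/+.-]+);base64,([A-Za-z0-9+/=]+)'

for mime, payload in re.findall(pattern, html):
    blob = base64.b64decode(payload)
    print(mime, len(blob), "bytes")
# image/png 8 bytes
```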

Related terms

JSON, HTML, API, JavaScript Rendering, Proxy Rotation, Rate Limiting