Examples
A lot of scraping annoyances are not really anti-bot problems. Sometimes the site just stuffed the payload into Base64, so you have to decode it before the data is usable.
```python
import base64

encoded = "SGVsbG8sIHNjcmFwaW5n"
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
# Hello, scraping
```
You will also see Base64 inside data URLs:
```python
import base64

value = "data:text/plain;base64,SGVsbG8="
encoded_part = value.split(",", 1)[1]
print(base64.b64decode(encoded_part).decode("utf-8"))
# Hello
```
Sometimes an API returns JSON where one field is Base64-encoded and the real work starts after decoding:
```json
{
  "status": "ok",
  "payload": "eyJuYW1lIjoiYWNtZSIsInBsYW4iOiJwcm8ifQ=="
}
```
```python
import base64
import json

payload = "eyJuYW1lIjoiYWNtZSIsInBsYW4iOiJwcm8ifQ=="
decoded = json.loads(base64.b64decode(payload))
print(decoded)
# {'name': 'acme', 'plan': 'pro'}
```
Practical tips
- Base64 is encoding, not encryption: if a site sends useful data in Base64, decoding it is straightforward; the data is not protected.
- Check for common signals: long strings ending in `=` or `==`, `data:*;base64,` prefixes, unusually opaque JSON fields.
- Decode first, then inspect the result: plain text, JSON, HTML, protobuf, compressed bytes.
- Be careful with bytes vs strings in Python: `b64decode()` returns bytes, so you may need `.decode("utf-8")` after it.
- If decoding fails, the string may be URL-safe Base64, missing padding, or not Base64 at all.
- In scraping pipelines, log both the encoded field name and a safe preview of the decoded output. This saves time when a site quietly changes formats.
```python
import base64

value = "SGVsbG8"  # valid Base64, but missing its "=" padding
padding = "=" * (-len(value) % 4)
decoded = base64.b64decode(value + padding).decode("utf-8")
print(decoded)
# Hello
```
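The same padding trick covers the URL-safe variant, which swaps `+` and `/` for `-` and `_` and often drops padding (JWT segments are a common sighting). A minimal sketch using the standard library's `urlsafe_b64decode`:

```python
import base64

# A JWT-style header segment: URL-safe alphabet, no padding.
value = "eyJhbGciOiJIUzI1NiJ9"
padding = "=" * (-len(value) % 4)
decoded = base64.urlsafe_b64decode(value + padding).decode("utf-8")
print(decoded)
# {"alg":"HS256"}
```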
- If you are scraping through ScrapeRouter, Base64 is still your problem at the parsing layer. ScrapeRouter gets you the response reliably, but if the site buried the real payload in an encoded field, you still need to decode it in your extractor.
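One way to act on the logging tip above is a small helper that decodes a field and returns a truncated, log-safe preview. The helper name and signature here are illustrative, not from any library:

```python
import base64
import binascii

def decode_with_preview(field_name, encoded, limit=40):
    """Decode a Base64 field and return a short, log-safe preview.

    Hypothetical helper: the name and signature are illustrative.
    """
    try:
        # validate=True rejects non-alphabet characters instead of ignoring them
        raw = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        return f"{field_name}: decode failed ({exc})"
    preview = raw[:limit].decode("utf-8", errors="replace")
    return f"{field_name}: {preview!r}"

print(decode_with_preview("payload", "eyJuYW1lIjoiYWNtZSIsInBsYW4iOiJwcm8ifQ=="))
# payload: '{"name":"acme","plan":"pro"}'
```

Logging the field name alongside the preview makes it obvious which field changed when a site quietly switches formats.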
Use cases
- Decoding API fields where the response contains an encoded JSON blob instead of readable data.
- Extracting images or files embedded as `data:` URLs in HTML.
- Handling tokens, session values, or client-side state blobs exposed in page scripts.
- Unwrapping encoded payloads before the next step: JSON parsing, HTML parsing, protobuf decoding, decompression.
- Debugging scraped responses that look meaningless until you decode one suspicious field and realize the site just hid the real data behind one extra layer.
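That last point generalizes: once a field decodes cleanly, the next question is what the bytes actually are. A rough sketch of sniffing the decoded output before picking the next step (the heuristics and helper name are illustrative, not exhaustive):

```python
import base64
import gzip
import json

def sniff(raw: bytes) -> str:
    """Rough guess at what decoded Base64 bytes contain (illustrative heuristics)."""
    if raw[:2] == b"\x1f\x8b":
        return "gzip"  # gzip magic bytes
    if raw[:1] in (b"{", b"["):
        try:
            json.loads(raw)
            return "json"
        except ValueError:
            pass
    if raw[:1] == b"<":
        return "html/xml"
    try:
        raw.decode("utf-8")
        return "text"
    except UnicodeDecodeError:
        return "binary (maybe protobuf or compressed)"

# Example: a payload that turns out to be gzip-compressed JSON.
payload = base64.b64encode(gzip.compress(b'{"name":"acme"}')).decode()
raw = base64.b64decode(payload)
print(sniff(raw))                        # gzip
print(json.loads(gzip.decompress(raw)))  # {'name': 'acme'}
```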