Examples
A normal scraper request usually starts lower in the stack than people think:
- Your client opens a TCP connection to the target server
- If it's HTTPS, TLS is negotiated on top of that TCP connection
- Then the HTTP request is sent
If the TCP connection can't be established, the request never gets far enough to return a 403, 404, or 200.
```python
import requests

try:
    r = requests.get("https://example.com", timeout=10)
    print(r.status_code)
except requests.exceptions.ConnectionError as e:
    print("TCP connection failed before HTTP response:", e)
```
You can also test basic TCP reachability from the shell:
```shell
nc -vz example.com 443
```
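The same check can be done from Python with the standard library, which is handy inside a scraper's health checks. This is a minimal sketch using `socket.create_connection`; the `tcp_reachable` helper name is ours, not a standard API:

```python
import socket

def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    This is the Python equivalent of `nc -vz`: it only completes the TCP
    handshake and closes the socket. No TLS, no HTTP happens here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, resets.
        return False

print(tcp_reachable("example.com", 443))
```

If this returns `False`, there is no point tweaking headers or parsing logic yet; the failure is below HTTP.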
If that fails, you're not dealing with HTML parsing or headers yet. You're dealing with connection setup, routing, firewalls, or the target just refusing you.
Practical tips
- Don't blame HTTP too early: if you see connection resets, timeouts during connect, or socket errors, check TCP-level issues first.
- Remember the stack order: TCP first, then TLS, then HTTP.
- In production scraping, common TCP problems are: connection timeouts, refused connections, resets, packet loss, unstable proxies.
- Reusing connections helps: opening a fresh TCP connection for every request adds latency and more failure points.
- When debugging proxy behavior, test whether the proxy can actually establish TCP connections to the target before changing scraper logic.
- If a site is doing aggressive filtering, failures may happen before any HTTP response exists: dropped connections, resets during handshake, inconsistent connect failures.
- This is one reason router layers exist: not because TCP is complicated in theory, but because at scale you keep paying for bad network paths, dead proxies, and flaky connection setup.
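The connection-reuse tip above is cheap to apply with `requests`: a `Session` pools connections per host (via urllib3 under the hood), so repeated requests to the same site skip the TCP and TLS setup cost. A minimal sketch, with hypothetical paths:

```python
import requests

# One Session = one connection pool. Subsequent requests to the same
# host reuse an already-established TCP/TLS connection when possible,
# instead of paying handshake latency on every request.
session = requests.Session()

for path in ("/a", "/b", "/c"):  # hypothetical paths
    try:
        r = session.get(f"https://example.com{path}", timeout=10)
        print(path, r.status_code)
    except requests.exceptions.ConnectionError as e:
        print(path, "connect failed before HTTP:", e)
```

Besides latency, fewer handshakes means fewer opportunities for the flaky connection-setup failures listed above.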
Use cases
- Debugging failed requests: You send a request and get no status code back, just a connection error. That usually means the failure happened at the TCP layer or during TLS setup.
- Proxy troubleshooting: A proxy pool looks fine on paper, but half the proxies can't reliably open TCP connections to the target. You don't fix that with better parsing.
- Performance tuning: If your scraper opens too many short-lived connections, TCP setup overhead starts to matter, especially across high-latency regions.
- Understanding anti-bot behavior: Some defenses don't bother returning clean HTTP blocks. They just terminate connections early or make handshakes fail in weird ways.
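The first use case, telling layers apart when a request fails, maps directly onto the exception hierarchy `requests` exposes. A sketch of that triage; the `classify_failure` helper and its return labels are ours:

```python
import requests

def classify_failure(url: str) -> str:
    """Rough triage of where a request died: TCP/TLS setup vs HTTP."""
    try:
        r = requests.get(url, timeout=10)
        r.raise_for_status()
        return "ok"
    except requests.exceptions.ConnectTimeout:
        return "tcp-connect-timeout"   # never reached the server
    except requests.exceptions.ConnectionError:
        return "tcp-or-tls-failure"    # refused, reset, or handshake failure
    except requests.exceptions.ReadTimeout:
        return "http-read-timeout"     # connected, but the response stalled
    except requests.exceptions.HTTPError:
        return "http-error-status"     # a real HTTP response came back (4xx/5xx)
```

Note the ordering: `ConnectTimeout` subclasses `ConnectionError` in `requests`, so it must be caught first. Anything in the first two branches happened before a status code existed, which is exactly the "no status code, just a connection error" case above.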