Glossary

TCP

TCP, or Transmission Control Protocol, is the transport-layer protocol that ensures data arrives reliably and in order between a client and a server. In scraping, it sits underneath HTTP and HTTPS, so when requests fail before you even get a response, the problem is often down at the TCP level, not in your parser or request code.

Examples

A normal scraper request usually starts lower in the stack than people think:

  • Your client opens a TCP connection to the target server
  • If it's HTTPS, TLS is negotiated on top of that TCP connection
  • Then the HTTP request is sent

If the TCP connection can't be established, the request never gets far enough to return a 403, 404, or 200.
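The three steps above can be made explicit with Python's standard socket and ssl modules. This is a minimal sketch, assuming example.com as a placeholder target; requests and other HTTP clients do the same layering under the hood:

```python
import socket
import ssl

host = "example.com"  # placeholder target for illustration

# 1. TCP: open the raw connection
sock = socket.create_connection((host, 443), timeout=10)

# 2. TLS: negotiate encryption on top of the TCP connection
ctx = ssl.create_default_context()
tls = ctx.wrap_socket(sock, server_hostname=host)

# 3. HTTP: only now can the request be sent
tls.sendall(b"GET / HTTP/1.1\r\nHost: " + host.encode() + b"\r\nConnection: close\r\n\r\n")
status_line = tls.recv(1024).split(b"\r\n")[0]
print(status_line)  # e.g. b'HTTP/1.1 200 OK'
tls.close()
```

If step 1 raises, steps 2 and 3 never happen, which is exactly why a failed request can end with no status code at all.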

import requests

try:
    r = requests.get("https://example.com", timeout=10)
    print(r.status_code)
except requests.exceptions.ConnectTimeout as e:
    # Timed out during connection setup, before any HTTP exchange
    print("TCP connect timed out before HTTP response:", e)
except requests.exceptions.ConnectionError as e:
    # Refused, reset, DNS failure, etc. -- still below the HTTP layer
    print("TCP connection failed before HTTP response:", e)

You can also test basic TCP reachability from the shell; with nc, -v gives verbose output and -z connects without sending any data:

nc -vz example.com 443

If that fails, you're not dealing with HTML parsing or headers yet. You're dealing with connection setup, routing, firewalls, or the target just refusing you.
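The same check can be done from Python, which is handy inside a scraper's own health checks. A minimal sketch, roughly equivalent to nc -vz; the function name is just for illustration:

```python
import socket

def tcp_reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as e:  # covers refused, timed out, unreachable, DNS failure
        print(f"{host}:{port} unreachable: {e}")
        return False

print(tcp_reachable("example.com", 443))
```

A False here tells you the failure sits below HTTP, before headers or parsing ever come into play.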

Practical tips

  • Don't blame HTTP too early: if you see connection resets, timeouts during connect, or socket errors, check TCP-level issues first.
  • Remember the stack order: TCP first, then TLS, then HTTP.
  • In production scraping, common TCP problems are: connection timeouts, refused connections, resets, packet loss, unstable proxies.
  • Reusing connections helps: opening a fresh TCP connection for every request adds latency and more failure points.
  • When debugging proxy behavior, test whether the proxy can actually establish TCP connections to the target before changing scraper logic.
  • If a site is doing aggressive filtering, failures may happen before any HTTP response exists: dropped connections, resets during handshake, inconsistent connect failures.
  • This is one reason dedicated request-routing layers exist: not because TCP is complicated in theory, but because at scale you keep paying for bad network paths, dead proxies, and flaky connection setup.
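The connection-reuse tip above can be sketched with requests.Session, which keeps the underlying TCP (and TLS) connection alive across requests through connection pooling. The URL list is a placeholder for illustration:

```python
import requests

# Placeholder targets for illustration
urls = ["https://example.com/"] * 3

# A Session reuses the underlying TCP/TLS connection across requests,
# avoiding a fresh handshake (and its failure modes) on every call.
with requests.Session() as session:
    for url in urls:
        r = session.get(url, timeout=10)
        print(r.status_code)
```

Beyond latency, fewer handshakes means fewer chances for the connection-setup failures listed above to bite.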

Use cases

  • Debugging failed requests: You send a request and get no status code back, just a connection error. That usually means the failure happened at the TCP layer or during TLS setup.
  • Proxy troubleshooting: A proxy pool looks fine on paper, but half the proxies can't reliably open TCP connections to the target. You don't fix that with better parsing.
  • Performance tuning: If your scraper opens too many short-lived connections, TCP setup overhead starts to matter, especially across high-latency regions.
  • Understanding anti-bot behavior: Some defenses don't bother returning clean HTTP blocks. They just terminate connections early or make handshakes fail in weird ways.
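A small triage helper makes the first use case concrete: did the failure happen below HTTP, or did a response actually come back? This is a sketch, and the function name and messages are invented for illustration; the exception ordering matters because requests' SSLError and ConnectTimeout both subclass ConnectionError:

```python
import requests

def classify_failure(url):
    """Rough triage: which layer did the request die at?"""
    try:
        r = requests.get(url, timeout=10)
        return f"HTTP response: {r.status_code}"  # HTTP layer was reached
    except requests.exceptions.SSLError:
        return "TLS handshake failed"             # TCP worked, TLS did not
    except requests.exceptions.ConnectTimeout:
        return "TCP connect timed out"
    except requests.exceptions.ConnectionError:
        return "TCP connection failed"            # refused, reset, DNS, etc.

print(classify_failure("https://example.com"))
```

Logging failures this way quickly shows whether a proxy pool or target is dropping you at the TCP layer rather than blocking you with clean HTTP responses.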

Related terms

  • HTTP
  • HTTPS
  • TLS
  • Proxy
  • Connection Timeout
  • TLS Fingerprinting