Glossary

TCP

TCP, or Transmission Control Protocol, is the transport-layer protocol that ensures data arrives reliably and in order between a client and a server. In scraping, it sits underneath HTTP and HTTPS, so when requests fail before you even get a response, the problem is often down at the TCP level, not in your parser or request code.

Examples

A normal scraper request usually starts lower in the stack than people think:

  • Your client opens a TCP connection to the target server
  • If it's HTTPS, TLS is negotiated on top of that TCP connection
  • Then the HTTP request is sent

If the TCP connection can't be established, the request never gets far enough to return a 403, 404, or 200.
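The three steps above can be made explicit with Python's standard socket and ssl modules. This is a minimal sketch, assuming example.com as a placeholder target; requests and other HTTP clients do the same layering under the hood:

```python
import socket
import ssl

host = "example.com"  # placeholder target for illustration

# 1. TCP: open the raw connection
sock = socket.create_connection((host, 443), timeout=10)

# 2. TLS: negotiate encryption on top of the TCP connection
ctx = ssl.create_default_context()
tls = ctx.wrap_socket(sock, server_hostname=host)

# 3. HTTP: only now can the request be sent
tls.sendall(b"GET / HTTP/1.1\r\nHost: " + host.encode() + b"\r\nConnection: close\r\n\r\n")
status_line = tls.recv(1024).split(b"\r\n")[0]
print(status_line)  # e.g. b'HTTP/1.1 200 OK'
tls.close()
```

If step 1 raises, steps 2 and 3 never happen, which is exactly why a failed request can end with no status code at all.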

import requests

try:
    r = requests.get("https://example.com", timeout=10)
    print(r.status_code)
except requests.exceptions.ConnectTimeout as e:
    # Timed out during connection setup, before any HTTP exchange
    print("TCP connect timed out before HTTP response:", e)
except requests.exceptions.ConnectionError as e:
    # Refused, reset, DNS failure, etc. -- still below the HTTP layer
    print("TCP connection failed before HTTP response:", e)

You can also test basic TCP reachability from the shell; with nc, -v gives verbose output and -z connects without sending any data:

nc -vz example.com 443

If that fails, you're not dealing with HTML parsing or headers yet. You're dealing with connection setup, routing, firewalls, or the target just refusing you.
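The same check can be done from Python, which is handy inside a scraper's own health checks. A minimal sketch, roughly equivalent to nc -vz; the function name is just for illustration:

```python
import socket

def tcp_reachable(host, port, timeout=5):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as e:  # covers refused, timed out, unreachable, DNS failure
        print(f"{host}:{port} unreachable: {e}")
        return False

print(tcp_reachable("example.com", 443))
```

A False here tells you the failure sits below HTTP, before headers or parsing ever come into play.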

Practical tips

  • Don't blame HTTP too early: if you see connection resets, timeouts during connect, or socket errors, check TCP-level issues first.
  • Remember the stack order: TCP first, then TLS, then HTTP.
  • In production scraping, common TCP problems are: connection timeouts, refused connections, resets, packet loss, unstable proxies.
  • Reusing connections helps: opening a fresh TCP connection for every request adds latency and more failure points.
  • When debugging proxy behavior, test whether the proxy can actually establish TCP connections to the target before changing scraper logic.
  • If a site is doing aggressive filtering, failures may happen before any HTTP response exists: dropped connections, resets during handshake, inconsistent connect failures.
  • This is one reason dedicated request-routing layers exist: not because TCP is complicated in theory, but because at scale you keep paying for bad network paths, dead proxies, and flaky connection setup.
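The connection-reuse tip above can be sketched with requests.Session, which keeps the underlying TCP (and TLS) connection alive across requests through connection pooling. The URL list is a placeholder for illustration:

```python
import requests

# Placeholder targets for illustration
urls = ["https://example.com/"] * 3

# A Session reuses the underlying TCP/TLS connection across requests,
# avoiding a fresh handshake (and its failure modes) on every call.
with requests.Session() as session:
    for url in urls:
        r = session.get(url, timeout=10)
        print(r.status_code)
```

Beyond latency, fewer handshakes means fewer chances for the connection-setup failures listed above to bite.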

Use cases

  • Debugging failed requests: You send a request and get no status code back, just a connection error. That usually means the failure happened at the TCP layer or during TLS setup.
  • Proxy troubleshooting: A proxy pool looks fine on paper, but half the proxies can't reliably open TCP connections to the target. You don't fix that with better parsing.
  • Performance tuning: If your scraper opens too many short-lived connections, TCP setup overhead starts to matter, especially across high-latency regions.
  • Understanding anti-bot behavior: Some defenses don't bother returning clean HTTP blocks. They just terminate connections early or make handshakes fail in weird ways.
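A small triage helper makes the first use case concrete: did the failure happen below HTTP, or did a response actually come back? This is a sketch, and the function name and messages are invented for illustration; the exception ordering matters because requests' SSLError and ConnectTimeout both subclass ConnectionError:

```python
import requests

def classify_failure(url):
    """Rough triage: which layer did the request die at?"""
    try:
        r = requests.get(url, timeout=10)
        return f"HTTP response: {r.status_code}"  # HTTP layer was reached
    except requests.exceptions.SSLError:
        return "TLS handshake failed"             # TCP worked, TLS did not
    except requests.exceptions.ConnectTimeout:
        return "TCP connect timed out"
    except requests.exceptions.ConnectionError:
        return "TCP connection failed"            # refused, reset, DNS, etc.

print(classify_failure("https://example.com"))
```

Logging failures this way quickly shows whether a proxy pool or target is dropping you at the TCP layer rather than blocking you with clean HTTP responses.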

Related terms

  • HTTP
  • HTTPS
  • TLS
  • Proxy
  • Connection Timeout
  • TLS Fingerprinting