Glossary

43 terms

C

Canvas Fingerprinting

Canvas fingerprinting is a browser identification technique that uses the HTML5 canvas API to draw hidden text or images, then reads back the rendered result to help identify a device or browser. In scraping, it matters because anti-bot systems use it as one signal to tell apart real browsers, headless browsers, and poorly configured automation.

Tags: fingerprinting, browser, javascript, detection, anti-bot, scraping

CAPTCHA

A CAPTCHA is a challenge a site shows to decide whether the visitor is probably human or automated, usually after it sees behavior it does not like. In scraping, it is less a standalone problem than a signal: your request path, IP quality, browser fingerprint, or request pattern is getting flagged.

Tags: captcha, anti-bot, scraping, detection, proxies, browser, production
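Since a CAPTCHA page is best treated as a block signal rather than content, a scraper usually wants to detect it and back off instead of parsing it. A minimal sketch in Python; the marker strings are illustrative examples of widget class names, not an exhaustive or authoritative list:

```python
# Sketch: treat a CAPTCHA interstitial as a signal, not as page content.
# The marker strings are illustrative, not exhaustive -- check the actual
# challenge pages your targets serve.
CAPTCHA_MARKERS = (
    "g-recaptcha",   # Google reCAPTCHA widget class
    "h-captcha",     # hCaptcha widget class
)

def looks_like_captcha(status: int, body: str) -> bool:
    """Return True if a response is probably a challenge page."""
    if status in (403, 429):     # statuses commonly paired with challenges
        return True
    lowered = body.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)
```

A crawler would typically route any `True` result into a retry queue with a different IP or browser profile rather than saving the body as data.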

CDP (Chrome DevTools Protocol)

CDP is the protocol Chrome and other Chromium-based browsers expose for remote control and inspection, usually over a WebSocket connection. In scraping, it gives you low-level access to things like page navigation, JavaScript execution, network events, cookies, and screenshots without going through a higher-level automation library.

Tags: browser, chromium, cdp, javascript, rendering, automation, debugging, protocol
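The wire format is simple: each command is a JSON object with an incrementing id, a method name, and optional params, sent over the DevTools WebSocket (exposed when Chrome is started with a flag like `--remote-debugging-port=9222`). A sketch of building the frames, without the WebSocket transport itself:

```python
import itertools
import json

# Each CDP command frame carries an id (to match responses back to
# commands), a method name, and a params object.
_ids = itertools.count(1)

def cdp_command(method: str, **params) -> str:
    """Serialize one CDP command frame as JSON."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})

# Real CDP methods; a live session would send these frames over the
# browser's WebSocket endpoint and read the reply with the matching id.
nav = cdp_command("Page.navigate", url="https://example.com")
shot = cdp_command("Page.captureScreenshot", format="png")
```

Higher-level libraries like Puppeteer and Playwright generate frames like these under the hood.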

City-Level Routing

City-level routing means sending a scraping request through an IP located in a specific city, not just a country or region. The point is to see what the target site shows to users in that local market, which matters for things like localized pricing, availability, maps, ads, and search results.

Tags: geo, routing, proxies, localization, scraping
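In practice this usually means encoding the target city into the proxy credentials. A sketch of the shape; the gateway host and username syntax here are hypothetical, since every provider defines its own format:

```python
# Sketch: many proxy providers expose geo targeting through the proxy
# username. "gateway.example-proxy.com" and the "-country-X-city-Y"
# username syntax are placeholders -- check your provider's docs.
def city_proxy(user: str, password: str, country: str, city: str) -> dict:
    proxy = (
        f"http://{user}-country-{country}-city-{city}:{password}"
        "@gateway.example-proxy.com:8000"
    )
    # Dict shape accepted by HTTP clients such as requests (proxies=...).
    return {"http": proxy, "https": proxy}

chicago = city_proxy("acme", "secret", "us", "chicago")
```

Requesting the same product page through a Chicago IP and a Berlin IP is then just a matter of swapping which dict you pass to the client.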

CORS

CORS, or Cross-Origin Resource Sharing, is a browser security mechanism that controls whether JavaScript running on one origin can make requests to another origin. The important part for scraping is that this is mostly a browser problem, not a server-to-server problem, which is why a request blocked in frontend JavaScript may work fine from a backend scraper.

Tags: cors, browser, security, http, javascript, api, scraping
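The browser-only nature of CORS is easy to demonstrate: a server that sends no `Access-Control-Allow-Origin` header would block a cross-origin `fetch()` in a browser, yet a server-to-server request reads the body without any CORS check. A self-contained sketch using a local throwaway server:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# This server deliberately sends no Access-Control-Allow-Origin header.
# A browser fetch() from another origin would refuse to expose the body;
# the backend request below never consults CORS at all.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"price": 19.99}')

    def log_message(self, *args):  # keep output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: OS picks a free port
threading.Thread(target=server.serve_forever, daemon=True).start()

with urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    data = resp.read()  # succeeds: no browser, so no CORS enforcement
server.shutdown()
```

This is why "the API is CORS-blocked" is rarely a blocker for a backend scraper, while the same endpoint fails from frontend JavaScript.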

CSS

CSS stands for Cascading Style Sheets, the language browsers use to control how HTML looks on the page. In scraping, people also say “CSS” as shorthand for CSS selectors, which are patterns used to find elements in the DOM without writing XPath.

H

Headful Browser

A headful browser is a browser running with a visible user interface, like a normal desktop Chrome or Firefox session. In scraping, teams use headful mode when a target behaves differently under automation, needs full interaction, or is harder to get through in headless mode. It usually costs more CPU, memory, and time, so it is not something you want to use everywhere by default.

Tags: browser, rendering, automation, javascript, scraping

Headless Browser

A headless browser is a real browser running without a visible UI, usually controlled by code through tools like Playwright, Puppeteer, or Selenium. In scraping, you use it when a plain HTTP request is not enough because the page depends on JavaScript, browser APIs, or client-side rendering.

Tags: browser, rendering, javascript, automation, scraping

Honeypot

A honeypot is a trap a website sets to catch bots or scrapers, usually by adding links, form fields, or elements that normal users never interact with. If your scraper clicks, submits, or follows them, you make yourself easy to detect and often end up blocked.

Tags: scraping, detection, anti-bot, crawler, forms, automation
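The practical defense is to skip anything a real user could never see before following or submitting it. A simplified sketch; real pages also hide traps through CSS classes, zero-size elements, and off-screen offsets, which inline-style checks like these will miss:

```python
import re

# Match inline styles that make an element invisible to a human.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.I)

def is_probable_honeypot(tag_attrs: dict) -> bool:
    """Heuristic: would a real user ever see or reach this element?"""
    if HIDDEN_STYLE.search(tag_attrs.get("style", "")):
        return True
    if tag_attrs.get("aria-hidden") == "true":
        return True
    return tag_attrs.get("tabindex") == "-1"   # unreachable by keyboard

links = [
    {"href": "/products", "style": ""},
    {"href": "/trap", "style": "display:none"},   # classic honeypot link
]
safe = [a["href"] for a in links if not is_probable_honeypot(a)]
```

The same filter applies to form fields: a hidden input that arrives filled in is a strong signal the submitter was a bot.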

HTTP

HTTP, or Hypertext Transfer Protocol, is the basic request-response protocol browsers, APIs, and scrapers use to talk to web servers. In practice, it is the layer where you send a request like GET or POST, get back a response with headers, status codes, and a body, and then find out whether the target gives you the page, blocks you, redirects you, or rate limits you.

Tags: http, protocol, requests, web, scraping, networking
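That last part, deciding what a response actually means, is usually the first branch in any scraper. A sketch of bucketing standard status codes; the bucket names are just one reasonable convention:

```python
# Sketch: the first decision a scraper makes about any HTTP response.
def classify(status: int) -> str:
    if 200 <= status < 300:
        return "ok"             # page delivered
    if status in (301, 302, 307, 308):
        return "redirect"       # follow the Location header
    if status == 429:
        return "rate_limited"   # back off; honor Retry-After if present
    if status in (401, 403):
        return "blocked"        # auth wall or anti-bot block
    if status >= 500:
        return "server_error"   # often transient, often worth a retry
    return "other"
```

Each bucket maps to a different action: parse, follow, wait, rotate identity, or retry.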
T

Tarpitting

Tarpitting is a defensive technique where a server intentionally slows down, stalls, or misleads a client instead of blocking it outright. In scraping, it is used to waste bot time, tie up connections, increase costs, or feed low-value responses, which means a request can look "successful" while still being a failure in practice.

Tags: anti-bot, blocking, scraping, networking, security, performance

TCP

TCP, short for Transmission Control Protocol, is the transport protocol that makes sure data sent between a client and server arrives reliably and in order. In web scraping, it sits underneath HTTP and HTTPS, so every request starts with a TCP connection before any page data is transferred.

Tags: networking, protocols, http, https, tcp, scraping
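A loopback echo shows the connect/send/recv lifecycle that every HTTP request reuses. A self-contained sketch with the standard `socket` module:

```python
import socket
import threading

# A minimal TCP listener on localhost; port 0 lets the OS pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

def echo_once():
    conn, _ = listener.accept()
    with conn:
        conn.sendall(conn.recv(1024))   # TCP delivers the bytes in order

threading.Thread(target=echo_once, daemon=True).start()

# The client side: this connect() is the step that precedes every
# HTTP request, and the bytes sent look exactly like an HTTP request line.
client = socket.create_connection(listener.getsockname())
client.sendall(b"GET / HTTP/1.1\r\n")
reply = client.recv(1024)
client.close()
listener.close()
```

HTTP libraries hide all of this, but connection errors, resets, and slow handshakes surface from this layer.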

TLS

TLS, short for Transport Layer Security, is the protocol that secures HTTPS connections by encrypting traffic between a client and a server. In scraping, TLS is not just about encryption anymore; how a client negotiates TLS can also affect whether a request looks like a real browser or gets flagged as automation.

Tags: tls, https, security, fingerprinting, detection, networking, scraping
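The cipher suites and protocol versions a client offers are part of that fingerprint (this is the raw material behind JA3-style hashes). A sketch with Python's standard `ssl` module showing how a client's offered ciphers can be inspected and changed, which in turn changes what its ClientHello advertises:

```python
import ssl

# What this Python client would offer in its ClientHello by default.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2
offered = [c["name"] for c in ctx.get_ciphers()]

# Narrowing the cipher list changes the fingerprint the server sees
# before any HTTP header is ever sent. (set_ciphers() affects TLS 1.2
# and below; TLS 1.3 suites are configured separately.)
ctx.set_ciphers("ECDHE+AESGCM")
narrowed = [c["name"] for c in ctx.get_ciphers()]
```

Two clients sending identical HTTP headers can still be told apart at this layer, which is why fully mimicking a browser sometimes requires matching its TLS stack, not just its User-Agent.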

TLS Handshake

A TLS handshake is the first part of an HTTPS connection, where the client and server agree on how to encrypt traffic and establish session keys. In scraping, it matters because sites can inspect handshake details before any HTTP request is processed, which means your client can look suspicious even if your headers and cookies look fine.

Tags: tls, https, fingerprinting, networking, detection, scraping