Glossary

CDP (Chrome DevTools Protocol)

CDP is the low-level protocol Chrome and other Chromium-based browsers expose for remote control, usually over a WebSocket connection. It lets you do the same kinds of things DevTools does: inspect pages, run JavaScript, intercept network traffic, read cookies, and capture screenshots. In scraping, people use it because it gives more direct browser control than higher-level automation libraries.

Examples

Many browser automation tools, Puppeteer and Playwright's Chromium driver among them, are really just wrappers around CDP. If you talk to CDP directly, you skip some framework overhead and get closer to the browser internals. Start by launching Chrome with remote debugging enabled:

google-chrome --remote-debugging-port=9222

Once Chrome is running with remote debugging enabled, you can list the available targets over its HTTP endpoint (/json, also aliased as /json/list):

curl http://127.0.0.1:9222/json

A typical target response includes a webSocketDebuggerUrl, which is what CDP clients connect to:

{
  "id": "target-id",
  "title": "Example Domain",
  "type": "page",
  "url": "https://example.com",
  "webSocketDebuggerUrl": "ws://127.0.0.1:9222/devtools/page/target-id"
}
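The same discovery step can be done programmatically. A minimal sketch in Python using only the standard library: list_targets fetches the same listing the curl command above returns, and page_ws_url picks out the WebSocket URL of the first page-type target (the parsing assumes the listing has the shape shown above):

```python
import json
import urllib.request

def list_targets(host="127.0.0.1", port=9222):
    """Fetch the target listing; equivalent to `curl http://127.0.0.1:9222/json`."""
    with urllib.request.urlopen(f"http://{host}:{port}/json") as resp:
        return json.load(resp)

def page_ws_url(targets):
    """Return the webSocketDebuggerUrl of the first 'page' target, or None."""
    for t in targets:
        if t.get("type") == "page" and "webSocketDebuggerUrl" in t:
            return t["webSocketDebuggerUrl"]
    return None

# With Chrome running as shown above:
#   ws_url = page_ws_url(list_targets())
```

A CDP client would then open a WebSocket connection to the returned URL; everything after this point happens over that socket.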

From there, a client can send CDP commands to navigate, evaluate JavaScript, or listen for network events. In practice, this is how people do things like wait for XHR responses, grab rendered HTML, or inspect API calls a page makes after load.
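Each CDP command is a small JSON message: a client-chosen id, a method name, and optional params; the browser replies with a message carrying the same id, which is how responses are matched back to requests. A sketch of building those frames (Page.navigate and Runtime.evaluate are real CDP methods; the actual send/receive over the WebSocket is left out):

```python
import itertools
import json

# Monotonically increasing message ids, as CDP expects.
_ids = itertools.count(1)

def cdp_command(method, **params):
    """Build one CDP command frame as a JSON string."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})

# Navigate the page, then read the rendered HTML via script evaluation.
nav = cdp_command("Page.navigate", url="https://example.com")
html = cdp_command("Runtime.evaluate",
                   expression="document.documentElement.outerHTML")

# Both frames would be sent over the webSocketDebuggerUrl connection;
# the browser's replies reuse the same "id" values.
```

Events (for example Network.responseReceived) arrive over the same socket as messages without an id, which is what makes waiting on background requests possible.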

Practical tips

  • CDP is powerful, but it is not magic: it gives you low-level browser control, which is useful, but it does not solve proxy rotation, CAPTCHA handling, session strategy, or anti-bot defenses by itself.
  • Expect Chromium-specific behavior: CDP is primarily a Chromium protocol, and other engines offer at best partial compatibility. If you need cross-browser automation, this is usually not the abstraction you want.
  • Use it when you actually need browser internals: network interception, script evaluation, response inspection, performance events, and rendered DOM access are the common reasons.
  • Direct CDP can be leaner than a full automation framework: less abstraction, fewer moving parts, sometimes better performance. The tradeoff is that you own more of the plumbing.
  • In production, browser control is the easy part: keeping sessions stable, avoiding bans, managing retries, and not burning engineering time on flaky browser fleets is where the pain usually starts.
  • If your use case is just "fetch HTML and move on": you probably do not need CDP at all.
  • With ScrapeRouter: you do not have to build your whole system around CDP unless the target really demands browser-level control. That is usually the right tradeoff.

Use cases

  • Scraping sites that render critical content with JavaScript
  • Intercepting background API calls to extract structured data directly
  • Capturing screenshots or PDFs of rendered pages
  • Reading cookies, local storage, or session state during a browser session
  • Debugging why a scraper works locally but fails once anti-bot logic kicks in
  • Building custom browser automation flows without depending fully on Puppeteer or Playwright

Related terms

  • Headless Browser
  • Browser Automation
  • JavaScript Rendering
  • WebSocket
  • Proxy Rotation
  • CAPTCHA
  • Playwright
  • Puppeteer