Glossary

CAPTCHA

A CAPTCHA is a challenge a site shows to figure out whether the visitor is a human or an automated script. In scraping, it usually means the target thinks your traffic looks suspicious, often because of bad IPs, broken browser fingerprints, or bot-like request patterns.

Examples

A CAPTCHA usually shows up after repeated requests, login attempts, or page flows that don't look like normal user traffic.

  • Common examples: image grids, distorted text, checkbox challenges, full-page interstitials from reCAPTCHA or hCaptcha
  • In scraping, the important part is not the puzzle itself but why you got it
A quick header check from the command line often reveals a challenge before you parse anything:

curl -I https://target-site.example

If a simple request path turns into a browser challenge, the site is telling you your request profile looks wrong.

# "response" is a requests.Response from an earlier GET of the target page
if "captcha" in response.text.lower() or response.status_code in (403, 429):
    print("Possible bot challenge or block")

That check is crude, but in practice teams often start with exactly this kind of detection and layer on more precise challenge detection later.
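As a sketch of what that "better" detection might look like, a hypothetical detector could combine several weak signals instead of a single substring match. The marker strings, status codes, and size threshold below are illustrative assumptions, not a definitive list:

```python
# Illustrative sketch: combine several weak signals into one verdict.
# Marker strings, status codes, and the size threshold are assumptions.
CHALLENGE_MARKERS = ("captcha", "recaptcha", "hcaptcha", "cf-challenge")

def looks_like_challenge(status_code: int, body: str) -> bool:
    body_lower = body.lower()
    # Hard signals: status codes commonly used for blocks and rate limits.
    if status_code in (403, 429, 503):
        return True
    # Soft signal: known challenge-page markers in the HTML.
    if any(marker in body_lower for marker in CHALLENGE_MARKERS):
        return True
    # Heuristic: full-page interstitials are often tiny compared to real pages.
    return len(body) < 2000 and "<form" in body_lower

print(looks_like_challenge(200, "<html>Please complete the reCAPTCHA</html>"))  # True
```

In production you would tune the markers and thresholds per target rather than hard-code them, but the shape stays the same: several cheap checks feeding one boolean verdict.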

Practical tips

  • Treat CAPTCHA as a signal, not just an obstacle: weak proxies, reused sessions, missing browser execution, bad headers, and unrealistic navigation patterns are common causes
  • Don't jump straight to solver services: if your traffic quality is bad, you'll just pay to solve the same underlying problem over and over
  • Separate simple pages from protected flows: product pages, search, login, checkout, and account pages often have very different bot defenses
  • Track challenge rate per domain: if CAPTCHA rate spikes after a proxy pool change, that's usually not a coincidence
  • Use consistent sessions when the site expects them: randomizing everything can look as fake as reusing everything
  • In production, the cheapest fix is often better routing and browser handling, not more retries
  • If you're using a router layer like ScrapeRouter, the goal is to reduce how often you hit CAPTCHA in the first place by picking the right scraping method for each target: raw requests where possible, browser automation where necessary, and provider routing when one setup starts failing
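The per-domain tracking tip above can be sketched as a small counter. The class name and the 5% alert threshold are illustrative assumptions, not recommendations:

```python
from collections import defaultdict

# Illustrative sketch: track challenge rate per domain so a spike
# (e.g. after a proxy pool change) is visible. The 5% threshold is an assumption.
class ChallengeTracker:
    def __init__(self, alert_rate: float = 0.05):
        self.alert_rate = alert_rate
        self.requests = defaultdict(int)
        self.challenges = defaultdict(int)

    def record(self, domain: str, challenged: bool) -> None:
        self.requests[domain] += 1
        if challenged:
            self.challenges[domain] += 1

    def rate(self, domain: str) -> float:
        total = self.requests[domain]
        return self.challenges[domain] / total if total else 0.0

    def spiking(self, domain: str) -> bool:
        return self.rate(domain) > self.alert_rate

tracker = ChallengeTracker()
for i in range(100):
    tracker.record("shop.example", challenged=(i % 10 == 0))  # 10% challenged
print(tracker.rate("shop.example"))      # 0.1
print(tracker.spiking("shop.example"))   # True
```

The useful part is the comparison over time: a rate that jumps after an infrastructure change points at the change, not the target.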

Use cases

  • Monitoring target health: a rising CAPTCHA rate is often your first sign that a site changed bot defenses
  • Choosing scraping strategy: some targets work with plain HTTP requests, others need full browser rendering and better fingerprint handling
  • Cost control: solving CAPTCHAs at scale gets expensive fast, so teams try to reduce trigger rate before adding solver spend
  • Fallback design: when one provider or browser setup starts getting challenged, routing traffic differently can keep jobs running
  • Login and account automation: CAPTCHA is especially common around auth, signup, password reset, and other abuse-sensitive flows
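The fallback idea above can be sketched as an ordered list of strategies tried until one returns real content. The strategy functions and their return convention here are hypothetical: each takes a URL and returns page HTML, or None when blocked outright:

```python
# Illustrative sketch: try cheaper strategies first, escalate on challenge.
# The strategy functions and their return convention are hypothetical.

def fetch_with_fallback(url, strategies):
    for name, fetch_fn in strategies:
        body = fetch_fn(url)
        if body is not None and "captcha" not in body.lower():
            return body  # cheapest strategy that got real content wins
    return None  # every strategy was blocked or challenged

# Hypothetical stand-ins for real fetchers:
def raw_request(url):
    return None  # pretend: plain HTTP is blocked for this target

def headless_browser(url):
    return "<html>ok</html>"  # pretend: the browser path gets through

result = fetch_with_fallback(
    "https://target-site.example",
    [("raw", raw_request), ("browser", headless_browser)],
)
print(result)  # <html>ok</html>
```

Ordering strategies by cost means you only pay for browser rendering or provider routing on the targets that actually require it.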

Related terms

  • Proxy Rotation
  • Headless Browser
  • Browser Fingerprinting
  • Rate Limiting
  • HTTP 403
  • Session
  • User-Agent
  • Web Scraping API