Glossary

Honeypot

A honeypot is a trap used to catch bots or suspicious automation by exposing something a real user usually would not touch, like a hidden link, invisible form field, or fake endpoint. In scraping, hitting a honeypot is a fast way to get flagged because it tells the site you are parsing the page mechanically instead of behaving like a normal browser session.

Examples

A few common web scraping honeypots:

  • Hidden links: links present in the HTML but hidden with CSS, positioned off-screen, or buried in invisible containers
  • Invisible form fields: extra inputs real users never fill, but basic bots submit anyway
  • Fake pagination or decoy endpoints: URLs that exist mainly to identify aggressive crawlers
For example, a scraper can check inline styles and skip links no human could see:

from bs4 import BeautifulSoup

html = """
<html>
  <body>
    <a href="/products/1">Real product</a>
    <a href="/trap" style="display:none">Do not click</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

for a in soup.select("a[href]"):
    style = (a.get("style") or "").replace(" ", "").lower()
    if "display:none" in style:
        continue
    print(a["href"])
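Invisible form fields can be filtered the same way. A minimal sketch using only the standard library (the `<form>` markup and the `is_decoy_field` heuristic are illustrative assumptions, not a universal rule):

```python
from html.parser import HTMLParser

class InputCollector(HTMLParser):
    """Collects the attributes of every <input> tag on a page."""
    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            self.inputs.append(dict(attrs))

def is_decoy_field(attrs):
    """Heuristic: a field a user cannot see or tab to is likely a trap."""
    style = (attrs.get("style") or "").replace(" ", "").lower()
    if "display:none" in style or "visibility:hidden" in style:
        return True
    if attrs.get("aria-hidden") == "true" or attrs.get("tabindex") == "-1":
        return True
    return False

html = """
<form action="/subscribe">
  <input name="email" type="text">
  <input name="website" type="text" style="display:none">
</form>
"""

parser = InputCollector()
parser.feed(html)
safe = [f["name"] for f in parser.inputs if not is_decoy_field(f)]
print(safe)  # ['email']
```

A bot that fills in the hidden `website` field outs itself immediately; a careful scraper only touches the fields a browser would render.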
The opposite, honeypot-tripping behavior is blindly requesting every URL the parser finds:

# Bad crawler behavior: blindly requesting every discovered URL
curl https://example.com/trap

The usual failure mode is boring: the scraper works fine in testing, then starts getting blocked in production because it clicks or submits things no human would.

Practical tips

  • Don’t treat every URL in the raw HTML as safe to follow.
  • Check for hidden elements: display:none, visibility:hidden, zero-sized elements, off-screen positioning, aria-hidden, suspicious classes, and form fields users never interact with.
  • Compare rendered behavior vs raw markup. A lot of traps are obvious once you look at the page like a browser instead of like a string parser.
  • Be careful with broad selectors that grab every link or input on the page.
  • If a target suddenly starts returning bans after a crawler expansion, assume you tripped something before assuming proxies are the whole problem.
  • Add basic filtering before enqueueing links.
A crude inline-style check, for example:

def looks_hidden(style: str) -> bool:
    # Heuristic only: catches inline display/visibility/opacity tricks and
    # negative off-screen offsets, but not class-based hiding.
    s = (style or "").replace(" ", "").lower()
    markers = ["display:none", "visibility:hidden", "left:-", "top:-", "opacity:0"]
    return any(marker in s for marker in markers)
  • This is a detail that quietly wastes time in production: the scraper passes local tests, then gets blocked once you crawl at scale.
  • If you’re using a router layer like ScrapeRouter, it helps with the transport side: browser rendering, retries, fingerprinting, proxy routing. It does not magically make honeypot logic go away; you still need sane extraction and link-following rules.
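Putting the tips together, here is one way to sketch filtering before enqueueing. It assumes your parser hands you links as (href, inline-style) pairs; the queue/seen structures are placeholders for whatever your crawler actually uses:

```python
def looks_hidden(style: str) -> bool:
    # Same inline-style heuristic as above.
    s = (style or "").replace(" ", "").lower()
    markers = ["display:none", "visibility:hidden", "left:-", "top:-", "opacity:0"]
    return any(marker in s for marker in markers)

def enqueue_links(discovered, queue, seen):
    """Drop hidden links and duplicates before they reach the fetch queue."""
    for href, style in discovered:
        if looks_hidden(style):
            continue  # likely honeypot: no human could click this
        if href in seen:
            continue
        seen.add(href)
        queue.append(href)

discovered = [
    ("/products/1", ""),
    ("/trap", "display: none"),
    ("/products/1", ""),
]
queue, seen = [], set()
enqueue_links(discovered, queue, seen)
print(queue)  # ['/products/1']
```

The point is where the filter sits: before the queue, so a trap URL never generates a request in the first place.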

Use cases

  • Site defense: publishers, ecommerce sites, and marketplaces use honeypots to identify bots that scrape too mechanically.
  • Bot detection: security teams place decoy links, fields, or endpoints to separate real users from automation.
  • Scraper hardening: teams building production scrapers use honeypot awareness to avoid easy bans and reduce pointless debugging.
  • Form abuse prevention: hidden fields are often used to catch spam bots submitting forms that humans never actually see.
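From the defender's side, the form-abuse check is tiny. A sketch, assuming the hidden field is named `website` (the name is an illustrative convention, not a standard):

```python
def is_spam_submission(form_data: dict, honeypot_field: str = "website") -> bool:
    # Real users never see the hidden field, so any non-empty value means a bot.
    return bool(form_data.get(honeypot_field, "").strip())

print(is_spam_submission({"email": "a@b.com", "website": "http://spam.example"}))  # True
print(is_spam_submission({"email": "a@b.com", "website": ""}))  # False
```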

Related terms

  • Bot Detection
  • Headless Browser
  • Rate Limit
  • CAPTCHA
  • Fingerprinting
  • Proxy Rotation
  • Request Throttling
  • Robots.txt