Examples
A few common web scraping honeypots:
- Hidden links: links present in the HTML but hidden with CSS, positioned off-screen, or buried in invisible containers
- Invisible form fields: extra inputs real users never fill, but basic bots submit anyway
- Fake pagination or decoy endpoints: URLs that exist mainly to identify aggressive crawlers
A minimal example of skipping links hidden with inline CSS:

```python
from bs4 import BeautifulSoup

html = """
<html>
<body>
<a href="/products/1">Real product</a>
<a href="/trap" style="display:none">Do not click</a>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
for a in soup.select("a[href]"):
    # Normalize the inline style so "display: none" and "display:none" both match
    style = (a.get("style") or "").replace(" ", "").lower()
    if "display:none" in style:
        continue  # likely honeypot: do not follow
    print(a["href"])
```
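Invisible form fields can be screened the same way before submitting anything. A stdlib-only sketch (the `FormFieldScanner` class name and the heuristics are illustrative, not a library API):

```python
from html.parser import HTMLParser

class FormFieldScanner(HTMLParser):
    """Split a form's inputs into human-visible fields and likely honeypots."""

    def __init__(self):
        super().__init__()
        self.visible = []
        self.honeypots = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        a = dict(attrs)
        name = a.get("name")
        if not name:
            return
        style = (a.get("style") or "").replace(" ", "").lower()
        # Note: type="hidden" alone is NOT treated as a trap -- real forms use
        # it for CSRF tokens, which should be submitted with the
        # server-provided value, never an invented one.
        trap = (
            "display:none" in style
            or "visibility:hidden" in style
            or a.get("aria-hidden") == "true"
            or a.get("tabindex") == "-1"
        )
        (self.honeypots if trap else self.visible).append(name)

form_html = """
<form action="/subscribe" method="post">
  <input name="email" type="text">
  <input name="website" type="text" style="display: none">
</form>
"""
scanner = FormFieldScanner()
scanner.feed(form_html)
print(scanner.visible)    # ['email']
print(scanner.honeypots)  # ['website']
```

A bot that fills `website` here announces itself; leaving flagged fields empty is the safer default.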
```shell
# Bad crawler behavior: blindly requesting every discovered URL
curl https://example.com/trap
```
The usual failure mode is boring: the scraper works fine in testing, then starts getting blocked in production because it clicks or submits things no human would.
Practical tips
- Don’t treat every URL in the raw HTML as safe to follow.
- Check for hidden elements:
  display:none, visibility:hidden, zero-sized elements, off-screen positioning, aria-hidden, suspicious classes, and form fields users never interact with.
- Compare rendered behavior vs. raw markup. A lot of traps are obvious once you look at the page like a browser instead of like a string parser.
- Be careful with broad selectors that grab every link or input on the page.
- If a target suddenly starts returning bans after a crawler expansion, assume you tripped something before assuming proxies are the whole problem.
- Add basic filtering before enqueueing links.
```python
def looks_hidden(style: str) -> bool:
    """Heuristic check for inline styles that hide an element."""
    s = (style or "").replace(" ", "").lower()
    markers = ["display:none", "visibility:hidden", "left:-", "top:-", "opacity:0"]
    return any(marker in s for marker in markers)
```
- In production, this kind of filtering pays for itself: a scraper can pass every local test and still get banned once it crawls at scale.
- If you’re using a router layer like ScrapeRouter, it helps with the transport side: browser rendering, retries, fingerprinting, proxy routing. It does not magically make honeypot logic go away; you still need sane extraction and link-following rules.
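The tips above can be combined into a small link-enqueueing filter. A stdlib-only sketch (the `LinkCollector` class name is illustrative; a real crawler would also resolve relative URLs and dedupe):

```python
from html.parser import HTMLParser

def looks_hidden(style: str) -> bool:
    """Heuristic check for inline styles that hide an element."""
    s = (style or "").replace(" ", "").lower()
    markers = ["display:none", "visibility:hidden", "left:-", "top:-", "opacity:0"]
    return any(marker in s for marker in markers)

class LinkCollector(HTMLParser):
    """Collect hrefs, skipping anchors whose inline style looks hidden."""

    def __init__(self):
        super().__init__()
        self.queue = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        href = a.get("href")
        if not href:
            return
        if looks_hidden(a.get("style")):
            return  # likely honeypot: do not enqueue
        self.queue.append(href)

page = """
<a href="/products/1">Real product</a>
<a href="/trap" style="position:absolute;left:-9999px">Hidden</a>
"""
collector = LinkCollector()
collector.feed(page)
print(collector.queue)  # ['/products/1']
```

This only inspects inline styles; traps hidden via external stylesheets or JavaScript still require rendering the page to detect.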
Use cases
- Site defense: publishers, ecommerce sites, and marketplaces use honeypots to identify bots that scrape too mechanically.
- Bot detection: security teams place decoy links, fields, or endpoints to separate real users from automation.
- Scraper hardening: teams building production scrapers use honeypot awareness to avoid easy bans and reduce pointless debugging.
- Form abuse prevention: hidden fields are often used to catch spam bots submitting forms that humans never actually see.