Examples
Most scraping code that says "use CSS" really means use a CSS selector.
from bs4 import BeautifulSoup

html = """
<html>
<body>
<div class="product-card" data-sku="123">
<h2 class="title">Running Shoes</h2>
<span class="price">$79</span>
<a href="/p/123">View product</a>
</div>
</body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Scope nested lookups to the card so a stray match elsewhere on the page can't leak in
card = soup.select_one("div.product-card")
title = card.select_one("h2.title").get_text(strip=True)
price = card.select_one("span.price").get_text(strip=True)
link = card.select_one("a")["href"]
print(title, price, link)  # Running Shoes $79 /p/123
# Common selector patterns
"div.product-card" # element with class
"#main" # element with id
"a[href]" # element with attribute
"div[data-sku='123']" # exact attribute match
"ul > li" # direct children
".product-card .price" # nested descendant
In browser devtools, CSS selectors are usually the fastest way to test whether your extraction logic is sane before you write code.
Practical tips
- Treat CSS as a locator tool, not a guarantee. A selector that works today can break next week because someone renamed a class or shuffled the DOM.
- Prefer stable attributes over styling classes: data-*, semantic attributes, consistent container structure.
- Avoid selectors that are too long. If your selector looks like a full DOM breadcrumb, it will probably die on the next frontend deploy.
- Be careful with autogenerated class names from React, Vue, Tailwind-heavy builds, or CSS-in-JS setups: they often change, and they change for no useful reason.
- Test selectors against multiple pages, not just one lucky example.
- In production scraping, CSS selectors are usually fine for static extraction. Once the page is JS-heavy, delayed, or anti-bot protected, the harder part is getting a clean rendered page consistently, not writing div.price.
- With ScrapeRouter, the point is not to replace CSS selectors. The point is to make the page retrieval layer less fragile so your selectors have a chance to keep working.
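The "test against multiple pages" tip can be turned into a tiny regression check that runs selectors over a set of saved fixtures. A sketch; the `check_selector` helper and the fixture HTML are invented for illustration:

```python
from bs4 import BeautifulSoup

def check_selector(selector, pages):
    """Return the names of fixture pages where the selector finds nothing.
    (Helper invented for illustration.)"""
    misses = []
    for name, html in pages.items():
        if BeautifulSoup(html, "html.parser").select_one(selector) is None:
            misses.append(name)
    return misses

# Two saved fixtures: one matches the old markup, one reflects a redesign
pages = {
    "listing_old.html": '<span class="price">$79</span>',
    "listing_new.html": '<span data-testid="price">$79</span>',
}

# A class-based selector silently breaks on the redesigned page...
print(check_selector("span.price", pages))
# ...while a selector list covering both variants keeps matching
print(check_selector("[data-testid='price'], span.price", pages))
```

Running this over a handful of real saved pages per site catches the "works on one lucky example" failure mode before it reaches production.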
# Better: anchored on stable attributes
product = soup.select_one('[data-testid="product-card"]')
price = soup.select_one('[itemprop="price"]')
# Riskier: tied to presentation classes
price = soup.select_one('.text-red-500.font-bold.md\\:text-xl')
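The stable-first idea can be combined with a fallback chain, so a presentation-class selector is only the last resort rather than the only option. A sketch; `first_match` and the HTML are invented for illustration:

```python
from bs4 import BeautifulSoup

def first_match(soup, selectors):
    """Try selectors from most to least stable, return the first hit.
    (Helper invented for illustration.)"""
    for sel in selectors:
        node = soup.select_one(sel)
        if node is not None:
            return node
    return None

html = '<div><span class="text-red-500 font-bold">$79</span></div>'
soup = BeautifulSoup(html, "html.parser")

price = first_match(soup, [
    '[itemprop="price"]',        # most stable: semantic microdata
    '[data-testid="price"]',     # stable: explicit test hook
    '.text-red-500.font-bold',   # last resort: presentation classes
])
print(price.get_text(strip=True))
```

When the fallback fires in production, that is also a useful signal to log: it means the page changed and the stable anchors are gone.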
Use cases
- Extracting product data: title, price, availability, image URLs, links
- Pulling content from article pages: headline, author, publish date, body blocks
- Navigating repeated page structures: search results, listing cards, table rows
- Targeting elements in browser automation: click buttons, fill inputs, wait for components
- Building parsers that are readable enough for another engineer to debug at 2 a.m.
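The article-page use case follows the same shape as the product example: one selector per field, collected into a plain dict that another engineer can actually read. A sketch; the HTML, class names, and field names are assumptions for illustration:

```python
from bs4 import BeautifulSoup

html = """
<article>
  <h1 class="headline">Why Selectors Break</h1>
  <span class="byline">Jane Doe</span>
  <time datetime="2024-05-01">May 1, 2024</time>
  <div class="body"><p>First paragraph.</p><p>Second paragraph.</p></div>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

article = {
    "headline": soup.select_one("h1.headline").get_text(strip=True),
    "author": soup.select_one(".byline").get_text(strip=True),
    # Prefer the machine-readable datetime attribute over the display text
    "published": soup.select_one("time")["datetime"],
    "body": [p.get_text(strip=True) for p in soup.select(".body > p")],
}
print(article)
```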