Glossary

DOM

DOM stands for Document Object Model. It’s the tree-like structure a browser builds from a page’s HTML, where elements, attributes, and text become nodes you can inspect, query, and manipulate with JavaScript or a scraper.
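One way to see this tree structure directly is with Python's standard-library HTML parser, which fires a callback for each node as it walks the markup. This is an illustrative sketch (the `DOMPrinter` class is made up for this example), not a full DOM implementation:

```python
from html.parser import HTMLParser

# Illustrative sketch: print each node the parser encounters,
# indented to show the tree a browser would build from the same HTML.
class DOMPrinter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0

    def handle_starttag(self, tag, attrs):
        # Each element becomes a node; attributes hang off it.
        print("  " * self.depth + f"element: <{tag}> {dict(attrs)}")
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        # Text between tags becomes text nodes.
        if data.strip():
            print("  " * self.depth + f"text: {data.strip()}")

DOMPrinter().feed('<div class="product"><h2>Running Shoes</h2></div>')
```

Running this prints the `div` element, its nested `h2`, and the "Running Shoes" text node, each indented one level deeper than its parent.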

Examples

A scraper usually reads the DOM after the page loads, then selects the parts it cares about.

from bs4 import BeautifulSoup

html = """
<html>
  <body>
    <div class="product">
      <h2>Running Shoes</h2>
      <span class="price">$89</span>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selectors walk the parsed tree the same way they would in a browser
name = soup.select_one(".product h2").get_text(strip=True)
price = soup.select_one(".price").get_text(strip=True)

print({"name": name, "price": price})

On JavaScript-heavy pages, the DOM you get after rendering can be very different from the raw HTML response.

# Playwright example: wait for the rendered DOM
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    page.wait_for_selector(".product")
    products = page.locator(".product").all_text_contents()
    print(products)
    browser.close()

Practical tips

  • Raw HTML and DOM are not always the same: if a site renders content with JavaScript, your HTTP client may miss data that appears fine in the browser.
  • Selectors break: classes, nesting, and generated IDs change all the time, so DOM-based scraping works until it suddenly doesn't.
  • Prefer stable attributes when possible: use things like data-testid, semantic tags, URLs, or visible text before grabbing random hashed class names.
  • Wait for the right state: in browser automation, don't just wait for page load, wait for the specific DOM element that means the data is actually there.
  • Keep extraction logic small: the more your scraper depends on deep DOM structure, the more maintenance you're signing up for.
  • Use a rendered scraper only when needed: if the data is already in the initial response or an API call, parsing the DOM is usually slower and more expensive.
  • If you're using ScrapeRouter, this is exactly what matters in production: some pages need simple HTTP fetching, others need full rendering, and routing each request to the right strategy saves a lot of wasted retries and browser time.
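The raw-vs-rendered check from the tips above can be sketched as a small routing helper. The `needs_rendering` function here is hypothetical (not part of any library): it parses the plain HTTP response and reports whether the target element is missing, which suggests the page builds it with JavaScript:

```python
from bs4 import BeautifulSoup

def needs_rendering(raw_html: str, selector: str) -> bool:
    """Return True if the target element is absent from the raw HTML,
    suggesting it only appears after client-side rendering."""
    soup = BeautifulSoup(raw_html, "html.parser")
    return soup.select_one(selector) is None

# Server-rendered page: the data is already in the raw response.
print(needs_rendering('<div class="product">Shoes</div>', ".product"))  # False

# JS-rendered page: only an empty mount point comes back over HTTP.
print(needs_rendering('<div id="root"></div>', ".product"))  # True
```

A check like this lets a scraper default to cheap HTTP fetching and fall back to a headless browser only for the pages that actually need it.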

Use cases

  • Extracting product names, prices, ratings, and links from e-commerce pages.
  • Pulling article titles, author names, and publish dates from news or blog pages.
  • Reading tables, listings, and pagination controls from server-rendered pages.
  • Scraping JavaScript-rendered content after the browser builds the final DOM.
  • Validating whether the content you want is present in raw HTML or only appears after client-side rendering.

Related terms

HTML, CSS Selector, JavaScript Rendering, Headless Browser, XPath, Parser