Glossary

Shadow DOM

Shadow DOM is a browser feature that lets a component keep its HTML and CSS inside an isolated subtree, so normal selectors often can’t see or reach it. For scraping, that usually means the element exists on the page but your parser or selector still comes back empty unless you explicitly traverse into the shadow root.

Examples

A common failure mode: the button is clearly visible in the browser, but querySelector() returns null because the button lives inside a shadow root, not in the document's own tree.

// Grab the component host, then hop into its shadow root.
const host = document.querySelector('custom-login');
// shadowRoot is null for closed roots (and for ordinary elements).
const shadow = host && host.shadowRoot;
const button = shadow && shadow.querySelector('button[type="submit"]');
console.log(button ? button.textContent : 'not found');

With Playwright, CSS and text locators pierce open shadow roots automatically, which is a lot less tedious than traversing by hand (closed roots still can't be reached this way):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    # This CSS locator pierces the open shadow root of <custom-login>.
    text = page.locator("custom-login button").text_content()
    print(text)
    browser.close()

If you're scraping raw HTML with requests and BeautifulSoup, this is where things usually fall apart: the server response often contains only the empty host element, because the shadow content is attached later by JavaScript in the browser and never appears in the raw HTML in a usable way.
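A minimal sketch of that failure, using hypothetical markup standing in for a real server response: the host tag is present in the raw HTML, but the shadow content never is.

```python
# Hypothetical raw HTML, as a server might return it for a page that
# renders <custom-login> entirely on the client.
raw_html = "<html><body><custom-login></custom-login></body></html>"

# The host element is in the response...
print("<custom-login>" in raw_html)   # True

# ...but the shadow content (the submit button) is not: it only exists
# after the component's JavaScript runs in a browser.
print("<button" in raw_html)          # False
```

No amount of parsing cleverness fixes this on the server side; if the data is built client-side, you need a browser context.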

Practical tips

  • First check whether the site uses open or closed shadow roots: open roots expose element.shadowRoot and can be traversed; closed roots return null there, so they're much more restrictive.
  • If your selector fails but the element is visibly on screen, inspect the page in DevTools and look for #shadow-root.
  • Don’t waste time trying to solve Shadow DOM with static HTML parsers alone: use a browser context when the page is component-heavy.
  • Expect extra maintenance: frontend teams love swapping component libraries, and your selectors break in weird ways.
  • If you’re routing scraping jobs through ScrapeRouter, this is the kind of page where browser rendering matters: plain fetch is cheaper, but it won’t help if the data lives behind client-side components.
  • Prefer stable anchors over deep component paths: text, roles, data attributes, and nearby labels tend to survive longer than brittle nested selectors.

# Quick sanity check: if the raw HTML looks empty but the browser shows data,
# you're probably dealing with client rendering, Shadow DOM, or both.
curl -s https://example.com | head
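To make the last tip concrete, here's a toy illustration (synthetic markup, with regex checks standing in for real selectors) of why a data attribute outlives a deep nesting path across a frontend refactor:

```python
import re

# Hypothetical markup before and after a component-library swap.
v1 = '<div><div><span><button data-testid="buy">Buy</button></span></div></div>'
v2 = '<section><button class="btn" data-testid="buy">Buy</button></section>'

def deep_path_match(html):
    # Brittle: encodes the exact nesting around the button.
    return re.search(r'<div><div><span><button', html) is not None

def data_attr_match(html):
    # Stable: anchored on an attribute the frontend team controls.
    return 'data-testid="buy"' in html

print(deep_path_match(v1), data_attr_match(v1))  # True True
print(deep_path_match(v2), data_attr_match(v2))  # False True
```

The deep path breaks the moment the wrappers change; the data attribute survives both versions. The same logic applies to roles, visible text, and nearby labels.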

Use cases

  • Scraping product pages built with web components: price, stock status, variant pickers.
  • Pulling data from modern dashboards and internal tools: filters, tables, buttons, and labels often sit inside component trees.
  • Testing or automating login and checkout flows: forms may be wrapped in custom elements, which breaks naive selectors.
  • Debugging “element not found” errors in production scrapers: the data is there, just not in the regular DOM path you expected.

Related terms

  • Headless Browser
  • JavaScript Rendering
  • Dynamic Content
  • DOM
  • CSS Selector
  • Web Components