Examples
Many modern sites do not ship the final page HTML up front. They send a minimal shell; JavaScript then renders components, diffs a virtual DOM against the previous state, and patches the real DOM with the changes.
```html
<div id="app"></div>
<script src="/static/app.js"></script>
```
If you fetch that page with a plain HTTP client, you may only get the empty `#app` container. The actual product cards, search results, or prices may only appear after the framework runs in a browser context.
```python
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/products").text
soup = BeautifulSoup(html, "html.parser")

print(soup.select(".product-card"))
# [] -- the cards are rendered client-side, so the raw HTML has none
```
In that case, the problem is not really the virtual DOM itself. The problem is that the page depends on client-side rendering, so you need a browser-capable scraper or the underlying API.
```shell
curl -s https://example.com/products | head
```
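One quick way to automate that check is a minimal sketch along these lines, assuming you already know a CSS class or text fragment (here, `product-card`, chosen for illustration) that should appear in the fully rendered page:

```python
# Heuristic: if a marker we expect in the rendered page is absent from the
# raw server-sent HTML, the content is probably rendered client-side.
# The default marker string is an assumption for illustration.

def needs_browser(raw_html: str, marker: str = "product-card") -> bool:
    """Return True when the expected content marker is missing from the
    initial HTML, suggesting a browser-capable scraper is needed."""
    return marker not in raw_html

# A server-rendered page ships the marker up front; a JS shell does not.
ssr_html = '<div class="product-card">Widget - $9.99</div>'
spa_html = '<div id="app"></div><script src="/static/app.js"></script>'

print(needs_browser(ssr_html))  # False: plain HTTP parsing is enough
print(needs_browser(spa_html))  # True: route to a browser
```

This is deliberately crude; a marker can also live in embedded JSON rather than rendered markup, so treat a `True` result as "investigate further", not "always spin up a browser".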
Practical tips
- Do not treat virtual DOM as some scraping target you need to parse directly. What matters is whether the site renders data server-side or only after JavaScript runs.
- Check the initial HTML first: if the content is missing from `view-source:` or a raw `curl` response, you are probably dealing with client-side rendering.
- Look for easier exits before reaching for a full browser: hidden JSON in the page, XHR/fetch calls, GraphQL requests, or internal APIs.
- Use a browser when the page really needs one: auth flows, JS-rendered lists, interaction-heavy apps, anti-bot logic.
- In production, the tradeoff is usually cost vs reliability:
- raw HTTP parsing: cheaper, faster, breaks when content moves behind JS
- browser rendering: heavier, slower, but often the only thing that keeps working
- With ScrapeRouter, this is exactly the kind of routing decision you do not want hardcoded all over your scraper fleet. Some pages can stay on cheap HTTP. Others need browser rendering. The annoying part is maintaining that split over time.
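The "hidden JSON" exit is often the cheapest one. Next.js apps, for example, embed their initial page data in a `<script id="__NEXT_DATA__">` tag. A standard-library sketch of pulling it out (the sample HTML below is fabricated for illustration, and real payloads vary by framework):

```python
import json
import re

# Many JS frameworks embed their initial state as JSON in the HTML.
# Next.js uses a <script id="__NEXT_DATA__"> tag; this sample is a
# fabricated stand-in for a real response.
sample_html = """
<div id="app"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"products": [{"name": "Widget", "price": 9.99}]}}}
</script>
"""

match = re.search(
    r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    sample_html,
    re.DOTALL,
)
products = []
if match:
    data = json.loads(match.group(1))
    products = data["props"]["pageProps"]["products"]

print(products)  # [{'name': 'Widget', 'price': 9.99}]
```

When this works, you get structured data without rendering anything, which usually beats both HTML parsing and a headless browser.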
Use cases
- Scraping React or Vue storefronts where product grids are rendered after page load
- Extracting listings from single-page apps that ship almost no useful HTML initially
- Handling sites where a plain request returns placeholders, but a browser session shows the full page
- Deciding whether to use `requests`/Cheerio-style parsing or a headless browser in a scraping pipeline
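That last decision can be expressed as a simple fallback: try the cheap HTTP path first, and escalate to a browser only when the response looks like an unrendered shell. A sketch under stated assumptions; `fetch_http` and `fetch_browser` are hypothetical stubs standing in for your real clients (e.g. `requests` and Playwright):

```python
# Routing sketch: cheap HTTP first, browser fallback only when needed.
# Both fetchers below are stubs for illustration, not real network calls.

def fetch_http(url: str) -> str:
    return '<div id="app"></div>'  # stub: a JS shell with no content

def fetch_browser(url: str) -> str:
    return '<div class="product-card">Widget</div>'  # stub: rendered page

def fetch(url: str, marker: str = "product-card") -> str:
    html = fetch_http(url)
    if marker in html:
        return html            # cheap path: content was server-rendered
    return fetch_browser(url)  # expensive path: page needs rendering

print("product-card" in fetch("https://example.com/products"))  # True
```

The useful property is that the escalation rule lives in one place, so when a site moves its content behind JavaScript (or back out), only the marker check needs attention rather than every scraper that touches the site.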