Examples
Many modern sites do not ship the final page HTML up front. They send a minimal shell; JavaScript then renders components, diffs a virtual DOM against the previous state, and patches the real DOM with the changes.
```html
<div id="app"></div>
<script src="/static/app.js"></script>
```
If you fetch that page with a plain HTTP client, you may only get the empty `#app` container. The actual product cards, search results, or prices may only appear after the framework runs in a browser context.
```python
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com/products").text
soup = BeautifulSoup(html, "html.parser")

print(soup.select(".product-card"))
# [] -- the cards are rendered client-side, so the raw HTML has none
```
In that case, the problem is not really the virtual DOM itself. The problem is that the page depends on client-side rendering, so you need a browser-capable scraper or the underlying API.
```shell
curl -s https://example.com/products | head
```
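One quick way to automate that check is a minimal sketch along these lines, assuming you already know a CSS class or text fragment (here, `product-card`, chosen for illustration) that should appear in the fully rendered page:

```python
# Heuristic: if a marker we expect in the rendered page is absent from the
# raw server-sent HTML, the content is probably rendered client-side.
# The default marker string is an assumption for illustration.

def needs_browser(raw_html: str, marker: str = "product-card") -> bool:
    """Return True when the expected content marker is missing from the
    initial HTML, suggesting a browser-capable scraper is needed."""
    return marker not in raw_html

# A server-rendered page ships the marker up front; a JS shell does not.
ssr_html = '<div class="product-card">Widget - $9.99</div>'
spa_html = '<div id="app"></div><script src="/static/app.js"></script>'

print(needs_browser(ssr_html))  # False: plain HTTP parsing is enough
print(needs_browser(spa_html))  # True: route to a browser
```

This is deliberately crude; a marker can also live in embedded JSON rather than rendered markup, so treat a `True` result as "investigate further", not "always spin up a browser".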
Practical tips
- Do not treat virtual DOM as some scraping target you need to parse directly. What matters is whether the site renders data server-side or only after JavaScript runs.
- Check the initial HTML first: if the content is missing from `view-source:` or a raw `curl` response, you are probably dealing with client-side rendering.
- Look for easier exits before reaching for a full browser: hidden JSON in the page, XHR/fetch calls, GraphQL requests, or internal APIs.
- Use a browser when the page really needs one: auth flows, JS-rendered lists, interaction-heavy apps, anti-bot logic.
- In production, the tradeoff is usually cost vs reliability:
- raw HTTP parsing: cheaper, faster, breaks when content moves behind JS
- browser rendering: heavier, slower, but often the only thing that keeps working
- With ScrapeRouter, this is exactly the kind of routing decision you do not want hardcoded all over your scraper fleet. Some pages can stay on cheap HTTP. Others need browser rendering. The annoying part is maintaining that split over time.
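The "hidden JSON" exit is often the cheapest one. Next.js apps, for example, embed their initial page data in a `<script id="__NEXT_DATA__">` tag. A standard-library sketch of pulling it out (the sample HTML below is fabricated for illustration, and real payloads vary by framework):

```python
import json
import re

# Many JS frameworks embed their initial state as JSON in the HTML.
# Next.js uses a <script id="__NEXT_DATA__"> tag; this sample is a
# fabricated stand-in for a real response.
sample_html = """
<div id="app"></div>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"products": [{"name": "Widget", "price": 9.99}]}}}
</script>
"""

match = re.search(
    r'<script id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    sample_html,
    re.DOTALL,
)
products = []
if match:
    data = json.loads(match.group(1))
    products = data["props"]["pageProps"]["products"]

print(products)  # [{'name': 'Widget', 'price': 9.99}]
```

When this works, you get structured data without rendering anything, which usually beats both HTML parsing and a headless browser.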
Use cases
- Scraping React or Vue storefronts where product grids are rendered after page load
- Extracting listings from single-page apps that ship almost no useful HTML initially
- Handling sites where a plain request returns placeholders, but a browser session shows the full page
- Deciding whether to use `requests`/Cheerio-style parsing or a headless browser in a scraping pipeline
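That last decision can be expressed as a simple fallback: try the cheap HTTP path first, and escalate to a browser only when the response looks like an unrendered shell. A sketch under stated assumptions; `fetch_http` and `fetch_browser` are hypothetical stubs standing in for your real clients (e.g. `requests` and Playwright):

```python
# Routing sketch: cheap HTTP first, browser fallback only when needed.
# Both fetchers below are stubs for illustration, not real network calls.

def fetch_http(url: str) -> str:
    return '<div id="app"></div>'  # stub: a JS shell with no content

def fetch_browser(url: str) -> str:
    return '<div class="product-card">Widget</div>'  # stub: rendered page

def fetch(url: str, marker: str = "product-card") -> str:
    html = fetch_http(url)
    if marker in html:
        return html            # cheap path: content was server-rendered
    return fetch_browser(url)  # expensive path: page needs rendering

print("product-card" in fetch("https://example.com/products"))  # True
```

The useful property is that the escalation rule lives in one place, so when a site moves its content behind JavaScript (or back out), only the marker check needs attention rather than every scraper that touches the site.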