Glossary

Hydration

Hydration is the step where client-side JavaScript takes server-rendered HTML and turns it into a live app by attaching state, event handlers, and component logic. For scraping, it matters because many modern sites ship the useful data in the initial HTML, before hydration runs, and extracting that data is often easier than waiting for the fully rendered UI.

Examples

A common case is Next.js. The page HTML often includes serialized app data in script tags, and you can pull that directly instead of driving a browser around and hoping selectors stay stable.

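import json

import requests
from bs4 import BeautifulSoup

url = "https://example.com/product/123"
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(response.text, "html.parser")

# Next.js serializes the page's initial props into this script tag.
script = soup.find("script", id="__NEXT_DATA__")
if script and script.string:
    data = json.loads(script.string)
    # .get() at each level returns None for a missing key instead of raising KeyError.
    product = data.get("props", {}).get("pageProps", {}).get("product")
    print(product)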
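
The lookup above depends on the props, pageProps, product key path staying where it is. As a minimal sketch, the check below reports which expected key paths no longer resolve in an embedded JSON payload. The helper names (resolves, missing_paths), the sample payload, and the "reviews" key are all hypothetical, invented for illustration rather than taken from a real site.

```python
from typing import Any, Iterable, List, Tuple

Path = Tuple[str, ...]


def resolves(data: Any, path: Path) -> bool:
    """Return True if every key in the path exists in the nested dicts."""
    current = data
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return False
        current = current[key]
    return True


def missing_paths(data: Any, expected: Iterable[Path]) -> List[Path]:
    """List the expected key paths that are absent from the payload."""
    return [path for path in expected if not resolves(data, path)]


# Invented payload and paths, for illustration only.
payload = {"props": {"pageProps": {"product": {"id": 123}}}}
expected = [
    ("props", "pageProps", "product"),
    ("props", "pageProps", "reviews"),
]
print(missing_paths(payload, expected))
```

Running a check like this before extraction turns a silent None into an explicit signal that the page structure has changed.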

If the data only appears after client-side JavaScript runs, the raw HTML won't contain it, and you may need a browser-capable scraper that renders the page first.

curl -X POST "https://www.scraperouter.com/api/v1/scrape/" \
  -H "Authorization: Api-Key $api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/app",
    "render": true
  }'
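
The same request can be sketched in Python. The endpoint, header names, and body fields below are copied from the curl example; the response format is not described here, so this sketch only builds the request and leaves sending it as a commented-out step. The function name build_render_request and the API_KEY environment variable are assumptions.

```python
import json
import os
import urllib.request

# Endpoint, headers, and body fields mirror the curl example above;
# nothing else about the API is assumed.
ENDPOINT = "https://www.scraperouter.com/api/v1/scrape/"


def build_render_request(target_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for a rendered scrape."""
    body = json.dumps({"url": target_url, "render": True}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Api-Key {api_key}",
            "Content-Type": "application/json",
        },
    )


# API_KEY is a hypothetical environment variable name.
request = build_render_request("https://example.com/app", os.environ.get("API_KEY", ""))
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request, timeout=60)
```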

Practical tips

  • Check the raw HTML before reaching for browser automation: look for __NEXT_DATA__, window.__INITIAL_STATE__, JSON blobs, and inline script tags.
  • If the data is in the HTML before hydration, extract that instead of scraping rendered DOM text: it is faster, cheaper, and usually less fragile.
  • If hydration fetches data via XHR or GraphQL after load, inspect network calls: scraping the API response is often cleaner than scraping the UI.
  • Don’t confuse hydrated UI with data availability: the page can look empty in the browser for a moment while the raw response already contains what you need.
  • In production, hydration patterns change during frontend rewrites: selectors break, embedded JSON keys move, script tag IDs change. Monitor for that; don’t assume it stays fixed.
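
The first tip can be sketched as a quick scan of the raw HTML for the markers listed above. The regular expressions are rough heuristics, not a complete detector, and the function name, marker labels, and sample HTML are assumptions made for illustration. Matching application/ld+json as a "JSON blob" is also an assumption.

```python
import re
from typing import List

# Marker names come from the tips above; the patterns are rough heuristics.
MARKERS = {
    "next_data": re.compile(r'<script[^>]*\bid=["\']__NEXT_DATA__["\']', re.I),
    "initial_state": re.compile(r"window\.__INITIAL_STATE__\s*="),
    "json_script": re.compile(
        r'<script[^>]*\btype=["\']application/(?:ld\+)?json["\']', re.I
    ),
}


def find_hydration_markers(html: str) -> List[str]:
    """Return the labels of the markers that appear in the raw HTML."""
    return [name for name, pattern in MARKERS.items() if pattern.search(html)]


# Invented sample page, shaped like a server-rendered Next.js response.
sample_html = """
<html><body>
<div id="__next"></div>
<script id="__NEXT_DATA__" type="application/json">{"props": {}}</script>
</body></html>
"""
print(find_hydration_markers(sample_html))
```

An empty result suggests the data arrives later, through an API call or client-side rendering, which is the point to open the network tab or consider a browser.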

Use cases

  • Scraping Next.js sites: parse __NEXT_DATA__ instead of waiting for cards, tables, or product widgets to appear.
  • Reducing browser usage: skip full rendering when the important data is already embedded in the initial HTML.
  • Stabilizing extraction: pull structured hydration data rather than relying on brittle CSS selectors tied to the visual layer.
  • Debugging missing data: compare raw HTML vs rendered page to see whether the issue is hydration, delayed API calls, or anti-bot behavior.
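
The raw-versus-rendered comparison in the last item can be sketched as a small classifier. It assumes you have already fetched both versions of the page as strings and know one value that should be present; the function name, return labels, and sample strings are hypothetical.

```python
def locate_value(raw_html: str, rendered_html: str, needle: str) -> str:
    """Classify where a known value shows up.

    "raw"      -> present in the initial HTML; no browser needed.
    "rendered" -> appears only after JavaScript runs (hydration or a delayed API call).
    "missing"  -> in neither; check for blocking, a wrong URL, or a changed page.
    """
    if needle in raw_html:
        return "raw"
    if needle in rendered_html:
        return "rendered"
    return "missing"


# Invented example: the price appears only in the rendered DOM.
raw = '<div id="root"></div>'
rendered = '<div id="root"><span class="price">19.99</span></div>'
print(locate_value(raw, rendered, "19.99"))
```

A plain substring test is deliberately crude: it can miss values that are formatted or escaped differently in embedded JSON, so treat a "rendered" or "missing" result as a prompt to look closer rather than a verdict.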

Related terms

  • Server-Side Rendering
  • Client-Side Rendering
  • Headless Browser
  • DOM
  • Next.js
  • JavaScript Rendering
  • XHR
  • GraphQL