Examples
A basic infinite scroll flow looks like this:
- Load the page
- Trigger scrolling or watch the network calls
- Wait for new items to load
- Repeat until no new content appears or the API says there is no next cursor
In practice, the cleanest path is often to skip browser-style scrolling and call the underlying API directly if the site exposes one.
```python
import requests

# Single request against a cursor-based feed endpoint
api_url = "https://example.com/api/feed?cursor=abc123"
headers = {"User-Agent": "Mozilla/5.0"}

resp = requests.get(api_url, headers=headers, timeout=30)
resp.raise_for_status()  # fail fast on rate limits or auth errors

data = resp.json()
items = data.get("items", [])
next_cursor = data.get("next_cursor")  # typically None once the feed is exhausted

print(f"got {len(items)} items")
print(f"next cursor: {next_cursor}")
```
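The single request above extends naturally into a loop that follows the cursor until the API stops returning one. This is a sketch, assuming the same hypothetical endpoint and response shape (`items` plus `next_cursor`); the paging logic is split out so it can be tested against any fetch function.

```python
import requests

BASE_URL = "https://example.com/api/feed"  # hypothetical endpoint from the example above
HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch_page(cursor=None):
    """Fetch one batch from the feed; returns (items, next_cursor)."""
    params = {"cursor": cursor} if cursor else {}
    resp = requests.get(BASE_URL, headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    return data.get("items", []), data.get("next_cursor")

def paginate(fetch, max_pages=100):
    """Follow cursors until the API returns no items or no next cursor."""
    all_items, cursor = [], None
    for _ in range(max_pages):  # hard cap so a buggy endpoint cannot loop forever
        items, cursor = fetch(cursor)
        all_items.extend(items)
        if not items or not cursor:
            break
    return all_items
```

The `max_pages` cap is worth keeping even when the API behaves, since a stuck cursor otherwise turns into an infinite loop.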
If you do need a rendered page, use a tool that can execute JavaScript and wait for the next batch of DOM content to appear.
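The scroll-wait-repeat loop itself is tool-agnostic. Here is a sketch where `scroll_to_bottom` and `count_items` are hypothetical hooks you would wire up to whatever automation tool you use (for example, evaluating `window.scrollTo` and counting feed elements in the page):

```python
import time

def scroll_until_stable(scroll_to_bottom, count_items,
                        max_rounds=50, settle_seconds=2.0):
    """Scroll repeatedly; stop once the item count stops growing.

    scroll_to_bottom and count_items are caller-supplied hooks into
    the browser automation tool (hypothetical, not a specific API).
    """
    last_count = count_items()
    for _ in range(max_rounds):
        scroll_to_bottom()
        time.sleep(settle_seconds)  # crude wait; prefer your tool's explicit waits
        count = count_items()
        if count <= last_count:  # no new batch appeared: assume end of feed
            break
        last_count = count
    return last_count
```

A fixed sleep is the weakest part of this pattern; if your tool supports waiting for a selector or network idle, use that instead.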
Practical tips
- Check the network tab first: many infinite scroll pages are just calling a JSON endpoint with a cursor, offset, or page token. Scraping that is cheaper and less fragile than simulating scroll.
- Do not trust the first HTML response: on these pages, initial HTML often contains only the first batch.
- Stop on a real condition: no new items, no next cursor, repeated response payloads, or a clear end-of-feed marker.
- Deduplicate aggressively: infinite feeds often resend overlapping items between batches.
- Expect rate limits and anti-bot checks: repeated background requests are easy for sites to fingerprint.
- Watch for lazy loading vs true infinite scroll: some pages only lazy-load images, others fetch entirely new records.
- Use a browser only when you need it: if the site has no stable backend endpoint you can call directly, then use rendering and scripted scrolling.
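One stop condition from the list above, repeated response payloads, is easy to detect by fingerprinting each batch. A minimal sketch, assuming JSON-serializable payloads; the function names here are illustrative, not from any particular library:

```python
import hashlib
import json

def payload_fingerprint(payload):
    """Stable hash of a JSON-serializable response body."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_repeat(payload, seen_hashes):
    """True if this exact payload was seen before; records it otherwise."""
    fp = payload_fingerprint(payload)
    if fp in seen_hashes:
        return True
    seen_hashes.add(fp)
    return False
```

Serializing with sorted keys makes the hash independent of key order, so two identical payloads always match even if the server emits fields in a different order.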
A simple deduplication pattern, which doubles as a stop signal when a batch adds nothing new:

```python
seen_ids = set()
all_items = []

for item in items:
    item_id = item["id"]
    if item_id not in seen_ids:  # skip items resent across overlapping batches
        seen_ids.add(item_id)
        all_items.append(item)
```
With ScrapeRouter, this is the kind of page where a plain HTTP fetch often fails and a rendered request is the right tool. But it is still worth checking whether the page is backed by a cleaner API call first, because browser automation is slower and costs more.
Use cases
- Scraping social feeds: posts load as the user keeps scrolling
- Collecting product listings: marketplaces append more results without changing the page URL
- Monitoring job boards: new job cards load in batches from a background API
- Extracting reviews or comments: the page keeps requesting the next chunk as the viewport moves down
- Reverse engineering site APIs: infinite scroll often reveals the real JSON endpoint carrying the data