Examples
A basic infinite scroll flow looks like this:
- Load the page
- Trigger scrolling or watch the network calls
- Wait for new items to load
- Repeat until no new content appears or the API says there is no next cursor
In practice, the cleanest path is often to skip browser-style scrolling and call the underlying API directly if the site exposes one.
```python
import requests

# Single request against a cursor-based feed endpoint
api_url = "https://example.com/api/feed?cursor=abc123"
headers = {"User-Agent": "Mozilla/5.0"}

resp = requests.get(api_url, headers=headers, timeout=30)
resp.raise_for_status()  # fail fast on rate limits or auth errors

data = resp.json()
items = data.get("items", [])
next_cursor = data.get("next_cursor")  # typically None once the feed is exhausted

print(f"got {len(items)} items")
print(f"next cursor: {next_cursor}")
```
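The single request above extends naturally into a loop that follows the cursor until the API stops returning one. This is a sketch, assuming the same hypothetical endpoint and response shape (`items` plus `next_cursor`); the paging logic is split out so it can be tested against any fetch function.

```python
import requests

BASE_URL = "https://example.com/api/feed"  # hypothetical endpoint from the example above
HEADERS = {"User-Agent": "Mozilla/5.0"}

def fetch_page(cursor=None):
    """Fetch one batch from the feed; returns (items, next_cursor)."""
    params = {"cursor": cursor} if cursor else {}
    resp = requests.get(BASE_URL, headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    return data.get("items", []), data.get("next_cursor")

def paginate(fetch, max_pages=100):
    """Follow cursors until the API returns no items or no next cursor."""
    all_items, cursor = [], None
    for _ in range(max_pages):  # hard cap so a buggy endpoint cannot loop forever
        items, cursor = fetch(cursor)
        all_items.extend(items)
        if not items or not cursor:
            break
    return all_items
```

The `max_pages` cap is worth keeping even when the API behaves, since a stuck cursor otherwise turns into an infinite loop.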
If you do need a rendered page, use a tool that can execute JavaScript and wait for the next batch of DOM content to appear.
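The scroll-wait-repeat loop itself is tool-agnostic. Here is a sketch where `scroll_to_bottom` and `count_items` are hypothetical hooks you would wire up to whatever automation tool you use (for example, evaluating `window.scrollTo` and counting feed elements in the page):

```python
import time

def scroll_until_stable(scroll_to_bottom, count_items,
                        max_rounds=50, settle_seconds=2.0):
    """Scroll repeatedly; stop once the item count stops growing.

    scroll_to_bottom and count_items are caller-supplied hooks into
    the browser automation tool (hypothetical, not a specific API).
    """
    last_count = count_items()
    for _ in range(max_rounds):
        scroll_to_bottom()
        time.sleep(settle_seconds)  # crude wait; prefer your tool's explicit waits
        count = count_items()
        if count <= last_count:  # no new batch appeared: assume end of feed
            break
        last_count = count
    return last_count
```

A fixed sleep is the weakest part of this pattern; if your tool supports waiting for a selector or network idle, use that instead.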
Practical tips
- Check the network tab first: many infinite scroll pages are just calling a JSON endpoint with a cursor, offset, or page token. Scraping that is cheaper and less fragile than simulating scroll.
- Do not trust the first HTML response: on these pages, initial HTML often contains only the first batch.
- Stop on a real condition: no new items, no next cursor, repeated response payloads, or a clear end-of-feed marker.
- Deduplicate aggressively: infinite feeds often resend overlapping items between batches.
- Expect rate limits and anti-bot checks: repeated background requests are easy for sites to fingerprint.
- Watch for lazy loading vs true infinite scroll: some pages only lazy-load images, others fetch entirely new records.
- Use a browser only when you need it: if the site has no stable backend endpoint you can call directly, then use rendering and scripted scrolling.
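One stop condition from the list above, repeated response payloads, is easy to detect by fingerprinting each batch. A minimal sketch, assuming JSON-serializable payloads; the function names here are illustrative, not from any particular library:

```python
import hashlib
import json

def payload_fingerprint(payload):
    """Stable hash of a JSON-serializable response body."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_repeat(payload, seen_hashes):
    """True if this exact payload was seen before; records it otherwise."""
    fp = payload_fingerprint(payload)
    if fp in seen_hashes:
        return True
    seen_hashes.add(fp)
    return False
```

Serializing with sorted keys makes the hash independent of key order, so two identical payloads always match even if the server emits fields in a different order.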
A simple deduplication pattern, which doubles as a stop signal when a batch adds nothing new:

```python
seen_ids = set()
all_items = []

for item in items:
    item_id = item["id"]
    if item_id not in seen_ids:  # skip items resent across overlapping batches
        seen_ids.add(item_id)
        all_items.append(item)
```
With ScrapeRouter, this is the kind of page where a plain HTTP fetch often fails and a rendered request is the right tool. But it is still worth checking whether the page is backed by a cleaner API call first, because browser automation is slower and costs more.
Use cases
- Scraping social feeds: posts load as the user keeps scrolling
- Collecting product listings: marketplaces append more results without changing the page URL
- Monitoring job boards: new job cards load in batches from a background API
- Extracting reviews or comments: the page keeps requesting the next chunk as the viewport moves down
- Reverse engineering site APIs: infinite scroll often reveals the real JSON endpoint carrying the data