Glossary

Event loop

An event loop is the part of a runtime that keeps track of async work and decides what runs next without blocking the whole program. In scraping, it matters because network requests, browser automation, waits, retries, and timeouts all pile up fast, and if you misuse the loop you get slow crawlers, stuck tasks, or weird concurrency bugs.

Examples

A simple example in Python with asyncio:

```python
import asyncio

async def fetch(page_id):
    await asyncio.sleep(1)
    return {"page_id": page_id, "status": 200}

async def main():
    results = await asyncio.gather(
        fetch(1),
        fetch(2),
        fetch(3),
    )
    print(results)

asyncio.run(main())
```

What the event loop is doing here: scheduling each coroutine, pausing it while it waits, then resuming it when it is ready. That is why three waits of 1 second finish in about 1 second total instead of 3.

A common scraping mistake is blocking the loop with synchronous code:

```python
import asyncio
import time

async def bad_fetch():
    time.sleep(2)
    return "done"

async def main():
    await asyncio.gather(bad_fetch(), bad_fetch())

asyncio.run(main())
```

That time.sleep(2) blocks everything. In production, this is how an async scraper quietly turns back into a slow serial one.
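One common fix is to push unavoidable blocking calls onto a worker thread so the loop stays free. A minimal sketch using asyncio.to_thread (available since Python 3.9); blocking_fetch is a made-up stand-in for a synchronous library call:

```python
import asyncio
import time

def blocking_fetch():
    # Stand-in for a synchronous call you cannot rewrite,
    # e.g. a blocking HTTP client or a legacy parser.
    time.sleep(1)
    return "done"

async def main():
    # asyncio.to_thread runs each blocking call in a thread pool,
    # so the two calls overlap instead of running one after the other.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_fetch),
        asyncio.to_thread(blocking_fetch),
    )
    print(results)

asyncio.run(main())
```

Both calls finish in about 1 second total, because the event loop is never blocked while the threads sleep.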

Practical tips

- Use non-blocking libraries inside async code: aiohttp, Playwright's async API, and asyncio.sleep(), not requests or time.sleep()
- Set concurrency limits deliberately: too low and you waste time; too high and you trigger bans, memory spikes, or browser crashes
- Wrap tasks with timeouts and retries: event loops are good at handling many waiting operations, but they will happily keep waiting forever if you let them
- Watch for hidden blocking work: HTML parsing, large JSON transforms, file writes, image processing
- If CPU-heavy work is unavoidable, move it off the main loop: a thread pool, a process pool, or a separate worker
- In browser scraping, one overloaded loop can stall page actions, websocket traffic, and timeout handling all at once
- If you do not need to manage browser sessions, proxies, retries, and anti-bot handling yourself, offloading fetches to an API like ScrapeRouter can remove a lot of event-loop complexity from your own app

A basic concurrency limit looks like this:

```python
import asyncio

semaphore = asyncio.Semaphore(5)

async def fetch(url):
    async with semaphore:
        await asyncio.sleep(1)
        return url

async def main():
    urls = [f"https://example.com/page/{i}" for i in range(20)]
    results = await asyncio.gather(*(fetch(url) for url in urls))
    print(len(results))

asyncio.run(main())
```
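The timeout-and-retry tip can be sketched the same way. This is an illustrative pattern, not a particular library's API: fetch_once is a placeholder for a real non-blocking request, and MAX_RETRIES and the backoff values are arbitrary choices:

```python
import asyncio

MAX_RETRIES = 3  # illustrative limit, tune per target

async def fetch_once(url):
    # Placeholder for a real non-blocking request (e.g. aiohttp).
    await asyncio.sleep(0.1)
    return {"url": url, "status": 200}

async def fetch_with_retries(url):
    for attempt in range(MAX_RETRIES):
        try:
            # wait_for cancels the request if it exceeds the timeout,
            # so a stuck connection cannot hang the task forever.
            return await asyncio.wait_for(fetch_once(url), timeout=5)
        except asyncio.TimeoutError:
            # Exponential backoff before retrying: 1s, 2s, 4s, ...
            await asyncio.sleep(2 ** attempt)
    raise RuntimeError(f"gave up on {url} after {MAX_RETRIES} attempts")

print(asyncio.run(fetch_with_retries("https://example.com")))
```

Because both the timeout and the backoff sleeps go through the event loop, hundreds of these retry wrappers can run side by side without blocking each other.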

Use cases

- Running hundreds of HTTP requests concurrently without opening hundreds of threads
- Coordinating browser actions, waits, navigation events, and response listeners in Playwright or Puppeteer
- Managing retries, backoff, rate limits, and per-request timeouts in a crawler
- Consuming queued scrape jobs while keeping workers responsive under network latency
- Streaming paginated or incremental results as they arrive instead of waiting for the whole batch to finish

Related terms

- Asynchronous scraping
- Concurrency
- Timeout
- Retry logic
- Rate limiting
- Headless browser
- WebSocket