Glossary

WebDriver

WebDriver is the browser automation interface that tools like Selenium use to control a real browser. For scraping, it matters when a page only renders its data after JavaScript runs, or when you need to click, scroll, type, and wait for elements the way a real user session would.

Examples

A basic Selenium script uses WebDriver to start a browser, load a page, and wait for JavaScript-rendered content before reading it.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/dashboard")
    # Block until the JS-rendered rows exist, up to 10 seconds.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".result-row"))
    )
    rows = driver.find_elements(By.CSS_SELECTOR, ".result-row")
    data = [row.text for row in rows]
    print(data)
finally:
    driver.quit()

Sometimes you need WebDriver because the site does not return useful HTML until you trigger clicks or scroll events. After an interaction, wait explicitly for the new content rather than reading the DOM immediately, or you will count elements before they load.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait

options = Options()
options.add_argument("--headless=new")

driver = webdriver.Chrome(options=options)
try:
    driver.get("https://example.com/products")
    before = len(driver.find_elements(By.CSS_SELECTOR, ".product-card"))
    driver.find_element(By.CSS_SELECTOR, "button.load-more").click()
    # Wait until the click has actually added cards before counting them.
    WebDriverWait(driver, 10).until(
        lambda d: len(d.find_elements(By.CSS_SELECTOR, ".product-card")) > before
    )
    cards = driver.find_elements(By.CSS_SELECTOR, ".product-card")
    print(f"found {len(cards)} cards")
finally:
    driver.quit()
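
For infinite-scroll pages, the same idea applies: scroll, wait, and stop once the page height stops growing. A minimal sketch of that loop, written as a helper that works with any WebDriver session (the pause and round limit are arbitrary choices, and the feed URL in the usage comment is hypothetical):

```python
import time

def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until document height stops growing or max_rounds is hit.

    Works with any object exposing Selenium's execute_script(); the
    JavaScript here is plain DOM, not a Selenium-specific API.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded; we reached the real bottom
        last_height = new_height
    return last_height

# Usage with a real session (hypothetical page):
# driver.get("https://example.com/feed")
# scroll_to_bottom(driver)
# cards = driver.find_elements(By.CSS_SELECTOR, ".product-card")
```

The round limit matters in production: some feeds never stop growing, and an unbounded scroll loop is a hung browser.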

Practical tips

  • Use WebDriver when you actually need a browser: JavaScript rendering, login flows, clicks, infinite scroll, bot checks tied to browser behavior.
  • Do not use it for everything: if the page data is already in the HTML or available from a direct XHR or JSON endpoint, plain HTTP scraping is cheaper and easier to keep alive.
  • Expect more operational overhead: browser startup time, memory usage, driver version mismatch, timeouts, flaky selectors, and anti-bot issues.
  • Always use explicit waits instead of sleeping blindly: wait for a selector, network state, or a known page change.
  • Close sessions properly: leaked browser processes get expensive fast in production.
  • Keep selectors boring and resilient: prefer stable attributes over brittle full DOM paths.
  • If you are running browser-based scraping at scale, the real problem is rarely just WebDriver code: it's browser infrastructure, retries, proxy quality, session handling, and keeping the whole thing from falling over on a Tuesday.
  • If you want browser rendering without owning that stack yourself, a scraping API with JS rendering can remove a lot of the annoying parts.

The explicit-waits tip in practice, given an existing driver session: wait for a concrete readiness signal instead of sleeping for a fixed time.

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

WebDriverWait(driver, 15).until(
    EC.visibility_of_element_located((By.CSS_SELECTOR, "[data-ready='true']"))
)
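
The "do not use it for everything" tip is worth acting on first: check the browser's network tab for an XHR that already returns the data as JSON, and parse that directly instead of driving a browser. A sketch under stated assumptions; the endpoint URL and the {"items": [...]} payload shape are invented for illustration:

```python
import json
from urllib.request import urlopen  # stdlib; requests works just as well

def parse_products(payload):
    """Flatten an assumed {"items": [{"name": ..., "price": ...}]} payload."""
    data = json.loads(payload)
    return [
        {"name": item["name"], "price": item["price"]}
        for item in data.get("items", [])
    ]

# Against a live endpoint (hypothetical URL) this is the whole scraper:
# payload = urlopen("https://example.com/api/products?page=1").read()
# print(parse_products(payload))

# The same parser against a canned response:
sample = '{"items": [{"name": "Widget", "price": 9.99}]}'
print(parse_products(sample))  # → [{'name': 'Widget', 'price': 9.99}]
```

No browser startup, no driver versions, no flaky selectors: when an endpoint like this exists, it is almost always the cheaper path.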

Use cases

  • Scraping JavaScript-heavy pages where the content appears only after hydration.
  • Automating login and session flows: filling forms, handling redirects, waiting for authenticated pages.
  • Interacting with UI-driven data loading: click to expand, load more, tab switches, filters, date pickers.
  • Capturing data from dashboards or internal tools that do not expose a clean API.
  • Testing whether a page can be scraped reliably in a browser before deciding if you really want to run browsers in production.
  • Handling edge cases where a simple HTTP client is not enough, but also recognizing that a full browser is often the expensive fallback, not the first choice.
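
The login use case above usually reduces to three explicit steps: fill the form, submit, wait for an authenticated marker. A sketch of that flow as a reusable function, using Selenium's string locator strategies ("name", "css selector" — the values behind the By constants) and a hand-rolled poll so the sketch stands alone; the field names and the .account-nav marker are assumptions about the target site:

```python
import time

def log_in(driver, url, username, password, timeout=15.0):
    """Fill an assumed username/password form, submit, and poll for a
    post-login marker before returning."""
    driver.get(url)
    driver.find_element("name", "username").send_keys(username)  # assumed field name
    driver.find_element("name", "password").send_keys(password)  # assumed field name
    driver.find_element("css selector", "button[type='submit']").click()

    # Poll for an element that only exists once logged in (assumed selector).
    deadline = time.time() + timeout
    while time.time() < deadline:
        if driver.find_elements("css selector", ".account-nav"):
            return True
        time.sleep(0.2)
    raise TimeoutError("post-login marker never appeared")

# Usage with a real session (hypothetical credentials):
# log_in(driver, "https://example.com/login", "user", "secret")
```

In real code you would typically use WebDriverWait for the final step; the poll loop here just keeps the sketch dependency-free.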

Related terms

Headless Browser, Selenium, Browser Automation, JavaScript Rendering, Dynamic Content, DOM, Proxy Rotation, Anti-Bot