Examples
XPath is useful when CSS selectors stop being enough, especially when you need to match text, move up the DOM, or select by position.
```python
from lxml import html

page = html.fromstring("""
<html>
<body>
<div class="product">
  <h2>Red Shoes</h2>
  <span class="price">$49</span>
</div>
<div class="product">
  <h2>Blue Shoes</h2>
  <span class="price">$59</span>
</div>
</body>
</html>
""")

names = page.xpath('//div[@class="product"]/h2/text()')
prices = page.xpath('//div[@class="product"]/span[@class="price"]/text()')
print(names)   # ['Red Shoes', 'Blue Shoes']
print(prices)  # ['$49', '$59']

# Select the first matching element (the parentheses matter:
# //div[...][1] would match the first product within *each* parent)
page.xpath('(//div[@class="product"])[1]//h2/text()')

# Select by text content
page.xpath('//button[contains(text(), "Load more")]')

# Select an attribute
page.xpath('//a[@class="next"]/@href')
```
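Two of XPath's genuinely CSS-proof moves are walking *up* to a parent and sideways to a sibling. A small sketch, reusing product markup shaped like the example above:

```python
from lxml import html

# A single product, shaped like the markup above
page = html.fromstring(
    '<div class="product"><h2>Red Shoes</h2><span class="price">$49</span></div>'
)

# Walk up from a matched node with the parent axis
name = page.xpath('//span[@class="price"]/parent::div/h2/text()')
print(name)   # ['Red Shoes']

# Move sideways with following-sibling to pair a heading with its price
price = page.xpath('//h2[text()="Red Shoes"]/following-sibling::span[@class="price"]/text()')
print(price)  # ['$49']
```

Pairing a heading with its sibling price this way is handy when the fields you need are not nested inside each other.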
If you're scraping through a browser or a scraping API, the XPath itself is just one part. The annoying part is keeping it working when the page gets re-rendered, reordered, or A/B tested.
Practical tips
- Prefer stable attributes over deep full-path selectors: `//div[3]/section[2]/ul/li[4]` works until the site changes one wrapper.
- Use XPath when CSS becomes awkward: text matching, parent traversal, positional selection.
- Don’t overfit to today’s DOM: if a selector only works on one exact page shape, it will probably fail later.
- Test selectors against multiple pages, not just one sample.
- If the site is JavaScript-heavy, make sure the HTML is actually rendered before blaming the XPath.
- In production, log selector failures separately from request failures. A bad XPath and a blocked request are different problems.
- If you’re routing scraping jobs through ScrapeRouter, XPath still matters at the extraction layer: the router helps with fetch stability, but it can’t save a selector that was too fragile to begin with.
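One concrete way to make a class-based selector less fragile: match the class as a whole token rather than an exact string. A sketch, assuming the site later adds a second class to the product div:

```python
from lxml import html

# Hypothetical markup: the site added a "featured" class to the product div
page = html.fromstring('<div class="product featured"><h2>Red Shoes</h2></div>')

# Fragile: an exact @class match breaks as soon as another class appears
exact = page.xpath('//div[@class="product"]/h2/text()')
print(exact)   # []

# More resilient: match "product" as a whole class token
token = page.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " product ")]/h2/text()'
)
print(token)   # ['Red Shoes']
```

The `concat`/`normalize-space` wrapping avoids false positives like `class="product-carousel"`, which a bare `contains(@class, "product")` would also match.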
Use cases
- Extracting product names, prices, links, ratings, and stock labels from messy e-commerce pages.
- Pulling data from HTML where CSS selectors aren’t enough: matching text, selecting siblings, walking up to parent nodes.
- Parsing XML feeds, sitemaps, and structured exports where XPath is the native tool.
- Browser automation and testing: locating elements in Selenium or similar tools when the DOM is complicated.
- Recovering data from inconsistent markup where you need more control than simple class-based selection gives you.
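For the sitemap use case, the main thing that trips people up is the XML namespace: XPath 1.0 has no default-namespace shortcut, so you bind a prefix explicitly. A minimal sketch with a hypothetical sitemap fragment (the namespace URI is the standard sitemaps.org one):

```python
from lxml import etree

# A hypothetical sitemap fragment
sitemap = b"""<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/page-1</loc></url>
  <url><loc>https://example.com/page-2</loc></url>
</urlset>"""

root = etree.fromstring(sitemap)
# Bind the sitemap namespace to a prefix of our choosing for the query
urls = root.xpath(
    '//sm:url/sm:loc/text()',
    namespaces={"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"},
)
print(urls)   # ['https://example.com/page-1', 'https://example.com/page-2']
```

An unprefixed `//url/loc` would silently return nothing here, which is a common source of "my XPath is broken" reports that are really namespace problems.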