Examples
A few common cases:
- Product pages with tracking params:
/product/123?utm_source=xshould point to/product/123 - Same page on multiple paths:
/blog/postand/blog/post/should agree on one version - Filtered or sorted listings:
/shoes?sort=pricemay canonically point to/shoesif the content is not meant to rank separately
HTML example:
<link rel="canonical" href="https://example.com/product/123" />
If you're scraping a page and want to inspect the canonical URL, you're usually looking for the rel="canonical" link in the head:
from bs4 import BeautifulSoup
import requests
html = requests.get("https://example.com/product/123?utm_source=ad").text
soup = BeautifulSoup(html, "html.parser")
canonical = soup.find("link", rel="canonical")
print(canonical["href"] if canonical else None)
With ScrapeRouter:
curl -X POST https://www.scraperouter.com/api/v1/scrape/ \
-H "Authorization: Api-Key $api_key" \
-H "Content-Type: application/json" \
-d '{
"url": "https://example.com/product/123?utm_source=ad"
}'
Practical tips
- Do not treat the canonical tag as ground truth: sites misconfigure this all the time, point everything to the homepage, or forget to update it after migrations.
- Compare the fetched URL, final URL after redirects, and canonical URL: if all three disagree, that is usually a useful signal that the site has URL hygiene problems.
- For scraping and deduplication, canonical URLs are helpful but not enough: pages with no canonical tag, broken tags, or cross-domain canonicals still happen a lot in production.
- Normalize URLs before storing them: strip obvious tracking parameters, normalize trailing slashes where appropriate, and keep the raw source URL too.
- If you crawl large sites, use canonical URLs to reduce duplicate work, but do not blindly drop non-canonical pages until you verify the site is consistent.
- Watch for these failure modes: self-referencing canonicals missing, paginated pages all canonically pointing to page 1, faceted navigation collapsing to one URL, canonical tags generated differently on mobile vs desktop.
Use cases
- SEO debugging: figuring out why the page you expect is not the one Google seems to care about.
- Scraper deduplication: grouping multiple URL variants under one preferred page so you do not store the same thing five times.
- Crawl efficiency: cutting down wasted requests on parameter spam, sort orders, session IDs, and duplicate paths.
- Site migrations: checking whether canonicals and redirects agree after a domain, path, or CMS change.
- Marketplace and ecommerce monitoring: product URLs often show up with affiliate parameters, campaign tags, and internal tracking junk; canonical URLs help you collapse those variants into one record.