Web pages with nearly identical content are a headache for search engine crawlers, because search engines cannot simply crawl and index everything that comes their way. Pages with similar content are considered duplicate pages. Indexing duplicate content unnecessarily wastes the storage capacity of Google's servers, so there must be a way to solve this problem. The concept of the canonical URL was introduced to solve it.
What is a canonical URL?
The canonical URL is the preferred URL among a group of URLs with nearly identical or duplicate content. If we have multiple web pages with duplicate content, we can set only one URL as canonical. This tells the crawlers to index the specific URL that has been set as canonical and ignore the rest.
How to set a canonical URL?
<link rel="canonical" href="https://example.com/" >
Put the above line of code in the head section of the web page. The href="https://example.com/" attribute specifies the URL that you want to set as canonical. The rel=canonical attribute is a hint asking search engines to treat this URL as canonical. It is not a directive that search engine crawlers must follow.
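To show where the tag sits in a complete page, here is a minimal sketch. The scenario is an assumption made for illustration: the page is imagined to be served at https://example.com/?ref=newsletter, a hypothetical duplicate of the homepage, and the title and body text are placeholders.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page</title>
  <!-- Assumed scenario: this markup is served at the hypothetical
       duplicate URL https://example.com/?ref=newsletter.
       The tag below names the preferred URL for the same content. -->
  <link rel="canonical" href="https://example.com/">
</head>
<body>
  <p>Page content goes here.</p>
</body>
</html>
```

Using an absolute URL in href, as the snippet above does, avoids any ambiguity about which protocol and host are meant.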
Should every page specify a canonical URL?
No, canonical URLs are not mandatory. Not every page needs a link tag specifying a canonical URL. Only pages whose content is very similar to that of other pages should contain a canonical tag. If a URL is preferred over the others, set it as canonical; otherwise, set it as alternate. If your website does not have any pages with duplicate content, you need not add canonical tags.
When should I specify a canonical URL for a page?
1. If pages on your website can be accessed over both the http and https protocols, your website has two different URLs for the same content. This creates duplicate content issues, and you can specify a canonical URL to address them. Ideally, though, a canonical URL is not the right fix for this kind of problem. You should instead redirect one version to the other on the server, for example in the .htaccess file on an Apache server.
Example:
https://www.example.com
http://www.example.com
Both URLs point to the same content.
2. If pages on your website can be accessed via both the www and non-www domains, your website again has two different URLs for the same content. This also creates duplicate content issues, and you can specify a canonical URL to address them. Here too, a canonical URL is not the ideal fix. You should instead redirect one version to the other on the server, for example in the .htaccess file on an Apache server.
Example:
example.com
www.example.com
Both URLs point to the same content.
3. Publishing content in more than one language can also create duplicate content issues, because the pages live at different URLs while carrying essentially the same content. In such cases you should select one URL as canonical.
Read more about,
Faqs about canonical and alternate link tags
Hreflang to optimize website for various language audiences
Meta tags for SEO- A beginners guide
Can search engine crawlers ignore canonical tags?
Yes, absolutely. As mentioned earlier, canonical tags are hints to search engines, not directives that they must follow. There are cases where, even after a URL has been declared canonical, search engine crawlers ignore it and select another URL as canonical. So, what factors make search engines ignore the canonical tag in the head section?
Factors which make Googlebot ignore canonical tags
Although webmasters can declare a certain URL as canonical, Googlebot can ignore the tag if it finds that the declared canonical URL is not suitable for indexing.
- The declared canonical URL is slow compared to the alternate duplicate URLs.
- The canonical URL is visited less often than the alternate URLs.
- Any other error that forces Googlebot to ignore the canonical tag.
Points to remember while setting canonical tags
Below are some points to remember if you are planning to use the canonical tag. Improper configuration of the canonical tag can produce absurd results and harm SEO.
- Verify that the page pointed to by rel=canonical doesn't contain a noindex robots meta tag.
- Also verify that the page pointed to by rel=canonical is not blocked in the robots.txt file.
- Do not specify more than one rel=canonical for a page. When more than one is specified, all rel=canonical links will be ignored.
- Do link to the canonical URL rather than a duplicate URL when linking within your site. Linking consistently to the URL that you consider to be canonical helps Google understand your preference.
- Do specify a canonical page when using hreflang tags. Specify a canonical page in same language, or the best possible substitute language if a canonical doesn't exist for the same language.
- Don't use the URL removal tool for canonicalization: it removes all versions of a URL from search results.
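The point about combining a canonical with hreflang tags can be sketched as follows. This is only an illustration under assumed URLs: https://example.com/en/ and https://example.com/fr/ are hypothetical English and French versions of the same page, and the markup shown is what the English version might carry.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example page (English)</title>
  <!-- The canonical points to a page in the same language:
       here, the English URL itself (hypothetical). -->
  <link rel="canonical" href="https://example.com/en/">
  <!-- hreflang annotations list each language version,
       including this page. Both URLs are assumptions. -->
  <link rel="alternate" hreflang="en" href="https://example.com/en/">
  <link rel="alternate" hreflang="fr" href="https://example.com/fr/">
</head>
<body>
  <p>English content goes here.</p>
</body>
</html>
```

In this sketch the French page would carry the same two hreflang lines, with its own canonical pointing to the French URL. Only one rel=canonical appears per page, in line with the list above.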
Published on March 27, 2021