
rel=canonical tag

So, Google, Yahoo, Microsoft and, more recently, Ask have announced the new "canonical" link type or, more colloquially, the rel=canonical tag. Much has already been written about the tag and its purpose, which is to help prevent duplicate content issues. Probably the best summary is this video from Matt Cutts.

This tag is a welcome addition to the armoury in the fight against duplicate content issues. In addition to Matt's comments, I would make the following points:

Copyright Protection

Scrapers are forever copying content and publishing it on their own sites and splogs. Sometimes they are exceptionally lazy or stupid, even to the extent of copying AdSense code onto their own sites. If they copy your rel=canonical tag as well, it gives the search engines a strong "hint" that you are the original owner of the content.
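To make the point concrete, here is a minimal Python sketch of what a crawler finds in a lazily scraped copy. It assumes the scraper copied the page's `<head>` verbatim; the URLs `www.example.com` (the original site) and `splog.example.net` (the scraper), and the helper `canonical_target`, are hypothetical placeholders. The tag is only a hint, and this shows what the copied tag says, not how any search engine weighs it.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urlsplit


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "link" or self.href is not None:
            return
        attributes = dict(attrs)
        # rel can hold several space-separated tokens, so split before testing.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attributes.get("href"):
            self.href = attributes["href"]


def canonical_target(page_html: str) -> Optional[str]:
    """Return the canonical URL declared in page_html, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(page_html)
    finder.close()
    return finder.href


# Hypothetical scraped copy: it lives on the scraper's host, but the copied
# <head> still carries the original site's canonical tag.
scraped_url = "http://splog.example.net/stolen-post/"
scraped_html = """<html><head>
<title>My article</title>
<link rel="canonical" href="http://www.example.com/my-article/" />
</head><body><p>Copied text...</p></body></html>"""

target = canonical_target(scraped_html)
print(target)  # http://www.example.com/my-article/
print(urlsplit(target).hostname != urlsplit(scraped_url).hostname)  # True
```

The copied page declares a canonical URL on a different host from the one serving it, which is exactly the signal that points back to the original owner.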

 

Microsoft Platforms

Matt referred to the Microsoft platform in his video, but I would emphasise the point. Microsoft's implementation of RFC 2396 is flawed. The path component of a URL is supposed to be case-sensitive, but Microsoft treats it as case-insensitive. If there are n alphabetic characters in the path, a Microsoft implementation accepts 2^n variations of that path where there should be only one. For example, if n=1 and the path is "/a/", Microsoft would accept "/a/" and "/A/"; if n=2 and the path is "/ab/", Microsoft would accept "/ab/", "/aB/", "/Ab/" and "/AB/"; and so on. With 2^n variations there is vast potential for duplicate content, and it is a big issue for sites built on the Microsoft platform. The rel=canonical tag makes it very easy to specify the correct, case-sensitive path on a Microsoft platform.
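The 2^n count is easy to check. This Python sketch enumerates every upper/lower-case spelling of a path that a case-insensitive server would answer with the same page. `case_variants` is a hypothetical helper written for illustration, and it counts only characters that actually have distinct upper- and lower-case forms, so slashes, digits and hyphens add no variants.

```python
from itertools import product


def case_variants(path: str) -> list[str]:
    """Every spelling of `path` that a case-insensitive server treats as identical."""
    # A cased letter contributes two choices (lower, upper); anything else,
    # such as '/', digits or '-', contributes exactly one.
    choices = [
        (ch.lower(), ch.upper()) if ch.lower() != ch.upper() else (ch,)
        for ch in path
    ]
    return ["".join(combo) for combo in product(*choices)]


print(case_variants("/ab/"))  # ['/ab/', '/aB/', '/Ab/', '/AB/']

for path in ("/a/", "/ab/", "/products/widget-42/"):
    n = sum(1 for ch in path if ch.lower() != ch.upper())
    print(path, n, len(case_variants(path)), 2 ** n)
```

A path with 14 letters, such as the hypothetical `/products/widget-42/`, already has 16,384 spellings, which is why naming the one correct path in a single rel=canonical tag is so much simpler than chasing every variant.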

 

Static Web Content

Static web content is content that is stored in the same format in which it is delivered. Typically, static content is served under a static URL (one that does not contain a question mark). However, it is possible to link to static content with query parameters appended, even though those parameters have no effect on the content that is served. One example is an affiliate link, where a referrer parameter is appended to the URL and passed to a JavaScript function within the static content.

Thousands of links can be created to a single static URL, each with a different referrer query parameter attached. For sites built on static content, managing such links has been difficult in the past. Now it is relatively easy: each page of static content simply needs to contain a rel=canonical tag naming its own parameter-free URL.
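For a static page the tag is simply written into the page's `<head>` once. The Python sketch below shows how that value relates to the affiliate URLs: stripping the query string and fragment from any of them yields the same canonical URL. The `ref` parameter, the `www.example.com/widgets.html` page and the helper names are hypothetical, and the sketch assumes, as with static content, that the query parameters never change what the server returns.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_url(requested_url: str) -> str:
    """Drop the query string and fragment, which do not change static content."""
    parts = urlsplit(requested_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_link_tag(requested_url: str) -> str:
    """Build the <link> element to place in the page's <head>."""
    href = escape(canonical_url(requested_url), quote=True)
    return '<link rel="canonical" href="%s" />' % href


# Hypothetical affiliate links, all pointing at the same static page.
affiliate_links = [
    "http://www.example.com/widgets.html?ref=affiliate1",
    "http://www.example.com/widgets.html?ref=affiliate2",
    "http://www.example.com/widgets.html?ref=partner-99#reviews",
]
for url in affiliate_links:
    print(canonical_link_tag(url))
```

Every link produces the same `<link rel="canonical" href="http://www.example.com/widgets.html" />`. On dynamic pages, where some parameters do select content, blindly dropping the whole query string would be exactly the kind of misuse that calls for care.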

 

Conclusions: rel=canonical

For the reasons stated above, I would recommend using a rel=canonical tag in all static content. In fact, I would recommend its use in all content, static or dynamic, with appropriate care of course: it's a powerful tag, and using it wrongly could have dire consequences. In the next post I'll look at some of the limitations of the rel=canonical tag and consider some alternatives.