Last month, Matt Cutts announced a solution to the canonical URL issues facing many web sites today, which involves placing a link element within the head section of a web page. Before I get into this fix, let’s examine the problem further.
Canonical Linking Defined
Canonical, as strange a word as it is, is defined as “Conforming to orthodox or well-established rules or patterns, as of procedure”. On the web, it simply means using one standard form of a URL whenever you display or link to a page.
The Problem with Canonical & Duplicate Content
Using techwyse.com as an example, the following are considered the same:
• techwyse.com
• www.techwyse.com
• techwyse.com/index.php
• www.techwyse.com/index.php
Technically, all four of these URLs serve the same page, but because the URLs look different, search engines may treat them as four different pages, which can result in duplicate content.
If you navigate to any of the four links above, you will see in the address bar that they all redirect to www.techwyse.com. We use 301 redirects so that they resolve correctly.
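The normalization those redirects perform can be sketched in a few lines of Python. This is a minimal illustration, not how techwyse.com is actually configured: it assumes the www host is the preferred form, that `index.php` and `index.html` are the only default documents, and that plain `http` is the scheme. `canonical_url` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these are the site's default documents, all equivalent to "/".
INDEX_PAGES = {"/index.php", "/index.html"}


def canonical_url(url: str) -> str:
    """Map host and default-document variants of a URL onto one canonical form."""
    # Without a scheme, urlsplit would treat "techwyse.com" as a path.
    if "//" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the www version is preferred (naive: would also prefix other subdomains).
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    if path in INDEX_PAGES:
        path = "/"
    # Assumption: plain http, as in the examples in this post; fragments are dropped.
    return urlunsplit(("http", host, path, parts.query, ""))
```

Run against the four variants listed above, every one of them comes back as `http://www.techwyse.com/`, which is exactly the single URL a search engine should index.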
Solving Canonical & Duplicate Content
The best way to solve these types of issues is to address them from the very beginning of site design, with developers following SEO best practices so that you do not have to fix them later on. Through my many demos and training sessions with our in-house web developers and programmers, we have built processes and education to ensure the web properties we build conform accordingly.
Sample Fixes For Duplicate Content Scenarios
- Modify your CMS (Content Management System) to generate the appropriate URLs.
CMSs come in all types, so how well this works will vary greatly.
- Pick one canonical domain and make sure you link to it consistently within your web site.
For example, for home page links, choose either domain.com or www.domain.com and use it consistently throughout your web site.
- Use a 301 redirect to forward visitors to the preferred canonical domain.
If you have decided to use www.domain.com, your 301 redirect should forward all domain.com requests to it.
- Google’s Webmaster Tools has a setting to specify your preferred domain, either www or non-www.
- Ensure your sitemap file uses your preferred version, either www or non-www, as Google will use those URLs as a deciding factor.
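Server-side, the 301 rule from the list above comes down to one decision: if a request arrives on the non-preferred host, answer with `301 Moved Permanently` and a `Location` header pointing at the same path on the preferred host. Below is a minimal sketch as a plain WSGI application, assuming `www.domain.com` was chosen and `domain.com` is the only alternate host. On a real site this is usually handled in the web server configuration (for example Apache's mod_rewrite) rather than in application code.

```python
from typing import Optional

# Assumption: www.domain.com was picked as the canonical domain.
PREFERRED_HOST = "www.domain.com"
# Hosts that should be permanently redirected to PREFERRED_HOST.
ALTERNATE_HOSTS = {"domain.com"}


def redirect_location(host: str, path: str, query: str = "") -> Optional[str]:
    """Return the 301 target URL, or None if the request needs no redirect."""
    # Host headers may carry a port and arbitrary case; compare the bare name.
    bare_host = host.split(":")[0].lower()
    if bare_host not in ALTERNATE_HOSTS:
        return None
    location = "http://" + PREFERRED_HOST + (path or "/")
    if query:
        location += "?" + query  # keep the query string intact
    return location


def app(environ, start_response):
    """Minimal WSGI app: redirect alternate hosts, serve everything else."""
    target = redirect_location(
        environ.get("HTTP_HOST", ""),
        environ.get("PATH_INFO", "/"),
        environ.get("QUERY_STRING", ""),
    )
    if target is not None:
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Canonical page"]
```

To try it locally, the standard library's `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve it. A request sent with a `Host: domain.com` header should then come back as a 301 to the www version.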
Difficult Fixes For Duplicate Content Scenarios
- Cannot generate permanent 301 redirects.
In some situations, depending on your host’s server configuration, you may not be able to set up 301 redirects.
- Inbound links.
Obviously, you cannot control how people link to your web site, so this can become problematic. You can try contacting the site owners to ask them to modify the link.
- Uppercase / Lowercase page names.
Apache will generally, and correctly, return a 404 error if an uppercase URL is requested when the actual URL is lowercase, whereas Microsoft’s IIS will serve the page regardless of case, so the same page can be reached through several differently cased URLs.
- Session IDs.
Session IDs are largely used by shopping carts, by web sites that require a login, and even by sites that use some kind of HTTP visitor tracking. The problem here is that whenever a visitor, or a search bot for that matter, visits a page, the Session ID changes, which means the page URL changes as well.
- Tracking codes, web analytics & landing pages.
Very similar to Session IDs: tracking codes and campaign parameters appended to a URL create a new URL for the same page.
- Sorting in ascending and descending order.
Shopping cart web sites tend to offer this option, and even though the content of the page stays the same, the URL normally changes.
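Session IDs, tracking codes, sort parameters and case differences can all be collapsed by normalizing a URL before it is used in links, sitemaps or a canonical tag. This is a minimal Python sketch. The parameter names in `NOISE_PARAMS` and the `utm_` prefix are assumptions, since every cart and analytics package uses its own names. Lowercasing the path is only safe if all of your real page names are lowercase.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names: session IDs and sort options vary by platform.
NOISE_PARAMS = {"sid", "sessionid", "phpsessid", "order", "sort"}
# Campaign-tracking parameters such as Google Analytics' utm_source, utm_medium.
NOISE_PREFIXES = ("utm_",)


def strip_noise(url: str) -> str:
    """Drop session, tracking and sort parameters and lowercase host and path."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NOISE_PARAMS
        and not key.lower().startswith(NOISE_PREFIXES)
    ]
    # Assumption: every real page name on the site is lowercase.
    path = parts.path.lower()
    # The fragment is dropped; search engines ignore it anyway.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(kept), ""))
```

Two URLs that differ only in session ID, sort order or letter case now map to the same string, so they can be grouped as one page.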
The New Canonical Link Element Explained
As I mentioned above, Matt Cutts announced a new canonical link element that is, or will be, supported by Google, Bing, Yahoo & Ask. The basic premise is that you can specify which URL you would like shown and indexed in the search results pages.
The canonical link tag looks like this:
<link rel="canonical" href="http://domain.com/product.php?id=red-shoes" />
So, for example, if you have a crazy-looking URL such as http://domain.com/product.php?id=red-shoes&sid=569239547274&order=asc, you can specify your preferred version in the canonical link tag, which in this example would be http://domain.com/product.php?id=red-shoes
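On the page itself, the tag can be generated from the requested URL by keeping only the query parameters that actually change the content. This is a minimal Python sketch, assuming `id` is the only such parameter on this hypothetical `product.php`. `canonical_link_tag` is an illustrative name, not part of any library.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def canonical_link_tag(url: str, keep: tuple = ("id",)) -> str:
    """Build a rel="canonical" link element that keeps only content parameters."""
    parts = urlsplit(url)
    # Allowlist: any parameter not named in `keep` is treated as noise.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep
    ]
    href = urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))
    # Escape for an HTML attribute; "&" between parameters becomes "&amp;".
    return '<link rel="canonical" href="%s" />' % escape(href, quote=True)
```

For `http://domain.com/product.php?id=red-shoes&sid=569239547274&order=asc` this produces `<link rel="canonical" href="http://domain.com/product.php?id=red-shoes" />`. An allowlist is the mirror image of stripping known noise parameters. Any new tracking parameter is dropped automatically, but a new content parameter has to be added to `keep` by hand.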
Read More about the Canonical Link Element
Here is a fairly long video from Matt Cutts describing this new tag. There is a lot of geek talk in it, so if you feel you are not technical enough, you may want to forward it to your web developer, or of course you can contact our Internet Marketing team.
Hello C-note, thanks for the great article. This is all a little confusing, actually. Matt Cutts’ explanation cleared the fog a bit.
Lots of this SEO stuff can be confusing, and there are so many little things to know to do it right.
Please keep the good info coming. It is greatly appreciated.
It was interesting to hear Matt Cutts interviewed on this. One of the ways Google deals with duplicate content is through the sitemap.xml, the XML version that we upload. It’s definitely worth doing properly, simply because it also allows us to set the priority of pages using a sliding scale.