How to overcome the duplicate content issue with canonical tags

Once upon a time, the only way to get your site spidered by search engines was to submit it manually to the Yahoo directory (come on, in 1997, what other search engines were worth using?). Nowadays search engines are a lot more advanced and don’t require any manual work. This is great, but sometimes search engines can be a little too good at finding things, including things you don’t want found.

Take the following example: I run a website, www.openmeetings.co.uk, which I host on a subdomain of this website. I have changed the name servers of the subdomain so that http://subdomain.simonlangley.co.uk/index.php will be shown as http://www.openmeetings.co.uk/index.php, as long as you access the site through the latter URL.

Now, whilst this is a cheap way of hosting multiple websites, the one drawback is that there are two versions of every page, each with a different URL. Anyone who knows anything about SEO will tell you that this is bbbbaaaadddddd news. Search engines will penalise pages that appear to be duplicated, probably because duplicated content is not original, is of less use to users and/or is more likely to be spam.

Until now (for the above scenario, anyhow), it was almost impossible to prevent search engines from spidering the subdomain. One way to prevent spidering was to ensure that links to the subdomain were never used under any circumstances. This is fine for static HTML pages, but the second you start using PHP or other server-side scripting, chances are that subdomain links will creep in (even for a short time) and get spidered.
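One way to stop subdomain links creeping in is to build every internal link from a fixed base URL rather than from whatever host the request happened to arrive on. My site uses PHP; the sketch below is in Python purely for illustration, and it assumes www.openmeetings.co.uk is the address you want indexed. `CANONICAL_BASE`, `site_link` and the `events/list.php` path are hypothetical names.

```python
from urllib.parse import urljoin

# Hypothetical: the single host you want search engines to see.
CANONICAL_BASE = "http://www.openmeetings.co.uk/"


def site_link(path: str) -> str:
    """Build an absolute internal link from a fixed base URL.

    The incoming request's host is deliberately ignored, so a page served
    via subdomain.simonlangley.co.uk still links to www.openmeetings.co.uk.
    """
    # Strip leading slashes so the path is always resolved against the base.
    return urljoin(CANONICAL_BASE, path.lstrip("/"))


print(site_link("/index.php"))       # http://www.openmeetings.co.uk/index.php
print(site_link("events/list.php"))  # http://www.openmeetings.co.uk/events/list.php
```

Because nothing in the generated link depends on the request, it makes no difference which of the two URLs the page was served under.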

At the start of 2009 a new method of overcoming this problem was introduced, and it is (apparently) supported by all the major search engines. The method uses a ‘canonical tag’ which sits in the head of your document. The canonical tag tells search engines what the actual URL of the page should be, so if (as in the scenario above) a search engine stumbles across a copy of a page whose URL differs from the one in the canonical tag, that copy will be ignored in favour of the canonical URL.
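To make the idea concrete, here is a toy model of how a crawler might read the tag and decide that a fetched copy is a duplicate. It is a simplified sketch, not how any search engine actually works: real engines treat the tag as a hint, resolve relative URLs and weigh other signals, whereas this sketch simply compares strings. `CanonicalFinder` and `is_duplicate` are hypothetical names; the parsing uses Python’s standard-library `html.parser`.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser also routes self-closing tags such as <link ... /> here
        # (via its default handle_startendtag).
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def is_duplicate(fetched_url: str, html: str) -> bool:
    """Toy rule: a page is a duplicate if it names a different canonical URL."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical is not None and finder.canonical != fetched_url


page = '<head><link rel="canonical" href="http://www.openmeetings.co.uk/index.php"/></head>'
print(is_duplicate("http://subdomain.simonlangley.co.uk/index.php", page))  # True
print(is_duplicate("http://www.openmeetings.co.uk/index.php", page))        # False
```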

Of course there are lots of other reasons why duplicate content can arise (campaign/tracking codes etc), which I won’t go into here, but this method will work for these too.

To implement a canonical tag, place the following code in the head of the page:

<link rel="canonical" href="http://www.mysite.co.uk/index.htm"/>
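Hard-coding the href is fine for a single page, but on a script-driven site you would normally generate it. Here is a minimal sketch, again in Python for illustration, assuming www.openmeetings.co.uk over plain HTTP is the preferred address and that tracking codes are query parameters starting with `utm_` (both assumptions; `canonical_url` and `canonical_tag` are hypothetical helpers). It rewrites whichever host served the page to the canonical one and drops the tracking parameters, so both sources of duplicates mentioned above map to a single URL.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical settings: adjust to your own site and tracking scheme.
CANONICAL_SCHEME = "http"
CANONICAL_HOST = "www.openmeetings.co.uk"
TRACKING_PREFIXES = ("utm_",)


def canonical_url(request_url: str) -> str:
    """Force the canonical host and remove tracking parameters."""
    parts = urlsplit(request_url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKING_PREFIXES)
    ]
    # The fragment is dropped; it never reaches the server anyway.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", urlencode(query), "")
    )


def canonical_tag(request_url: str) -> str:
    """Return the <link rel="canonical"> element for the page's head."""
    href = escape(canonical_url(request_url), quote=True)
    return f'<link rel="canonical" href="{href}"/>'


print(canonical_tag("http://subdomain.simonlangley.co.uk/index.php?utm_source=newsletter"))
# <link rel="canonical" href="http://www.openmeetings.co.uk/index.php"/>
```

Escaping the href matters once a page keeps more than one genuine parameter, because the `&` between them must be written as `&amp;` inside an HTML attribute.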

So there you are!

I’ve heard a rumour that search engines use the canonical tag 95% of the time (who comes up with these stats is beyond me), so it’s not quite watertight, but certainly better than ye olde days.
