With Google turning the Panda crank a little more every month (or every week, it seems lately), duplicate content is becoming more and more of an issue, and “duplicate content penalties” are being handed out at an alarming rate.
I’m seeing sites penalized regularly across many different industries, and in many cases these duplicate content penalties could have been avoided. So I want to make sure you know how to identify duplicate content and ensure you’re not unknowingly breaking any of Google’s less-than-perfectly-clear rules…
So first of all, what is ‘duplicate content’?
The most obvious example is when you copy content verbatim from a site already indexed by Google and publish it on your own site, but it’s the less obvious cases below that are really tripping people up:
1. Problem: Multiple landing pages with similar content.
Whether it’s for split testing or slight copy variations for different traffic sources, many sites have multiple landing pages with very similar content. Although this is just smart marketing, Google will see it as duplicate content.
Solution: Use a noindex robots meta tag, such as <meta name="robots" content="noindex">, in the <head> tag of all of the duplicate landing pages. Google will then disregard them and you will not be penalized.
Now the downside of the solution above is that you can lose link juice if there are numerous backlinks pointing to these different landing pages. If this is the case, then you need to use the “rel=canonical” tag instead, which tells Google which version of the page should get the credit. But I should warn you: before you start throwing canonical tags onto your pages, you need to make sure you REALLY understand what you are doing or you can screw things up pretty fast.
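Before touching either tag, it helps to know which of your landing pages already carry them. Here is a minimal Python sketch for that kind of audit, assuming the "disregard this page" fix is a robots meta tag containing noindex and the link-juice fix is a <link rel="canonical"> tag. The class name, function name, and sample pages are made up for illustration; this only reads tags, it does not tell you what Google will do with them.

```python
from html.parser import HTMLParser


class HeadTagAudit(HTMLParser):
    """Collects the robots meta directive and canonical URL from a page."""

    def __init__(self):
        super().__init__()
        self.robots = None     # content of <meta name="robots">, if present
        self.canonical = None  # href of <link rel="canonical">, if present

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <link rel>), so default to "".
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.robots = attrs.get("content", "").lower()
        elif tag == "link" and "canonical" in attrs.get("rel", "").lower().split():
            self.canonical = attrs.get("href")


def audit_page(html):
    """Return (is_noindexed, canonical_url) for one page's HTML source."""
    parser = HeadTagAudit()
    parser.feed(html)
    is_noindexed = parser.robots is not None and "noindex" in parser.robots
    return is_noindexed, parser.canonical


# Hypothetical landing-page variants, one per approach.
variant_b = '<html><head><meta name="robots" content="noindex"></head></html>'
variant_c = '<html><head><link rel="canonical" href="http://www.yourdomain.com/offer"></head></html>'

print(audit_page(variant_b))  # (True, None)
print(audit_page(variant_c))  # (False, 'http://www.yourdomain.com/offer')
```

A page that comes back as (False, None) has neither tag, so it is the one to look at first.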
Here is a video on the “rel=canonical” tag that Google made…
2. Problem: Different URL parameters & Session IDs.
Different analytics tools and click tracking programs will add different parameters to the end of your URL, for example ?ad=ppc.
Here (?ad=ppc) is a parameter, with ‘ppc’ being the tracking ID that identifies the traffic source. Although the tracking ID will change, everyone still lands on the same page.
Unfortunately, Google may see each individual tracking ID as a unique page, causing it to penalize your website for duplicate content even though there really is only one page.
IMPORTANT: This is a common problem I’m seeing across sites that have affiliate programs. Certain programs create a unique parameter for each affiliate, which Google will then see as a unique page.
Solution: Use the Google Webmaster Tools Parameter Handling feature to tell Google which parameters to ignore.
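To work out which parameters belong on that ignore list, you can strip the suspects from your logged URLs and see whether the variations collapse into one page. A rough Python sketch follows. The parameter names in TRACKING_PARAMS (other than ad, which comes from the example above) and the sample URLs are assumptions for illustration, so swap in whatever your own tracking tools append.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list: parameters that only track the visitor and never change the page.
TRACKING_PARAMS = {"ad", "sessionid", "affid"}


def strip_tracking(url, ignored=TRACKING_PARAMS):
    """Return the URL with tracking parameters removed from its query string."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in ignored]
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(kept), parts.fragment))


# Three URLs that could each be indexed as a separate page.
landing_urls = [
    "http://www.yourdomain.com/offer?ad=ppc",
    "http://www.yourdomain.com/offer?ad=email",
    "http://www.yourdomain.com/offer?affid=123",
]

# After stripping, only one distinct page remains.
print({strip_tracking(url) for url in landing_urls})
```

If the set still contains more than one URL after stripping, the leftover parameters either change the page content or are tracking parameters you have not listed yet.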
3. Problem: Other websites copying YOUR content.
The theory is that if you publish unique content and Google indexes it on your website first you will not get penalized if someone else copies it… but this is only the theory!
If your website is new and a more established website copies your content, you may be the one that ends up getting the penalty because Google ‘trusts’ the more established site.
Lately I’ve also come across sites that have had their content copied so many times and published on questionable sites that Google penalized all of the sites. This can be a real issue if you have a large affiliate program with affiliates grabbing content from your site on a regular basis.
Solution: The only way to really combat this is to monitor the web for other sites that are stealing your content. I personally use Copyscape.com to do this. Or, if you want to do a quick spot check, just copy any unique string of text from your web page, paste it into the search box on Google, and see what comes back.
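If you want to make the quick spot check repeatable, here is a small Python sketch that picks the longest sentences from a page and turns each one into an exact-phrase Google search link you can open in a browser. The function name and the cutoffs (two sentences, at least eight words) are my own assumptions, not anything Google or Copyscape prescribes.

```python
import re
from urllib.parse import quote_plus


def spot_check_queries(page_text, count=2, min_words=8):
    """Build exact-match search URLs from the longest sentences on a page."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", page_text)
                 if s.strip()]
    # Longer sentences are less likely to match other sites by coincidence.
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    candidates.sort(key=lambda s: len(s.split()), reverse=True)
    # Wrapping the sentence in double quotes asks for an exact phrase match.
    return ["https://www.google.com/search?q=" + quote_plus('"' + s + '"')
            for s in candidates[:count]]


sample_text = (
    "Welcome to our store. "
    "Every widget we sell is hand assembled and tested twice before it ships to you. "
    "Thanks for visiting."
)
for url in spot_check_queries(sample_text):
    print(url)
```

Any result that is not your own site is worth a closer look.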
And if you find someone actually stealing your content, your options are really limited. But the first thing I would do is contact them directly and ask them to remove the content and then escalate to a cease and desist order if need be.
On the other hand, if I have an established website and the site stealing my content does not have much authority, I would not worry too much about it.
4. Problem: WWW or No WWW… Which is it?
If you type in ‘www.yourdomain.com’ or just ‘yourdomain.com’, it all probably goes to the same page, right? Well, unfortunately Google sees these as two separate pages, and that will cause a myriad of issues with link juice and duplicate content.
To fix this go to Google Webmaster Tools and set your “Preferred Domain” to be one or the other. This will eliminate the problem entirely!
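It is also worth checking that the links on your own pages all use whichever version you picked. The Python sketch below rewrites any URL to the preferred form so that mismatches stand out. The function name and the use_www flag are hypothetical, and this is only an audit helper on your side; it does not change anything in Webmaster Tools.

```python
from urllib.parse import urlsplit, urlunsplit


def to_preferred_host(url, use_www=True):
    """Rewrite a URL so its host uses the preferred www / non-www form."""
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is already lowercased
    bare = host[4:] if host.startswith("www.") else host
    preferred = "www." + bare if use_www else bare
    # Keep an explicit port if the original URL had one.
    netloc = preferred if parts.port is None else f"{preferred}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


internal_links = [
    "http://yourdomain.com/blog",
    "http://www.yourdomain.com/blog",
]

# Flag every link that does not already match the preferred form.
for link in internal_links:
    if link != to_preferred_host(link):
        print("inconsistent:", link)
```

With use_www=True, this flags the first link and leaves the second alone.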
And a few final tips…
When you are linking to pages within your own site, make sure you use consistent URLs. In other words, if the first link to my blog looks like:
Make sure the next link is not:
Google can confuse things enough without our help!
Also, if you have “Printer Friendly” versions of your pages, make sure they include a noindex robots meta tag. Leaving it off is a simple but common mistake.
Last but not least, you are probably wondering how much duplicate content is okay?
Well, I wish I had the exact answer to that question, but other than the engineers at Google, no one really knows. My general rule of thumb is to shoot for at least 60% unique content on each page. This may be a little more conservative than some of the other guesstimates out there, but I would suggest erring on the side of caution!
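If you want to put a rough number on that rule of thumb, one option is to compare overlapping runs of words (often called shingles) between your page and the pages it might duplicate. The Python sketch below does that. To be clear, this is my own way of estimating it: nobody outside Google knows how they measure uniqueness, and the five-word shingle size and function names are assumptions.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def unique_share(page_text, other_texts, size=5):
    """Fraction of the page's shingles that appear in none of the other texts."""
    page = shingles(page_text, size)
    if not page:
        # Page is shorter than one shingle, so there is nothing to measure.
        return 0.0
    seen_elsewhere = set()
    for other in other_texts:
        seen_elsewhere |= shingles(other, size)
    return len(page - seen_elsewhere) / len(page)


# Two hypothetical landing pages that share their opening sentence.
page_a = "Get our free guide to better sleep today and wake up rested every single morning"
page_b = "Get our free guide to better sleep today and start saving money on your energy bill"

share = unique_share(page_a, [page_b])
print(f"{share:.0%} of page A is unique")
if share < 0.60:
    print("Below the 60% rule of thumb")
```

Smaller shingle sizes count more incidental overlap as duplication, so treat the result as a relative signal when comparing your own pages rather than as a score Google would agree with.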
Hope this helps and if you have any thoughts, comments, or different opinions I would love to hear them in the comments section below…
To your continued ‘Google Domination’…