Paying for link building is bad for your business. If you have paid for links, then you've probably already attempted some type of Penguin recovery process.
I was recently reviewing the "Links to Your Site" report inside Google Webmaster Tools (WMT). You can find this report in your own Webmaster Tools account by clicking on Search Traffic, then Links to Your Site, as shown in this image:
This is one of the reports you should be looking at when trying to clean up the ill effects of those previously paid link building programs. The Webmaster Tools report is far from comprehensive, though, so you will also need to purchase a link report from the company MajesticSEO.
The Webmaster Tools report will include both "dofollow links" and "nofollow links." According to Google's webmaster guidelines, they will penalize you for dofollow links that look unnatural and for dofollow links coming from link farms.
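For reference, a nofollow link is just an ordinary HTML link with a rel="nofollow" attribute added; a "dofollow" link is simply a normal link without that attribute:

```html
<!-- A "dofollow" link: no rel attribute, so it passes ranking credit -->
<a href="http://example.com/">Example Site</a>

<!-- A nofollow link: the rel attribute tells Google not to count it -->
<a href="http://example.com/" rel="nofollow">Example Site</a>
```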
If Google has penalized your website, you will have to figure out exactly which links they are penalizing you for. Unfortunately, the WMT report does not tell you which links are bad; it simply shows you the links Google knows about.
Google finds links by crawling the internet. The software that does this is sometimes called a "spider," but its actual name is "Googlebot." This so-called spider attempts to reread the internet every few days by 'crawling' it and capturing the best data for public consumption. The crawl spans ALL of the internet, meaning billions of individual web pages of every type.
Through the years, from what I've learned by tracking reports, Googlebot doesn't actually read every page of every website. Sometimes it reads a page once and waits many months to return. It takes a lot of computing power to gather and process all of that information, so I certainly do not blame them for taking their time spidering.
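To make the idea concrete, here is a toy sketch of how any crawler works, and obviously nothing like Googlebot's real implementation: fetch a page, harvest its links, and queue those links to be fetched in turn. It assumes the Python requests and beautifulsoup4 packages, and the seed URL is a placeholder:

```python
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

def crawl(seed, max_pages=50):
    """Toy breadth-first crawl: fetch a page, collect its links, repeat."""
    queue, seen = deque([seed]), {seed}
    while queue and len(seen) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip pages that can't be fetched
        for tag in BeautifulSoup(html, "html.parser").find_all("a", href=True):
            link = urljoin(url, tag["href"])  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen

# crawl("http://example.com/")  # placeholder seed URL
```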
They also use a technical trick to ask your web server whether a page has changed, rather than reading the entire page and comparing it to what they've indexed in the past. In technical terms, Googlebot sends a conditional request (typically with an If-Modified-Since header), and if the page hasn't changed, the server answers with an HTTP 304 "Not Modified" status instead of sending the page itself.
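Here is a minimal sketch of that kind of conditional request, using Python's requests library; the URL and date are placeholders:

```python
import requests

# Ask the server whether the page has changed since our last visit.
# If it hasn't, the server replies "304 Not Modified" with no page
# body at all, which saves time and bandwidth for both sides.
response = requests.get(
    "http://example.com/some-page.html",  # placeholder URL
    headers={"If-Modified-Since": "Mon, 02 Jun 2014 10:00:00 GMT"},
)

if response.status_code == 304:
    print("Unchanged since last crawl; no need to re-read it.")
else:
    print("Changed (or the server ignored the header); re-read the page.")
```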
During your Penguin cleanup process, you will be asking website owners to remove those bad links, but you will also have to disavow them. Although you will meet some website owners who are willing to help you by removing those links, there are many who will not.
As explained here, you will have to disavow the websites that do not respond to, or simply deny, your link removal requests.
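A disavow file is a plain text file that you upload through Google's Disavow Links tool. Lines beginning with # are comments, a domain: prefix disavows every link from that site, and a bare URL disavows a single page. The domains below are made up:

```text
# Owner never responded to two removal requests
domain:spammy-directory-example.com

# Owner refused to remove the link on this one page
http://link-farm-example.net/resources.html
```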
If Google took Manual Action against your site, you will have to satisfy the manual auditors who are monitoring your removal work. Those people will look at the disavow list you created and at the number of links that kind website owners actually removed for you.
The auditors at Google will remove the manual penalty if they are satisfied by what they see.
But it doesn't end there.
From what I can tell, the automated Penguin penalty is much harder to clean up than the manual action. That's because the automated Penguin process relies on the information-gathering efforts of Googlebot. But while you are rushing around to save your website's ranking, Googlebot is leisurely gathering and regathering information from the internet using its time-saving and resource-saving methods.
It's quite possible that Googlebot will never revisit those pages where a few kind website owners removed the links, and therefore Penguin will never notice.
Referring back to the WMT link report I was recently reviewing, I found one website that was deleted, redesigned, rebuilt, and populated with brand new content more than 12 months ago. The 24 links that Google marked on the report were removed more than 12 months ago, yet they still appear there.
From a technical point of view, the report shows 24 pages with 1 link each. Googlebot is completely confused by this website and believes those 24 pages still exist.
Another domain on the same report showed 38 links, but that domain name expired more than 5 months ago and was replaced by a generic website placeholder page.
From a technical point of view, it looks like Googlebot recognizes this as an expired domain and therefore isn't wasting its time reading the site, which leads it to believe that those 38 pages still exist.
There is definitely a schism between Googlebot and Penguin, and you might be the loser.
For those of you who have spammy links pointing at your site from pages that were deleted, changed, redesigned, or simply went defunct, the technical way in which a web server interacts with Googlebot could be hampering your Penguin recovery efforts. If Googlebot can't crawl the page in question, or if it doesn't bother trying, then Penguin seems to still believe that the link exists.
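One rough way to see which of those old linking pages are even reachable anymore is to check their HTTP status yourself. This is a minimal sketch, assuming you've exported the linking URLs from the WMT report; the URLs below are placeholders:

```python
import requests

# Placeholder URLs; in practice, paste in the linking pages exported
# from the WMT "Links to Your Site" report.
linking_pages = [
    "http://deleted-site-example.com/old-page.html",
    "http://expired-domain-example.net/blogroll.html",
]

for url in linking_pages:
    try:
        r = requests.head(url, allow_redirects=True, timeout=10)
        print(url, "->", r.status_code)
    except requests.RequestException as err:
        print(url, "-> unreachable:", err)
```

A 404 or an unreachable domain tells you the page is gone; a 200 only tells you the page loads, not that your link was actually removed from it.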
The bottom line is this: if you are trying to remove a manual penalty, it's important for a human auditor to see that you have removed links, because they will check your work. But if you are fighting the automated Penguin, it might be in your best interest to simply disavow every link you suspect is bad. Even if you manually work through the removal process, it could still take well over a year for Penguin to realize the links are gone.