Last month, Google rolled out a change in its ranking algorithm that has come to be known as the “Farmer Update” — a reference to the so-called “content farms” that Google has put in the crosshairs. Not surprisingly, the algorithm change was more than a little controversial. Many observers pointed out collateral damage supposedly done to more legitimate sites, while other critics charged Google with allegedly singling out individual domains.
While I think the latter is a conspiracy theory, I do have access to the analytics of a number of large sites (200K to 1.5M pageviews per month). The sites that were hit hardest did seem to lose rankings across the board, while the sites with minimal damage seemed to lose rankings on a page-by-page basis. How can you protect your site from being flagged as a content farm, regardless of whether or not it is one? I don’t pretend to know Google’s algorithm, but I can offer some educated guesses.
What Is a Content Farm?
Not everyone agrees on what a content farm is exactly, but the common denominator is a site filled with low quality articles ostensibly designed to capture search traffic on a particular set of keywords, then monetize them with advertising or lead generation. Some of these sites are “scraper sites” that have no original content, but simply copy and paste articles published elsewhere (often without attribution) and plaster their pages with Google AdSense ads. Personally, I would call these “splogs” rather than content farms.
What I would call a content farm is a site that produces original content—hundreds or thousands of articles per month—with each article expressly commissioned to target a specific keyword. “Original” is relative, since many of these articles are marginal rewordings of more authoritative articles on the web. They’re not low quality by design, but the mandate to produce enormous amounts of content each day pretty much guarantees that most articles will be the equivalent of widgets on a conveyor belt. The poster child for content farms is Demand Media’s eHow. Ironically, eHow escaped making the Top 25 Farmer Update Losers list compiled by Sistrix.
In a content farm, full-time keyword researchers create extensive lists of commercially valuable keywords couched in titles (“acne treatment” becomes “Acne Treatment at Home”). Each line item is a content order (article) that gets outsourced to a freelance writer for a few dollars per article, or assigned to a full-time writer tasked with writing dozens of such articles per month. A fairly thorough overview of the workflow of a large-scale content farm can be seen in the “AOL Way” document leaked a few weeks ago.
But I Don’t Have a Content Farm!
Unfortunately, I don’t know anyone whose traffic and rankings didn’t go down after Google flipped the switch on the new algorithm. Even people who don’t (or claim to not) use SEO to promote their sites have taken massive hits. I’ve seen the update impact sites with fewer than 10 pages, so for all intents and purposes, a content farm is whatever Google treats as one.
Unless you’re paying for traffic or getting it from an email list (yours or a joint partner’s), you need to protect your search traffic. How can you do this when Google operates in such mysterious ways? Here are a few things you can do.
1. Consolidate content.
If your site has half a dozen articles on running shoes, find the one that’s the top performer, and start migrating content from the other pages to the “tentpole” article; then redirect the old pages. You could look at your analytics for highest pageviews, but since pageviews are often correlated with search rankings, it’s better to look for the highest ranked article in Google by running a site search in the form of “site:mysite.com [keyword]”.
So for the running shoes example, if your site was ShoeWorld.com, you would do a search on “site:shoeworld.com ‘running shoes’”, and pick the page that ranks at the top, as long as it’s a logical tentpole article. In other words, you want a title that’s generic enough to rank for different searches. “5 Features of Quality Running Shoes” is better than “How to Clean Your Running Shoes”.
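If you have more than a handful of keywords to check, it can help to generate the site searches rather than type them by hand. The following is only a minimal sketch: the function names are my own, the keyword is wrapped in double quotes as a phrase match, and the result is a query string (plus a URL you can paste into a browser), not an automated lookup of rankings.

```python
from urllib.parse import quote_plus


def site_query(domain, keyword):
    """Build a 'site:' query restricted to one domain."""
    # Wrap multi-word keywords in quotes so they are searched as a phrase.
    phrase = f'"{keyword}"' if " " in keyword else keyword
    return f"site:{domain} {phrase}"


def search_url(query):
    """URL-encode the query so it can be pasted into a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = site_query("shoeworld.com", "running shoes")
print(query)
print(search_url(query))
```

Run it once per target keyword and note which of your pages comes up first for each one.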
Regarding the suggestion to redirect the old pages: you’ll need to set up 301 redirects to what will become the tentpole page before you start moving content from the old pages; otherwise those pages risk being flagged as duplicate content—i.e. you have the same material on two pages.
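How you set up the redirects depends on your server. As one illustration, here is a small sketch that prints one redirect rule per old page. It assumes an Apache server with mod_alias (which provides the Redirect directive); the page paths are made up for the ShoeWorld example, and the function name is hypothetical.

```python
from urllib.parse import urlsplit


def redirect_rules(tentpole_url, old_paths):
    """Return one 'Redirect 301' line per old page, pointing at the tentpole."""
    tentpole_path = urlsplit(tentpole_url).path
    rules = []
    for path in old_paths:
        if path == tentpole_path:
            continue  # never redirect the tentpole page to itself
        rules.append(f"Redirect 301 {path} {tentpole_url}")
    return rules


# Hypothetical paths for illustration only.
tentpole = "https://shoeworld.com/5-features-of-quality-running-shoes"
old_pages = [
    "/how-to-clean-your-running-shoes",
    "/running-shoe-sizing",
    "/5-features-of-quality-running-shoes",
]

for rule in redirect_rules(tentpole, old_pages):
    print(rule)
```

The printed lines would go in the site’s Apache configuration or .htaccess file. Other servers and most blogging platforms have their own way of declaring a 301.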
2. Build link nets.
A less labor-intensive alternative to consolidating pages is to build link nets. You still identify the top ranking or top performing page for each keyword you’re targeting; but instead of moving content from other related pages, you edit each of those pages to include a link to the tentpole (top ranking) page. If your rankings took a dive, but you’re confident in the quality of your content, then link nets are a better approach than content consolidation, since they don’t lower your site’s index count (the number of pages the site has in Google’s index).
So if you had a well ranking page on “5 Features of Quality Running Shoes”, you would link to it from your “How to Clean Your Running Shoes” page, as well as from every other page that ranks for “running shoes”. If you have more than two or three pages that would work as support articles, consider varying the anchor text with “cousin” keywords. For instance, you could link to the tentpole from two pages using the primary keyword (“running shoes”), another page with one closely related keyword (“best running shoes”), and another page with another closely related keyword (“running shoes online”). All you need to do is put your main keyword in the Google Keyword Tool, which is sorted by relevance by default, and pick some of the keywords near the top of the list. This kind of internal linking tells Google what the main page is about more reliably than hoping other bloggers will link to the page with the most appropriate anchor text (as opposed to something like “this article”).
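To keep track of which support page gets which anchor text, you could plan the link net in a few lines of code before editing any pages. This is just a sketch under my own assumptions: the page paths are hypothetical, the anchor list mirrors the example above (the primary keyword twice, then two cousin keywords), and the function simply cycles through the anchors if there are more pages than anchors.

```python
from html import escape
from itertools import cycle


def link_net(tentpole_url, support_pages, anchors):
    """Pair each support page with the HTML link it should contain."""
    plan = {}
    # cycle() repeats the anchor list if there are more pages than anchors.
    for page, anchor in zip(support_pages, cycle(anchors)):
        href = escape(tentpole_url, quote=True)
        plan[page] = f'<a href="{href}">{escape(anchor)}</a>'
    return plan


# Hypothetical pages for illustration only.
plan = link_net(
    "/5-features-of-quality-running-shoes",
    [
        "/how-to-clean-your-running-shoes",
        "/running-shoe-sizing",
        "/trail-vs-road-running-shoes",
        "/where-to-buy-running-shoes",
    ],
    ["running shoes", "running shoes", "best running shoes", "running shoes online"],
)

for page, link in plan.items():
    print(page, "->", link)
```

The output is a checklist: open each support page and work the listed link into a sentence where it reads naturally.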
3. Check for duplicate content.
First, let’s understand what duplicate content is and isn’t. Google only wants to index unique content so that no two listings in search results are completely identical. While they don’t always succeed at avoiding redundant listings, they try to avoid indexing results with the same text to the fullest extent possible. What people often call a “duplicate content penalty” is less about punishing plagiarism than maintaining unique listings. I once did SEO work on a very large site where one of their subdomains wasn’t being indexed because its XML sitemap had inadvertently been copied from the one on the top-level domain. It’s very easy to run into duplicate content problems, even when you’re fastidious about avoiding plagiarism.
Do a Copyscape check on all of your published articles. If Copyscape flags something as a dupe, compare it with the alleged source and see if you agree. Copyscape tends to be overly aggressive in identifying things as duplicate, so you can reasonably assume that anything that passes Copyscape will be kosher with Google; but you can’t assume that what Copyscape identifies as duplicate content is actually duplicate unless you do a human review. You might be able to get the offending article to pass Copyscape after one or two minor edits, but obviously, if the article really is plagiarized, it should be scrapped. I’d recommend using a 301 redirect rather than actually deleting the article, since Google frowns on sites that delete their pages.
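When you do the human review, a rough overlap score can help you decide where to look first. The sketch below compares two texts by the share of word sequences (“shingles”) they have in common. To be clear, this is not how Copyscape or Google measures duplication, which I don’t know; it is a generic text comparison, and the function names and the default shingle size are my own choices.

```python
import re


def shingles(text, size=5):
    """Return the set of overlapping word sequences of the given length."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(a, b, size=5):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0  # one of the texts is too short to compare
    return len(sa & sb) / len(sa | sb)


mine = "Good running shoes cushion the heel and support the arch of the foot."
theirs = "Good running shoes cushion the heel and support the arch of your foot."
print(round(overlap(mine, theirs, size=3), 2))
```

A score near 1.0 means the two texts are nearly word for word the same; a low score on something Copyscape flagged suggests the match is a short quote or a common phrase rather than a copied article.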
Do You Have a Content Farm?
Let’s face it: some of us are actually running content farms, and are simply in denial about it. Be honest with yourself. Don’t assume that a drop in rankings is collateral damage. If you’re hosting a bunch of user-generated content (forums, article aggregators) or publishing RSS feeds from external sites, you may want to rethink and future-proof your approach. I recently shut down a site whose only content came from user submissions, since it wasn’t worth the management overhead to make sure that everything was above board. It’s better in the long run to only work with content that’s 100% under your control.