Posted Sunday, September 19, 2004
Please note: at any given time Google may turn this new filter off and revert to its old algorithm. I do not see this current algorithm as something that will stand the test of time. I believe it is a makeshift algorithm until Google can introduce some of the web search personalization and clustering technology it obtained when it purchased Kaltix. I am not a Google engineer, and many of my statements in this document may eventually prove false. This is Google as I know it to be in my mind.
Cat and Mouse
Recently Google had another "dance". With this most recent change they did more than the usual PageRank update. For years search engine optimization firms and spammers have chased after the search engines: each time a search engine introduced a new feature, thousands of people would look for ways to exploit it. Perhaps Google has the solution.
Prior to the web, a good measure of the quality of a research paper was how many other research papers referred to it. Google brought this idea to the web. Google PageRank organizes the web based on an empirical link analysis of the entire web. The original document on PageRank, titled The PageRank Citation Ranking: Bringing Order to the Web, has been cited in hundreds of other documents.
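As a rough illustration of the link-analysis idea (my own sketch, not Google's production implementation, which is far more elaborate), PageRank can be computed by repeatedly letting each page share its score among the pages it links to:

```python
# Minimal PageRank sketch (illustrative only; not Google's actual code).
# `links` maps each page to the list of pages it links out to.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with an even split
    for _ in range(iterations):
        # every page keeps a small baseline score...
        new_rank = {p: (1 - damping) / n for p in pages}
        # ...and passes the rest of its score along its outbound links
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

# Toy web: "c" is linked to by both "a" and "b", so it earns the most rank.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

The key property this toy version shares with the real thing: a link from a high-ranking page passes more weight than a link from an obscure one, which is exactly what the spam techniques below try to exploit.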
Problems with PageRank
By linking to a website, the owner is casting a vote for the other site. The problem with this measure of relevancy is that a vote is not always a vote, and people can vote off topic. People used free-for-all link exchanges to artificially enhance their rankings; others signed guest books. Both of these methods have faded in recent times as search engines grew wise.
Some web pages are naturally link heavy. Weblogs, for example, have a large number of incoming and outgoing links. Many weblog software programs allow users to archive each post as its own page. If a small group of popular writers reference each other, multi-thousand-page link networks suddenly arise.
Articles dating back over a year have claimed that Google PageRank is dead. Just like the ugly spam that fills your inbox everyday, people want to get something for nothing from search engines.
Recent PageRank and Ranking Manipulation
Some of the common techniques for improving site relevancy (and degrading search results) were:

- Abusive reciprocal linking: two sites linking to each other exclusively for the sake of ranking improvements, common to sites selling Viagra.
- Comment spam: people (or software) post comments on weblogs and point them at their own website with optimized links.
- Selling of PageRank: people can sell PageRank to a completely unrelated site. In fact there are entire networks which have this as a business model (http://www.pradnetwork.com/).

While Google has fought hard to keep its relevancy, a drastic change was necessary. In the past search engines rated web pages on inbound links, keyword density, and keyword proximity.
Optimized inbound links are links which contain your exact key phrase. Keyword density is the number of times a keyword appears on the page divided by the total number of words. Keyword proximity is how close the keywords appear to one another on the page.
The rule for high rankings was simply to place the phrases you wanted to rank well for in your page copy multiple times, in your title, and in your inbound links.
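The density part of that old formula is easy to sketch. This is a toy metric of my own for illustration; the engines' actual scoring formulas are unpublished:

```python
import re

def keyword_density(text, keyword):
    """Occurrences of a keyword divided by total word count (toy metric)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)

copy = "search engine marketing tips for search engine optimization"
density = keyword_density(copy, "search")
# 2 occurrences of "search" out of 8 words -> 0.25
```

Under the old rules, pushing this number up (within reason) for your target terms helped you rank; under the new filter, pushing it too high is exactly what gets a page tripped.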
The New Filter
This latest algorithmic update measures keyword proximity and other signs of artificial rank boosting to eliminate pages which trip the filter. It is not certain exactly which factors the filter considers, but it is believed that a high ratio of overly optimized text links, coupled with high keyword density and close keyword proximity, will trip it.
If you place a - between the keywords you are searching for, Google will act as if the entire phrase is just one word. This is most easily observed by using the page highlighting feature on the Google toolbar.
If you search for "search-engine-marketing" you will see my site (Search Marketing Info) listed at around #7. If you take the dashes out, you will see that my site tripped the filter and will not be listed in the search results for terms such as "search marketing" or "search engine marketing." The dashes make the phrase seem like one word to Google, which is why the keyword trip does not occur. If you pull the dashes out, though, I go over the limits and trip the filter.
Please note: Google recently fixed the - sign between words loophole. For a while, searching for keywordA keywordB -blabla will still work, but Google will eventually fix this too.
If your site trips the filter, it will not be assessed any spam penalties. Simply put, your site just will not show for the specific search on which you trip the filter.
Only the specific page is penalized for the specific search. Your PageRank and all other pages on the site go unchanged.
The Test Website
Note: A, B, and C refer to different keywords.
One of my clients had a website that was listing in the top ten for the phrase A B C. He was also in the top ten for A B and B C. When Google performed this update he was removed from all of these top rankings. His site was the perfect test site to discover the limits of this filter: his only inbound link was a single PR6 link from DMOZ (the Open Directory Project), with none of the keywords in the link.
Being interested in getting my client's site back on top, I began to test. In some areas I combined B C into BC. In other areas I removed and/or distributed the keywords differently. Sure enough, Google quickly re-indexed his website. I then searched for A B and B C. He was quickly listing in the top 5 for both of these competitive phrases. I searched for A B C, and he was still over the filter limit. This all but confirmed the key phrase filter idea in my mind.
I changed his website again and am anxiously awaiting another Google re-index of the page, to verify we recapture his key phrase just below the trip limit!
How to Bypass the Filter
The filter aims to catch highly optimized pages. The way to bypass it then is to not appear highly optimized. When Google searches for A B C, it wants to find A B C maybe a few times, but it also wants to see A B, B C, A, B, and C sprinkled throughout the text where possible.
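To see how that "sprinkling" might look from the filter's side, here is a hypothetical check of my own that compares exact-phrase hits against partial-phrase hits (the real filter's signals and thresholds are unknown):

```python
def phrase_counts(text, a, b, c):
    """Count exact and partial phrase occurrences (hypothetical heuristic)."""
    t = " " + text.lower() + " "  # pad so edge phrases match on word boundaries
    return {
        "a b c": t.count(f" {a} {b} {c} "),
        "a b": t.count(f" {a} {b} "),
        "b c": t.count(f" {b} {c} "),
    }

# A made-up page mixing the full phrase with partial variants:
page = "cheap widget store with widget store reviews and cheap widget deals"
counts = phrase_counts(page, "cheap", "widget", "store")
# -> {"a b c": 1, "a b": 2, "b c": 2}
```

Under this theory, a page where the exact phrase A B C dominates while the partial variants are rare would look over-optimized, while a natural mix of A B, B C, and the individual words stays under the limit.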
How the Filter Hurts Google
Right now the filter does appear to be introducing more spam on certain searches. The overall relevancy of most Google searches is still rather high, though. Where I see a true problem with this new algorithm is that in a few months, if cloaking software professionals adjust to it, they will be able to tear holes in Google.
The problem with the current algorithm is that it is looking to match randomized, unoptimized page and article structure. People can mix their words up some, but you frequently end up using some words together naturally. A cloaking script does not have to make sense to a reader. A cloaking script can write text that is more randomly organized than you or I can, because it only has to follow mathematical breakdowns; it does not also have to read well to a human eye.
In addition, many of the highly optimized sites that were appearing at the top are no longer there to protect the searcher from the real heavy spam which may rise (and in some cases already has).
We shall see if they soon add better spam filters (which will surely be required once cloakers adjust to the new algorithm).
How the Filter Helps Google
This filter helps Google in two main ways.
1. Manipulating search results is much harder for a novice SEO or webmaster.
2. Selling PageRank to an off-topic site may actually cause a site to trip the spam limits, so PageRank is no longer as easy to abuse.

Many highly optimized sites recently tripped the filter and lost their distribution. Might these people be buying AdWords for Christmas? Ho ho ho!
What Google is Looking For
The general principle behind the filter is the idea that non-commercial sites are not usually over-optimized, while many commercial sites are. When Google serves up sites, it is looking to deliver the most relevant websites, not the most highly optimized ones.
The Eyes of a Webmaster
Our eyes are our own source of filtration. When we see our sites disappear for some of the best keywords, we may get angry. Sometimes the results look bad while Google is playing with the filters. What truly matters to Google is not whether the sites are optimized, but whether the results are relevant. Relevant results for their test subjects and the average web surfer are all that matter. Like it or not, if spam does not sneak in, this algorithm change may actually help Google.
Why Change Now
While some of the commercial searches have been degraded, often relevant results have filled the places once occupied by highly optimized websites. The biggest change is that these optimized websites have lost distribution right before the holiday spending season - ouch.
If you look closely at the Google search engine results pages now you will see that the Google AdWords boxes have grown a bit. Signs of things to come? Does Google want commercial search listings to appear only in ads? Now that Google has so many websites dependent upon it, they can change the face of the internet overnight, and people have to play by Google's rules. So, with the IPO nearing, Google may be looking for some cash to show solid results, which would improve the opening stock price.
Yahoo has purchased Inktomi and has been branding Yahoo search heavily (although right now it is still powered by Google). Ideally, if Google's search results degrade, Yahoo will switch to Inktomi and steal market share back from Google. It is sure to be a dog fight.
The Google algorithm sections of this article will be updated as I learn more, and I will provide updates on my blog.
- by Aaron Wall of Search Marketing Info
About the Author