Posted Friday, October 22, 2004
While all search engines use one form of caching or another to build their indices, some of them make a point of displaying cached web pages to their users. The commonly quoted pretext for this is that it offers searchers fast access to a page's content, making it easier to check out whether it's what they are really looking for in the first place. Of course, what this actually does is keep visitors on the search engine's site, making them more susceptible to banner ads and other means of promotion.
However, the drawbacks this entails are numerous.
- Depending on the search engine's index cycle the content presented may be quite outdated.
- More often than not, the presented pages will not be fully functional:
= site design and layout may be massacred by incorrect or non-existent display of external Cascading Style Sheets (CSS)
= banner ads may not be displayed properly, thus depriving webmasters of revenue
= dynamic content may not be rendered the way it was originally set up.
- Displaying content within an alien context (e.g. under the search engine's header, encased in a frame, etc.) beyond the control of its authors arguably constitutes a blatant infringement of intellectual property rights and copyright.
Moreover, for a web site employing IP delivery, this practice constitutes a prime Decloaking Hazard. Cloaking works by feeding search engine spiders an optimized (or, at least, different) page that is not intended for human perusal. Caching such pages and displaying them for the asking will reveal your cloaking effort, thus rendering it useless: any unscrupulous competitor could easily steal your cloaked code, optimize their own pages with it and achieve better rankings to your detriment.
The most prominent search engine displaying cached web pages not of its own making is, of course, Google. In the past, Google staff would promptly comply with any request by webmasters not to display cached pages. Then, a little over a year ago, Google introduced a proprietary meta tag (META NAME="GOOGLEBOT" CONTENT="NOARCHIVE") for webmasters to include in the header of those pages they want to see excluded from this feature.
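To check whether a given page actually carries this tag, something along the following lines may help. It is only a minimal sketch using Python's standard `html.parser` module; the class name `NoArchiveFinder`, the function `opts_out_of_google_cache` and the sample page are assumptions made for illustration, not part of any Google tool.

```python
from html.parser import HTMLParser


class NoArchiveFinder(HTMLParser):
    """Collects the names of meta tags whose content includes NOARCHIVE."""

    def __init__(self):
        super().__init__()
        self.noarchive_for = set()  # lower-cased meta names that opt out

    def handle_starttag(self, tag, attrs):
        # html.parser hands over tag and attribute names in lower case.
        if tag != "meta":
            return
        attributes = {key: (value or "") for key, value in attrs}
        name = attributes.get("name", "").strip().lower()
        content = attributes.get("content", "")
        directives = {part.strip().lower() for part in content.split(",")}
        if name and "noarchive" in directives:
            self.noarchive_for.add(name)


def opts_out_of_google_cache(html_text):
    """Return True if the page carries the GOOGLEBOT NOARCHIVE meta tag."""
    finder = NoArchiveFinder()
    finder.feed(html_text)
    return "googlebot" in finder.noarchive_for


# Hypothetical sample page, for illustration only.
sample_page = (
    "<html><head>"
    '<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">'
    "<title>Example</title></head><body></body></html>"
)
print(opts_out_of_google_cache(sample_page))
```

The check only confirms that the tag is present in the markup; whether a search engine honors it is a separate matter.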
The Google meta tag actually works. While there was some indication immediately after its introduction that sites opting for this exclusion might be penalized ranking-wise, this seems to have abated. Obviously, should Google really start a witch hunt on cloaking sites, as their public announcements are fond of stating every other month or so, it only stands to reason that web sites making use of this special meta tag might constitute prime targets. For this reason we do not recommend cloaking for Google unless you do it exclusively from a dedicated shadow domain.
Another company, Germany-based brainbot technologies AG, offers search engine technology for portals:
< (http://brainbot.com/) >
Brainbot robots are also spidering international domains:
#UA gigabaz/3.14 (firstname.lastname@example.org; (http://gigabaz.com/gigabaz/))
#UA gigaBazV11.3 bazbrainbot.com; (http://brainbot.com/gigabaz/)
One licensee making use of their cached results is geekbot:
< (http://www.geekbot.org) >
On their result pages you will find a "scan" function - this will display cached pages, albeit in a different format.
French search engine AntiSearch offers display of cached web pages, too:
< (http://www.antisearch.net/) >
AntiSearch operates the following spiders:
#UA antibot-V1.1/i586-linux-2.2 184.108.40.206
#UA antibot-V1.1/i586-linux-2.2 220.127.116.11
#UA antibot-V1.1/i586-linux-2.2 18.104.22.168
#UA antibot-V1.1/i586-linux-2.2 22.214.171.124
#UA antibot-V1.1/i586-linux-2.2 126.96.36.199
Finally, let's not forget German search engine Speedfind:
< (http://www.speedfind.de) >
Speedfind, too, offers display of cached pages.
Due to the peculiar legal situation in Germany, which makes webmasters fully liable for links to third party pages unless they post an explicit disclaimer prominently on their site, Speedfind refuses all liability for the pages thus displayed:
"SPEEDFIND DOCUMENT FROM CACHE VIEWER
SPEEDFIND is in no way liable for content displayed below.
All rights belong to the respective page's author. We are only displaying a copy of said page." (Translated from German)
So while they do acknowledge authors' full rights, those authors' permission to display copyrighted content is never requested, and their terms of submission give no indication of how to prevent page caching.
Speedfind operates the following spiders:
#UA visual ramBot xtreme 7.0 proxy-gate.oberland.net
#UA speedfind ramBot xtreme 8.1 new.speedfind.de
#UA speedfind ramBot xtreme 8.1 eins.speedfind.de
#UA visual ramBot xtreme 7.0 c2.oberland.net
#UA visual ramBot xtreme 7.0 io.oberland.net
Rather than bother with minor players like Speedfind, AntiSearch and brainbot by excluding them from your submission process, you may want to consider blocking their spiders from access to your web site altogether (lest your competitors should submit your site behind your back!).
In this case, we would recommend using our fantomas multiBlocker(TM) for a professional blocker solution:
< (http://fantomaster.com/famultiblocker0.html) >
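For illustration only, the basic idea of turning such spiders away by their user agent can be sketched as follows. This is a simplified assumption of how a blocker might work, not a description of the product mentioned above: the marker list is derived from the #UA entries in this article, and the helper names `is_blocked_spider` and `status_for_request` are hypothetical.

```python
# Substrings taken from the user agents listed in this article (lower-cased).
BLOCKED_AGENT_MARKERS = ("gigabaz", "antibot", "rambot")


def is_blocked_spider(user_agent):
    """Return True if the User-Agent string contains a listed marker."""
    agent = (user_agent or "").lower()
    return any(marker in agent for marker in BLOCKED_AGENT_MARKERS)


def status_for_request(user_agent):
    """Hypothetical helper: 403 for listed spiders, 200 for everyone else."""
    return 403 if is_blocked_spider(user_agent) else 200


# Example calls with user agents quoted in this article and a browser string.
for agent in (
    "gigabaz/3.14",
    "speedfind ramBot xtreme 8.1",
    "antibot-V1.1/i586-linux-2.2",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
):
    print(agent, "->", status_for_request(agent))
```

Keep in mind that a user agent string is supplied by the client and can be changed at will, so matching on it alone is a weak filter; combining it with the spiders' IP addresses or host names is the more robust approach.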
About the Author
Ralph Tegtmeier and Dirk Brockhausen are the co-founders and principals of fantomaster.com Ltd. (UK) and fantomaster.com GmbH (Belgium), < (http://fantomaster.com/) > a company specializing in webmaster software development, industrial-strength cloaking and search engine positioning services. You can contact them at mailto:email@example.com