Understanding Search Engine Robots
By David Bell
Posted Monday, June 28, 2004
If there is one thing I have learned about robots, it is that there is absolutely no pattern to them. Most robots are stupid and wander randomly. For example, 50% of the robot visits to my sites ask for the robots.txt file and then go away without requesting anything else. Then they come back a week later, ask for the same thing, and go away again. This happens over and over for months, and I have never figured it out. What are they doing? If they only wanted to see whether the website was really there, they could just ping it, which would be much faster and much more efficient. They seldom visit another page, and when they do, they ask for one other page per visit or so. Some come in and issue rapid-fire requests for every page in the website. How rude!

You have to quit worrying so much about robots. It takes six months before they request enough pages to do you any good. I quit thinking about them a long time ago. Build a lot of pages correctly and, if you have reciprocal links pointing to them, the robots will find them someday.
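If you are curious what all those robots.txt requests are for, here is a minimal sketch, using Python's standard urllib.robotparser, of what a well-behaved robot does with that file before it crawls anything else. The site URL and user-agent name are placeholders, not anything a real engine actually uses.

```python
# A minimal sketch of how a polite robot consults robots.txt before crawling.
# The site URL and the "ExampleBot" user-agent are placeholders.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("http://www.example.com/robots.txt")  # placeholder site
rp.read()  # fetch and parse the robots.txt file

# Before requesting any page, the robot checks whether it is allowed to.
if rp.can_fetch("ExampleBot", "http://www.example.com/products.html"):
    print("Allowed to crawl this page")
else:
    print("Disallowed by robots.txt")
```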
Try this: go to AltaVista and type link:YourSite.com into the search box (leave off the www). This will list the reciprocal links to your web site. Try link:crownjewels.com and you get 136 links to it. Think about this now: the robots say to themselves, "Here is a site that must be popular, or why would so many websites SIMILAR to it have its link on their pages?" Remember that only SIMILAR sites with SIMILAR THEMES are likely to have a link to your site. They give more importance to this than to you submitting your link to them. Wouldn't you?
Go to heavily trafficked sites matching your web site's Themes and use AltaVista to find out how many reciprocal links they have. This will prove to you I am right.
A search engine listing is really nothing more than another reciprocal link to your site. The problem is that you are constantly fighting for your position in the search query listings. Forget about that. Leave the fighting to the people who can spend 24 hours a day trying to trick everybody, and quit trying to compete with the large organizations pouring millions into their marketing. Completely forget about Search Engines after submitting to them and go after the reciprocal links. The Search Engines will then believe you are a heavily visited site, because you will be, and you will finally be getting the traffic you so richly deserve.
Search engine visitors to your site are oftentimes not qualified visitors. Too many of them pop into your home page for two seconds and then leave. You know how it is; we all do it when we are using the search engines. Either it wasn't the information we were looking for, or the site had some huge graphic on a stupid portal page that took forever to load. These visitors shouldn't even count, yet they get counted as 12-18 hits in your server logs. Hits are requests to the server, and one page request can incur a lot of them: the page itself plus each graphic on it counts as a separate hit.
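To see that difference in your own logs, here is a rough sketch that counts raw hits versus actual page requests in an Apache-style access log. The log file name and the extensions treated as "pages" are assumptions you would adjust for your own server.

```python
# A rough sketch of counting raw hits versus actual page requests in a
# common-format access log. The log path and "page" extensions are assumptions.
page_extensions = (".html", ".htm", "/")   # what we treat as real pages
hits = 0
page_requests = 0

with open("access.log") as log:            # hypothetical log file
    for line in log:
        parts = line.split('"')
        if len(parts) < 2:
            continue
        request = parts[1]                 # e.g. 'GET /index.html HTTP/1.1'
        fields = request.split()
        path = fields[1] if len(fields) > 1 else ""
        hits += 1                          # every request is a hit
        if path.split("?")[0].endswith(page_extensions):
            page_requests += 1             # only full pages count as page views

print(f"{hits} hits, but only {page_requests} actual page requests")
```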
Reciprocal links bring in qualified visitors. These are visitors who were already on a web site with Themes matching yours, so they already have a good idea of what type of site you are. They will come into your site and actually stay awhile. These visitors are so good they should count as double credit.
I know which type of visitor I would rather have.
How do you get people to WANT to put your link on their web sites? Why would a similar site put a link to your site on theirs? Simple: you have similar Themes. You are similar, but not competition.
There is one very important lesson to be learned from this crazy robot behavior. You need to make the navigation in your web site so easy that a visitor can find any page within two clicks of your home page. One way of doing this is installing hidden DotLinks in your web site [a DotLink is a little period that is linked to another page; it is barely noticeable on the page, but it is still a link that a robot can follow]. When you do this, robots can find your pages faster and more easily.
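Here is a minimal sketch, assuming the third-party requests library and some very naive href matching, of how you might verify that every internal page really does sit within two clicks of your home page. The home page URL is a placeholder.

```python
# A minimal sketch (assuming requests and a simple href regex) of checking
# that every internal page is within two clicks of the home page.
import re
from collections import deque
from urllib.parse import urljoin, urlparse

import requests  # third-party; pip install requests

HOME = "http://www.example.com/"   # placeholder home page
SITE = urlparse(HOME).netloc

def internal_links(url):
    """Return internal links found on a page (very naive HTML parsing)."""
    try:
        html = requests.get(url, timeout=10).text
    except requests.RequestException:
        return []
    hrefs = re.findall(r'href=["\'](.*?)["\']', html, re.IGNORECASE)
    full = (urljoin(url, h) for h in hrefs)
    return [u for u in full if urlparse(u).netloc == SITE]

# Breadth-first walk out to a depth of two clicks from the home page.
seen = {HOME: 0}
queue = deque([HOME])
while queue:
    page = queue.popleft()
    if seen[page] >= 2:
        continue
    for link in internal_links(page):
        if link not in seen:
            seen[link] = seen[page] + 1
            queue.append(link)

print(f"{len(seen)} pages reachable within two clicks of the home page")
```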
Giving The Robots What They Want.
So how do you make the search engine robots give your site a better rating than all the other millions of websites trying to do the same thing? Simple: give them what they want. You can't trick them or make them think you are better than you are. Think about a visit through the eyes of a robot. He finds a site, usually from links embedded in web pages, then loads the text from the first page.
He looks for the META tags and pulls out the keywords and description. If they are not there, he takes the first 200 or so characters of text and uses them as the description.
The Title is extracted.
He extracts the pure text from the page (stripping out the HTML coding), then takes out the common words, leaving what he feels may be keywords. (Most robots do not do this last step.)
He now extracts the hyperlinks, collating them into those that belong to this website and those that don't (he visits the latter later, as this is how he finds new websites).
He may do the same with the email addresses.
He goes on to the next page and so on until he has visited all of the pages in your web site.
Now he stores all of this information.
He now knows how many pages you have and how many outside hyperlinks are in your site, and he can give your site a score based on how it is set up. These are the basics.
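To make those steps concrete, here is a minimal sketch of what such a robot might pull out of a single page: the Title, the META description and keywords (with the first-200-characters fallback), the pure text with common words removed, and the hyperlinks split into those that belong to the site and those that don't. The parsing is deliberately naive, and the URL and stop-word list are assumptions.

```python
# A minimal, deliberately naive sketch of the extraction steps described
# above. The URL and the stop-word list are assumptions for illustration.
import re
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

URL = "http://www.example.com/"          # placeholder page
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is", "for", "on"}

html = urlopen(URL).read().decode("utf-8", errors="replace")

# The Title and META tags.
title = re.search(r"<title>(.*?)</title>", html, re.I | re.S)
title = title.group(1).strip() if title else ""
description = re.search(
    r'<meta\s+name=["\']description["\']\s+content=["\'](.*?)["\']', html, re.I)
keywords = re.search(
    r'<meta\s+name=["\']keywords["\']\s+content=["\'](.*?)["\']', html, re.I)

# Strip out the HTML coding to get the pure text.
text = re.sub(r"<script.*?</script>|<style.*?</style>", " ", html,
              flags=re.I | re.S)
text = re.sub(r"<[^>]+>", " ", text)
text = re.sub(r"\s+", " ", text).strip()

# If there is no META description, fall back to the first 200 or so characters.
description = description.group(1) if description else text[:200]
keywords = keywords.group(1) if keywords else ""

# Take out the common words, leaving what may be keywords.
candidate_keywords = [w for w in re.findall(r"[a-z']+", text.lower())
                      if w not in STOP_WORDS]

# Collate hyperlinks into those that belong to this site and those that don't.
site = urlparse(URL).netloc
links = [urljoin(URL, h) for h in re.findall(r'href=["\'](.*?)["\']', html, re.I)]
internal = [u for u in links if urlparse(u).netloc == site]
external = [u for u in links if urlparse(u).netloc != site]

print(title, description[:60], len(candidate_keywords),
      len(internal), len(external))
```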
What do they do with the info? When someone comes to search a phrase or keyword, another search routine program takes over using the information the robot found. A person types in the keywords and the search program returns the 256,000 pages matching their keywords. BUT they also consider the following: How old is the website or how long has the engine known about it? How large is the website? Was it properly constructed? How many hyperlinks are there to outside websites?
VERY IMPORTANT! How many hyperlinks on other websites point to this site? The older and better the website, the more links there are to it.
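Nobody outside the engines knows their exact formulas, but a toy scoring routine along these lines shows how such factors might be weighted. Every field name and weight here is invented for illustration only; the one point it is meant to show is that links from other websites carry the most weight.

```python
# A toy sketch of how a ranking routine might weight the factors above.
# The field names and weights are invented; real engines use far more
# signals and very different math.
from dataclasses import dataclass

@dataclass
class SiteRecord:
    age_days: int           # how long the engine has known about the site
    page_count: int         # how large the website is
    well_constructed: bool  # Titles, META tags, clean navigation
    outbound_links: int     # hyperlinks to outside websites
    inbound_links: int      # hyperlinks on other websites pointing here

def score(site: SiteRecord) -> float:
    s = 0.0
    s += min(site.age_days / 365, 5)             # age, capped at five years
    s += min(site.page_count / 100, 5)           # size, capped
    s += 2.0 if site.well_constructed else 0.0   # proper construction
    s += min(site.outbound_links / 50, 2)        # links to outside sites
    s += min(site.inbound_links / 25, 10) * 2.0  # inbound links weigh the most
    return s

print(score(SiteRecord(730, 250, True, 40, 136)))
```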
These robots know when you are cheating. You can't trick them. It is so simple for the robot developer to incorporate code to negate the tricks.
What about scoring a keyword only once or twice per page, or per area such as the META tags and Title?
Is this page close in size to all the other portal pages?
How many web pages in the same directory have the word "index" in them?
Does this site have a lot of content?
Are there links to outside sites?
Each page can be checked and compared against what the robot feels is a statistically normal page. These are computers, you know.
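Here is a minimal sketch of that "statistically normal page" idea: compare each page against the rest of the site and flag the outliers. The sample numbers and the two-standard-deviation threshold are invented for illustration.

```python
# A minimal sketch of the "statistically normal page" check: compare each
# page against the rest of the site and flag outliers. The sample data and
# the two-standard-deviation threshold are invented for illustration.
from statistics import mean, stdev

# (page name, word count, times the main keyword appears)
pages = [
    ("index.html", 620, 4),
    ("services.html", 580, 5),
    ("contact.html", 540, 3),
    ("portal1.html", 210, 42),   # thin page stuffed with the keyword
]

def is_outlier(value, others):
    """Flag a value more than two standard deviations from the other pages."""
    return abs(value - mean(others)) > 2 * stdev(others)

for name, word_count, keyword_count in pages:
    other_words = [w for n, w, _ in pages if n != name]
    other_keys = [k for n, _, k in pages if n != name]
    if is_outlier(word_count, other_words) or is_outlier(keyword_count, other_keys):
        print(f"{name} does not look like a statistically normal page here")
```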
You need a lot of pages with normal content. Instead of spending the time to make fake pages, give the real ones content. This will also give your visitors something to come back for: CONTENT.
I hope this helps in your future marketing decisions.
About the Author
David Bell is Manager, Online Marketing, at http://www.wspromotion.com/, a leading Search Engine Optimization services firm and Advertising Agency.