Search Engine Robots - How They Work, What They Do (Part II)
By Daria Goetsch
Posted Tuesday, June 29, 2004
Why Isn't My Website In The Search Engine?
If your site doesn't appear in the search engines, the robots probably couldn't
deal with it. The cause may be as simple as the robots not being able to find
the site, or it may be more complicated: the robots may be unable to crawl the
site or to figure out what your pages are about.
Submitting your site to the major search engines will help deal with the
"can't find it" problem. Even having links pointing back to your site can be
enough to attract the search engine robots. Google, for example, suggests that
you may not have to submit your pages; they will find your site if you have a
link pointing back to it from at least one other site on the web.
If the robots can find your site but can't make sense of it, then you may need
to look at the content and technology used on your pages. Frames, Flash,
dynamically generated pages, and invalid HTML source code can cause problems
when the search engine robot tries to access your web pages. While some search
engines (e.g. Google and AllTheWeb) are beginning to index dynamically generated
pages and Flash, using these technologies can still keep your pages from being
indexed by the search engine robots.
Text in images cannot be read by the search engine robots. Using ALT image text
is an important way to help the robots "read" your images. Websites with
extensive images rely heavily on ALT text to present their content.
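One way to check this on your own pages is to list the images that have no ALT
text. The following is a minimal sketch using Python's standard html.parser
module; the names AltAuditor and audit_alt_text and the sample markup are
made up for illustration, and it assumes you already have the page's HTML as a
string.

```python
from html.parser import HTMLParser


class AltAuditor(HTMLParser):
    """Collects <img> tags and notes which ones lack usable ALT text."""

    def __init__(self):
        super().__init__()
        self.with_alt = []      # (src, alt) pairs a robot can "read"
        self.missing_alt = []   # src values with no ALT text, or an empty one

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        alt = (attributes.get("alt") or "").strip()
        if alt:
            self.with_alt.append((src, alt))
        else:
            self.missing_alt.append(src)


def audit_alt_text(html):
    """Return (images with ALT text, images without) for one page of HTML."""
    auditor = AltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.with_alt, auditor.missing_alt


# Hypothetical sample page fragment.
sample = """
<img src="logo.gif" alt="Acme Widgets logo">
<img src="banner.gif">
<img src="spacer.gif" alt="">
"""
with_alt, missing_alt = audit_alt_text(sample)
print("Readable by robots:", with_alt)
print("Missing ALT text:", missing_alt)
```

An empty ALT attribute is reported as missing here because it gives the robot
no text to read, even though it is a reasonable choice for purely decorative
images.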
How Do I Get The Most Out Of Indexing?
If you know what to "feed" the spidering robots you will help yourself with
search engine ranking.
Having a website full of good content is the major factor. Search engines exist
to serve their visitors, not to rank your website. You need to present your
site in the way that will be most useful to the search engine's visitors. Each
search engine has its own idea of what is important in a
page, but they all value text highly. Making sure that the text on your pages
includes your most important keyword phrases will help the search engine
evaluate the content of those pages.
Making sure that you have good title and meta tags will further assist the
search engines in understanding what your page is about. If the text on the page
is about widgets, the title is about widgets, and the meta tags are about
widgets, the search engine will have a pretty good idea that you are all about
widgets. When their visitors search for widgets, the search engines know to list
your site in the results.
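The "widgets everywhere" check described above can be roughly automated. The
sketch below, again using Python's standard html.parser module, reports
whether a keyword phrase appears in the title, the meta description, the meta
keywords, and the page text. PageSummary and keyword_coverage are hypothetical
names, and a plain substring match is an assumption made for simplicity; it
says nothing about how any particular search engine weighs these elements.

```python
from html.parser import HTMLParser


class PageSummary(HTMLParser):
    """Gathers the title, meta description/keywords, and visible text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attributes.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.meta[name] = attributes.get("content") or ""
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip_depth:
            self.text_parts.append(data)


def keyword_coverage(html, phrase):
    """Report where a phrase appears: title, meta tags, and page text."""
    page = PageSummary()
    page.feed(html)
    page.close()
    phrase = phrase.lower()
    body = " ".join(" ".join(page.text_parts).split())
    return {
        "title": phrase in page.title.lower(),
        "description": phrase in page.meta.get("description", "").lower(),
        "keywords": phrase in page.meta.get("keywords", "").lower(),
        "body": phrase in body.lower(),
    }


# Hypothetical page whose tags mention widgets but whose text does not.
sample = """
<html><head><title>Blue Widgets for Sale</title>
<meta name="description" content="Hand-made blue widgets.">
<meta name="keywords" content="widgets, blue widgets">
</head><body><h1>Our Products</h1><p>We sell gadgets.</p></body></html>
"""
coverage = keyword_coverage(sample, "widgets")
print(coverage)
```

A False entry points at the part of the page that does not yet mention the
phrase; in the sample it is the page text.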
A sitemap page is a very good way of giving the search engine robot every
opportunity to reach your website pages. Since robots follow the links on
your web pages, make sure that at least your most important pages are included
in the sitemap; you may even want to include all your pages there, depending on
the size of your site. Be sure to add a link to the sitemap page from each page
on your site.
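Both sitemap checks (is every page listed, and does every page link back to
the sitemap) are easy to run once you have a list of which pages link to
which. Here is a small Python sketch; the function name sitemap_gaps, the
dictionary layout, and the sample URLs are all assumptions for illustration,
and gathering the links from a live site is left out.

```python
def sitemap_gaps(site_links, sitemap_url):
    """Find pages missing from the sitemap and pages not linking to it.

    site_links maps each page URL to the set of URLs that page links to.
    """
    listed = site_links.get(sitemap_url, set())
    all_pages = set(site_links)
    # Pages the sitemap does not link to (ignoring the sitemap itself).
    not_listed = sorted(all_pages - listed - {sitemap_url})
    # Pages that carry no link back to the sitemap.
    no_link_back = sorted(
        page for page, links in site_links.items()
        if page != sitemap_url and sitemap_url not in links
    )
    return not_listed, no_link_back


# Hypothetical five-page site.
site = {
    "/": {"/about", "/sitemap"},
    "/about": {"/", "/products", "/sitemap"},
    "/products": {"/", "/widgets"},
    "/widgets": {"/"},
    "/sitemap": {"/", "/about", "/products"},
}
not_listed, no_link_back = sitemap_gaps(site, "/sitemap")
print("Not in the sitemap:", not_listed)
print("No link to the sitemap:", no_link_back)
```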
Another important consideration is that of keeping all of your pages within a
small number of "clicks" from your top page. Many robots will not follow links
more than two or three levels deep, so if your "widgets" page can only be
reached from your home page by following multiple links (e.g. home page >> about
us page >> products page >> widgets page), the robot may not crawl deep enough
to get to the widgets page.
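The number of clicks from the top page can be measured with a breadth-first
walk over your site's links. The Python sketch below assumes the links are
already available as a dictionary; click_depths, MAX_DEPTH, and the page names
are hypothetical, and the limit of two levels is only an example taken from
the "two or three levels" rule of thumb above.

```python
from collections import deque

MAX_DEPTH = 2  # assumed limit, for illustration only


def click_depths(site_links, home):
    """Return the fewest clicks needed to reach each page from home."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in site_links.get(page, ()):
            if target not in depths:  # first visit is the shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# The chain from the example above, plus a page nothing links to.
site = {
    "home": ["about us", "contact"],
    "about us": ["products"],
    "products": ["widgets"],
    "widgets": [],
    "contact": [],
    "orphan": [],
}
depths = click_depths(site, "home")
too_deep = sorted(page for page, depth in depths.items() if depth > MAX_DEPTH)
unreachable = sorted(page for page in site if page not in depths)
print("Clicks from home:", depths)
print("Deeper than", MAX_DEPTH, "clicks:", too_deep)
print("Not reachable by links:", unreachable)
```

Linking the widgets page directly from the home page, or from a sitemap page
that the home page links to, would bring it within the limit.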
Testing Your Website For Search Engine Robot Accessibility
To get an idea of just what the search engine robot "sees" on your page, you can
try the Sim Spider tool. You may be surprised at how different your site
looks to the robot. You can find this tool at
(http://www.searchengineworld.com/cgi-bin/sim_spider.cgi)
You will see text and ALT image text show up in the results. If your entire
website is built in Flash, you will probably see nothing at all, because most
robots can't read Flash movies.
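For a rough local approximation of that view, you can strip a page down to its
text and ALT text yourself. This Python sketch uses the standard html.parser
module; RobotView and robot_text are hypothetical names, and the output is
only a simplified guess at what a robot reads, not a reproduction of Sim
Spider or of any search engine's crawler.

```python
from html.parser import HTMLParser


class RobotView(HTMLParser):
    """Keeps page text and ALT text; ignores scripts, styles, and media."""

    def __init__(self):
        super().__init__()
        self.pieces = []
        self._hidden = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden += 1
        elif tag == "img":
            alt = (dict(attrs).get("alt") or "").strip()
            if alt:
                self.pieces.append(alt)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden:
            self._hidden -= 1

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if text and not self._hidden:
            self.pieces.append(text)


def robot_text(html):
    """Return the text fragments a simple robot could read, in page order."""
    view = RobotView()
    view.feed(html)
    view.close()
    return view.pieces


# Hypothetical pages: one built from text, one built entirely in Flash.
text_page = (
    '<h1>Widgets</h1><img src="w.gif" alt="Blue widget">'
    "<p>Buy widgets here.</p>"
)
flash_page = (
    '<object data="site.swf" type="application/x-shockwave-flash">'
    '<param name="movie" value="site.swf"></object>'
)
print(robot_text(text_page))
print(robot_text(flash_page))
```

The Flash-only page produces an empty list, which matches the "nothing at all"
result described above.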
The Bottom Line
When it comes to search engine robots, think simply. Lots of good content and
text, hyperlinks the robots can follow, optimization of your pages, topical
links pointing back to your site, and a sitemap will help ensure the best
results when the robots come visiting.
Resources
SpiderSpotting - Search Engine Watch
(http://searchenginewatch.com/webmasters/spiders.html)
Robotstxt.org
List of robots and protocols for setting up a robots.txt file.
(http://www.robotstxt.org/)
Spider-Food
Tutorials, forums and articles about Search Engine spiders and Search Engine
Marketing.
(http://spider-food.net/)
Spiderhunter.com
Articles and resources about tracking Search Engine spiders.
(http://www.spiderhunter.com/)
Sim Spider Search Engine Robot Simulator
Search Engine World has a spider that simulates what the Search Engine robots
read from your website.
(http://www.searchengineworld.com/cgi-bin/sim_spider.cgi)
About the Author
Daria Goetsch is the founder and Search Engine Marketing Consultant for Search
Innovation Marketing (www.searchinnovation.com), a Search Engine Promotion
company serving small businesses. She has
specialized in search engine optimization since 1998, including three years as
the Search Engine Specialist for O'Reilly & Associates, a technical book
publishing company.