Posted Monday, June 28, 2004
What Are Search Engines?
Most of us regularly face the problem of searching the web. The global network is now one of the most important sources of information there is, its main goal being to make information easily accessible. That is where the main problem arises: how to find what you need among all those innumerable terabytes of data. The World Wide Web is overloaded with material reflecting the diverse interests and activities of people all over the globe. How can you tell what a site is devoted to without visiting it? Moreover, the number of resources grew as quickly as the Internet itself, and many of them closely resembled each other (and still do). This situation demanded a reliable (and at the same time fast) way to simplify the search process; otherwise there would have been little point to the World Wide Web at all. So the development and deployment of the first search engines closely followed the birth of the World Wide Web. *
How It All Began
At the start, search engines developed quite rapidly. The "grandfather" of all modern search engines was Archie, launched in 1990, the creation of Alan Emtage, a student at McGill University, Montreal. Three years later, the University of Nevada System Computing Services deployed Veronica. These search engines created databases and collected information on the files existing in the global network. But they were soon overwhelmed by the fast growth of the net, and others stepped forward.
World Wide Web Wanderer was the first automated Internet robot, whereas ALIWEB, launched in the autumn of 1993, was the first rough model of a modern web directory, filled in by site owners or editors. At about the same time, the first 'spiders' appeared: JumpStation, World Wide Web Worm, and Repository-Based Software Engineering,** which started the new era of World Wide Web search. Google and Yahoo are two of their better-known descendants.
Search Engines Today
Modern web searchers are divided into two main groups:
• search engines and
• directories.
Search engines automatically 'crawl' web pages (by following hyperlinks) and store copies of them in an index, so that they can generate a list of resources according to users' requests (see 'How Search Engines Work', below). Directories are compiled by site owners or directory editors (in other words, humans) according to categories. In truth, most modern web searchers combine the two systems to produce their results.
How Search Engines Work
All search engines consist of three main parts:
• the spider (or worm);
• the index; and
• the search algorithm.
The first of these, the spider (or worm), continuously ‘crawls’ web space, following links that lead both to within the limits of a website and to completely different websites. A spider ‘reads’ all pages’ content and passes the data to the index.
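The crawling step can be sketched in a few lines. The following is an illustrative Python fragment, using only the standard library, that extracts the links a spider would follow from a single page; a real spider would add a crawl queue, politeness delays, robots.txt handling, and much more:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href targets of anchor tags, resolved against a base URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Relative links are resolved the way a spider would follow them.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(page_html, base_url):
    """Return every link found on a page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(page_html)
    return parser.links
```

Note how both internal links (resolved against the page's own address) and links to completely different websites come out of the same pass, which is exactly why a spider can roam beyond the site it started on.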
The index is the second part of a search engine. It is a storage area for spidered web pages and can be of a huge magnitude (Google's index, for example, is said to consist of three billion pages).
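Conceptually, the simplest index is an 'inverted' one, mapping each word to the set of pages that contain it, so that a query word can be answered without rereading every page. The sketch below illustrates the idea only; it says nothing about any engine's actual storage format:

```python
import re
from collections import defaultdict

def build_index(pages):
    """Build a toy inverted index: word -> set of page URLs containing it.

    `pages` maps a URL to the page's plain text content.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        # Lower-case and split on non-alphanumerics, a crude tokeniser.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index
```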
The third part of a search engine system is the most sophisticated. It is the search algorithm, a very complicated mechanism that sorts an immense database within a few seconds and produces the results list. Looking like a web page (or, most often, lots of pages), it contains links to resources that match users' queries (i.e., relevant resources). The most relevant ones (as the search engine sees it) are nearer the top of the list. They are the ones most likely to be clicked by the user of the search engine. A site owner should therefore take heed of the site's relevancy to the keywords it is expected will be used to find it.
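As a toy illustration of the relevancy calculation (the real algorithms are, as noted below, secret and vastly more elaborate), one could score pages simply by how often the query terms occur in them and sort by that score:

```python
import re
from collections import Counter

def rank(pages, query):
    """Order page URLs by a crude relevancy score: total occurrences of query terms.

    `pages` maps a URL to its text. Term frequency alone is the simplest
    possible stand-in for a real ranking algorithm.
    """
    terms = query.lower().split()
    scores = {}
    for url, text in pages.items():
        words = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        scores[url] = sum(words[t] for t in terms)
    # Most relevant first, just like a results page.
    return sorted(scores, key=scores.get, reverse=True)
```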
The relevancy calculation algorithm is unique to every search engine and is a trade secret, kept hidden from the public. However, there are some common principles, which are discussed in the following sections.
What to Do to Have Your Web Site Found through Search Engines
There are some simple rules to make your resource relevant enough to be ranked in the top 10 by the majority of search engines.
Rule 1: Work on the body copy
A search engine determines the topic of your site judging by the textual information (or content) of every page. Of course, it cannot comprehend the content the way humans do, but this is not critical. It is much more important to include keywords, which are found and compared with users' queries by the programme. The more often you use targeted keywords, the better your page will be ranked when a search on those keywords is made.
You can increase the relevancy of your targeted keywords still more if you include them in the HTML title of your page (the <title> tag), in hyperlinks (the <a> tag), or just emphasize them with bold font (the <b> or <strong> tags). The meta tags "keywords" and "description" were introduced specifically to help search engines. Unfortunately, they are rapidly losing their significance because they are too easy to abuse. Webmasters should therefore concentrate mainly on body copy, which is the part of textual content placed between the <body> and </body> tags.
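As an illustration of how an engine might pull text out of the traditionally weighted spots (the page title, the "keywords" meta tag, and bold emphasis), here is a sketch built on Python's standard html.parser; a real indexer is of course far more thorough:

```python
from html.parser import HTMLParser

class KeywordSpots(HTMLParser):
    """Records text found in the elements search engines traditionally weighted."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.emphasised = []
        self.meta_keywords = ""
        self._in_title = False
        self._in_bold = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in ("b", "strong"):
            self._in_bold += 1
        elif tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "keywords":
                self.meta_keywords = d.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("b", "strong") and self._in_bold:
            self._in_bold -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._in_bold:
            self.emphasised.append(data)
```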
One should take into account that the search engines' algorithms are constantly improving and that index databases are updated. When you have acquired the desired position in the listings, do not rest on your laurels. Site optimisation should become a permanent job for all site owners who regard web presence as an important part of their business.
Rule 2: Build links to your site
As we have mentioned before, a spider scans the web by following the links that site owners place on their pages to show visitors where to find something of interest. So, the more website owners agree to link to your site, the less time will pass before every existing search engine finds out about you. What's more, pages that are linked from multiple sites are considered more important by crawlers. Google (http://www.google.com/) implements this concept via its so-called PageRank; other engines analyse your site's popularity in different ways. Remember that a link from a site that itself ranks well is much more valuable than just any link. Also note that the content relevancy of the site linking to you further increases the importance of the link.
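The published idea behind PageRank is that a page divides its own score among the pages it links to, so a link from an important page is worth more. A toy illustration follows (the damping factor of 0.85 comes from the original PageRank description; Google's production formula is not public):

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute toy PageRank scores.

    `links` maps each page to the list of pages it links to.
    Every page's score is split evenly among its outgoing links.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Everyone keeps a small baseline; the rest flows along links.
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank
```

In this model a page linked from two places ends up scoring higher than the page linked from one, which is exactly the "popularity" effect described above.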
Rule 3: Play fair
Do not get involved in unfair games with search engines. If you feel that your method is obviously deceptive, do not use it. Here are just some of the widespread methods used by unscrupulous webmasters.
Let's assume that a site owner wishes to make a page very relevant to a certain key phrase. The most obvious course is to include this phrase in the page copy as many times as possible. But when the text starts looking unnatural (that is, when the keyword density becomes excessive), it will be regarded as a kind of spam (so-called keyword stuffing). Such a page looks odd both to human visitors and to search engines. Consequently, few users will wish to return to it after visiting it once, and search engines are likely to penalise the spam by reducing the page's ranking.
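Keyword density is easy to measure for yourself. The illustrative function below computes what fraction of a page's words a target phrase occupies; there is no universal 'spam threshold', but an unnaturally high figure is a warning sign:

```python
import re

def keyword_density(text, phrase):
    """Fraction of the words on a page taken up by occurrences of a phrase."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrase_words = phrase.lower().split()
    if not words:
        return 0.0
    # Count non-overlapping-by-word-position matches of the whole phrase.
    hits = sum(
        1
        for i in range(len(words) - len(phrase_words) + 1)
        if words[i:i + len(phrase_words)] == phrase_words
    )
    return hits * len(phrase_words) / len(words)
```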
Using colours to hide multiple keywords, as a kind of spam
Some web masters in their vain hope to deceive search engines go a step further. They make the part of body copy, which is intended only for search engines, invisible (that is, of a colour identical or just a shade different from the background color), or tiny enough to be indistinguishable (i.e., 1 or 2 pixels high). Modern search engines have become smart enough to detect such tricks, so we wouldn't advise you to use these methods. You might even win for a short time, but lose afterwards, because some search engines penalise spammers by excluding their web sites from their databases.
Many site owners unite in so called link farms in order to artificially increase the link popularity value. These are nothing but networks where everyone links to everyone else, concentrating on the quantity of links and disregarding their quality. Their efficiency is very low. First, a page can deliver just a small part of its value to every page it links to within the farm. If it contains too many links, this part will be worthless. Second, a page that contains links, just links, and nothing else but links, cannot be very authoritative for quite natural reasons. Besides, modern search engines analyse the link quality in terms of web site relevancy, ranking the link highly if it leads to a site devoted to similar issues. So, when you are looking for link exchange partners, choose those whose business is similar to yours. The sites of your partners, or web portals devoted to your business issues, are ideal for this.
Cloaking is another widespread technology that aims to deceive search engines. The point is that all known spidering robots, recognised by their IP addresses or host names, are redirected to a page that is specially polished to meet search engines' requirements but is unreadable to a human being. To detect cloakers, spiders often come from fake IP addresses and under fictitious names. Users' feedback is also collected: if people find too often that a page's real content doesn't match its declared description, the page is reviewed by search engine staff and runs the risk of being penalised.
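To make the mechanism concrete, cloaking in its crudest form is nothing more than branching on the visitor's apparent identity. The sketch below (the bot names are made-up examples, and this is shown purely to illustrate the trick being described) also shows why a crawl from an unrecognised address immediately exposes it:

```python
def page_for(user_agent, real_page, stuffed_page, known_bots=("googlebot", "slurp")):
    """Illustration of cloaking: serve a keyword-stuffed page to recognised
    spiders and the real page to everyone else. A spider arriving under a
    fictitious name receives `real_page` and can compare the two versions.
    """
    ua = user_agent.lower()
    if any(bot in ua for bot in known_bots):
        return stuffed_page
    return real_page
```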
Rule 4: Your site must be interesting
Increasing the number of pages on your site, and the quality of the information you place on those pages, increases the probability of attracting good links to your pages. Interesting articles, and topical news concerning your business, will attract visitors' attention, and your site will become well-known and talked about. If you gain a good reputation on the Internet, your commercial success will be almost certain, and the site will promote itself.
Good site structure is also very important. If your site is created with the basic usability requirements in mind, and is categorised well, the users will enjoy visiting it. Every page should be easily accessed from the home page, and plain text links are preferred. Thus, a search engine robot will experience no difficulties whilst spidering your site content, following the links that lead from one page to another.
As you can see, merely having a website or running a company does not guarantee success. Promotion, building recognition and popularity for your website and brand, and attracting still more clients must be of prime importance.
We have introduced you to the majority of tools used for the promotion of your business on the Internet. These tools rely on the technologies that make searching for desired resources possible. Understandably, website owners can be discouraged by the multiplicity of web search algorithms, since catering for them demands search engine optimisation and comprehensive spadework. So if you don't think you can cope with this job yourself, it is probably worth engaging a qualified Internet promoter or an Internet promotion company in order to gain good results at affordable cost. You will then stand high in directory and search engine results, and so increase your traffic and the number of potential clients your business has access to.
* Before the HTTP protocol was invented (around 1989-1991), the Internet was just a huge network consisting of FTP servers and was used mainly as a means of file exchange. There were no websites at all. The first search engines mentioned in this article ran via FTP and similar protocols. Only after Tim Berners-Lee created the HTTP protocol did we get the World Wide Web, and the Internet acquired its present shape.
** The first search robots that supported HTTP.
About the Author
Dmitry Antonoff, 28. I've been with Magic Web Solutions ltd. (UK), Dartford, Kent, as a marketer and an SEO consultant since May 2003. I specialise in website promotion and copywriting. I'm eager to share my experience with the Internet community.
Irina Ponomareva, 32. I joined Magic Web Solutions ltd. (UK), Dartford, Kent, in March 2003. I've been working as a web master, a developer, and an SEO specialist ever since.