Keyword Density - More Than Meets the Eye

By Ralph Tegtmeier
Posted Thursday, July 1, 2004

One of the standard elements of web page
optimization is Keyword Density: up until very recently
the ratio of keywords to the rest of the body text was
generally deemed to be one of the most important
factors employed by search engines to determine a web
site's ranking.

However, this basically linear approach is gradually
changing: as mathematical linguistics and automatic
content recognition technology progress, the major
search engines are shifting their focus towards
"theme" biased algorithms that no longer rely on the
analysis of individual web pages but, rather,
evaluate whole web sites to determine their topical
focus or "theme" and its relevance in relation to
users' search requests.

This is not to say that keyword density is losing
importance, quite the contrary. However, it is turning
into a far more complex matter than a simple
computation of word frequency per web page can handle.
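
For reference, the simple per-page computation
mentioned above can be sketched in a few lines of
Python. This is only a minimal illustration, not any
search engine's actual formula: the function name
keyword_density and the rule that a "word" is a run of
letters, digits or apostrophes are assumptions made
for the example.

```python
import re


def keyword_density(text: str, keyword: str) -> float:
    """Return occurrences of `keyword` as a percentage of all words."""
    # Assumption: a "word" is a run of letters, digits or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for word in words if word == keyword.lower())
    return 100.0 * hits / len(words)


sample = "Cheap widgets for sale. Our widgets are the best widgets in town."
print(keyword_density(sample, "widgets"))  # 3 of 12 words -> 25.0
```

Everything that follows is, in effect, about why a
one-liner like this is not enough.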

Context analysis now draws on a number of auxiliary
linguistic disciplines and technologies, for example:
* semantic text analysis
* textlexical database technology
* distribution analysis of lexical components (such as
nouns, adjectives, verbs)
* evaluation of distance between semantic elements
* pattern recognition based on AI and data mining
technology
* term vector database technology
etc.

All these are now contributing to the increasing
sophistication of the relevance determination process.
If you feel this is beginning to sound too much like
rocket science for comfort, you may not be very far
from the truth: it seems that the future of search
engine optimization will be determined by what the
industry is fond of calling the "word gurus".
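
To make one item from the list above a little less
abstract: a "term vector" can be pictured as a table of
word counts, and two texts can be compared by the angle
between their vectors. The sketch below is a textbook
cosine similarity, offered purely as an illustration.
How (or whether) any particular search engine uses
term vectors is not public, and the function names
term_vector and cosine_similarity are made up for this
example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Map each word in `text` to the number of times it occurs."""
    # Assumption: a "word" is a run of letters, digits or apostrophes.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Return the cosine of the angle between two term vectors (0.0 to 1.0)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


page = term_vector("Blue widgets and red widgets for every budget")
query = term_vector("blue widgets")
print(round(cosine_similarity(page, query), 2))  # 0.67
```

Note that the score depends on every word on the page,
not just on how often the keyword itself appears.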

A sound knowledge of fundamental linguistic methodology
plus more than a mere smattering of statistics will
most probably be paramount to achieving successful
search engine rankings in the foreseeable future.
Merely repeating the well worn mantra "content is
king!", as some of the less qualified SEO
professionals and very many amateurs are currently
doing, may admittedly have a welcome sedative effect
by creating a feeling of fuzzy warmth and comfort. But
for all practical purposes it is tantamount to
whistling in the dark and fails miserably to do
justice to the overall complexity of the process
involved.

It should be noted that we are talking about the
present AND the future here: many of the classical
techniques of
search engine optimization are still working more or
less successfully, but there is little doubt that they
are rapidly losing their cutting edge and will
probably be as obsolete in a few months' time as
spamdexing or invisible text - both optimization
techniques well worth their while throughout the 90s -
have become today.

So where does keyword density come into this equation?
And how is it determined anyway?

There's the rub: the term "keyword density" is by no
means as objective and clear-cut as many people (some
SEO experts included) would have it! The reason for
this is the inherent structure of Hypertext Markup
Language (HTML) code: as text content elements are
embedded in clear text command tags governing display
and layout, it is not easy to determine what should or
should not be factored into any keyword density
calculation.

The matter is complicated further by the fact that the
meta tags inside an HTML page's header may contain
keywords and description content: should these be
added to the total word count or not? Seeing that some
search engines ignore meta tags altogether (e.g.
Lycos, Excite and Fast/Alltheweb), whereas others
still consider them (at least partially), it gets
even more confusing. What may qualify as a keyword
density of 2% under one frame of reference (e.g.
including meta tags, graphics ALT attributes, comment
tags, etc.) may easily be reduced to 1% or less under
another.
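
The effect of the chosen frame of reference is easy to
reproduce. The following sketch uses Python's standard
html.parser module to count words in the same page
twice: once from the visible text only, and once with
meta keywords/description and image ALT text added in.
The class name WordCollector, the function density,
the two include_* switches and the sample page are all
hypothetical, chosen for the illustration; no search
engine is claimed to work this way.

```python
import re
from html.parser import HTMLParser


class WordCollector(HTMLParser):
    """Collect words from a page under a chosen frame of reference."""

    def __init__(self, include_meta=False, include_alt=False):
        super().__init__()
        self.include_meta = include_meta
        self.include_alt = include_alt
        self.words = []

    def _add(self, text):
        # Assumption: a "word" is a run of letters, digits or apostrophes.
        self.words.extend(re.findall(r"[a-z0-9']+", text.lower()))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and self.include_meta:
            if (attrs.get("name") or "").lower() in ("keywords", "description"):
                self._add(attrs.get("content") or "")
        if tag == "img" and self.include_alt:
            self._add(attrs.get("alt") or "")

    def handle_data(self, data):
        # Text between tags is always counted; HTML comments are not.
        self._add(data)


def density(html, keyword, **frame):
    """Keyword density in percent under the given frame of reference."""
    collector = WordCollector(**frame)
    collector.feed(html)
    collector.close()
    words = collector.words
    return 100.0 * words.count(keyword.lower()) / len(words) if words else 0.0


page = """<html><head>
<meta name="keywords" content="widgets, blue widgets, widgets shop">
</head><body>
<img src="w.png" alt="widgets">
<p>We sell widgets and gadgets of every size and color you might want.</p>
</body></html>"""

print(round(density(page, "widgets"), 1))  # 7.7
print(round(density(page, "widgets", include_meta=True, include_alt=True), 1))  # 26.3
```

Same page, same keyword, two quite different figures.
Here the density rises when the extra elements are
counted; with keyword-free meta content it would fall
instead.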

Further questions arise: will meta tags following the
Dublin Core standard ("DC tags") be counted in or not?
And what about HTTP-EQUIV tags? Would you really bet
the ranch that TITLE attributes in tables, forms or DIV
elements will be ignored? Etc., etc.

Another fundamental factor generating massive
fuzziness left, right and center is the issue of
semantic delimiters: what's a "word" and what isn't?
Determining a lexical unit (aka a "word") by
punctuation is a common though pretty low tech method
which may lead to some rather unexpected results.

Say you are featuring an article by an author named
"John Doe" who happens to sport a master's degree in
arts, commonly abbreviated as "M.A.". While most
algorithms will correctly count "John" and "Doe" as
separate words, the "M.A." string is quite another
story. Some algorithms will actually count this as
two words ("M" and "A") because the period (dot) is
considered a delimiter - whereas others (surprise!)
will not. But how would you know which search engines
are handling it in which way? Answer: you don't, and
that's exactly where the problems start.
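
The "M.A." case can be demonstrated with two tokenizing
rules applied to the same byline. Both rules are
assumptions invented for this sketch (as are the
function names); they merely stand in for the two
kinds of algorithm described above.

```python
import re


def split_on_punctuation(text):
    """Rule A: every non-alphanumeric character is a delimiter."""
    return re.findall(r"[A-Za-z0-9]+", text)


def keep_abbreviations(text):
    """Rule B: dotted single-letter abbreviations count as one word."""
    return re.findall(r"(?:[A-Za-z]\.){2,}|[A-Za-z0-9]+", text)


byline = "An article by John Doe, M.A."

print(split_on_punctuation(byline))
# ['An', 'article', 'by', 'John', 'Doe', 'M', 'A']  -> 7 words
print(keep_abbreviations(byline))
# ['An', 'article', 'by', 'John', 'Doe', 'M.A.']    -> 6 words
```

One word more or less in the total is enough to shift
every density figure computed for the page.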

The only feasible way to master this predicament
is trial and error. The typical beginner's inquiry
"What's the best keyword density for AltaVista?",
understandable and basically rational as it may be, is
best answered with the fairly frustrating but
ultimately precise: "It all depends - your mileage may
vary." It is only by experimenting with keyword
densities under standardized, comparable conditions
yourself that you will be able to come to significant
and viable conclusions.

To get going, here are some links to pertinent
programs that will help you determine (and, in one
case, even generate) keyword densities.

KeyWord Density Analyzer (KDA)
------------------------------
An all time classic of client based keyword density
software is Roberto Grassi's powerful KeyWord Density
Analyzer (KDA). It is immensely configurable and
offers a fully featured free evaluation version for
download. Find it here:
(http://www.grsoftware.net/grkda.html)
(Expect to pay approx. $99 for the registered version.)

Concordance
-----------
Concordance is a powerful client based text analysis
tool for making word lists and concordances from
electronic texts.
A trial version can be downloaded here:
(http://www.rjcw.freeserve.co.uk/features.htm)
(Expect to pay approx. $89 for the registered version.)

fantomas keyMixer(TM)
---------------------
Our own fantomas keyMixer(TM) is the world's first
automatic keyword density generator, enabling you to
create web pages with ultra precise densities to the
first decimal digit. Read more about this server based
Perl/CGI application here:
(http://fantomaster.com/fakmixer0.html)
(Expect to pay approx. $99 for the registered version.)

About the Author
Ralph Tegtmeier is the co-founder and principal of
fantomaster.com Ltd. (UK) and fantomaster.com GmbH
(Belgium) (http://fantomaster.com/), a company
specializing in webmaster software development,
industrial-strength cloaking and search engine
positioning services.
You can contact him at
mailto:fneditor@fantomaster.com