Optimising Dynamic Content
Posted Wednesday, May 21, 2003
Having created, optimised and edited a fair few static websites, I must say that I can truly appreciate, as do many web editors, content managers and ecommerce people, the gift of dynamic, database-driven websites. Just modify a database record here and there and, as if by magic, the whole website appears with the new modifications, links and all.
The problem, up until now, is that these wonderful creations, whilst they're a content management godsend, have created something of an optimisation nightmare, because the way they create pages acts as a roadblock to search engines.
Most of these platforms are built on a few repeat offenders: technologies like ASP, ColdFusion and Perl. Whilst these work fine from a user's perspective, a search engine has a far more difficult time.
What's creating the problem?
The difficulty the search engine experiences comes from the fact that dynamically generated web pages don't actually exist until someone supplies the variables, queries or actions to generate them. This could be a mouseover, a query string entered into a search field, or a dynamically generated link.
A search engine spider is like a passive viewer that just wants to look at the page. It's not capable of mousing over anything to produce a dynamic drop-down menu of links to follow, or of entering a query string into your product search box. Thus all your pages of content stay locked up tight in the database, with no way for the spider to access and read them.
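To make this concrete, here's a minimal sketch of how such a page comes into being (Python is used purely for illustration; a site of this era would more likely be ASP, ColdFusion or Perl, and the product data here is hypothetical). The point is that the page simply doesn't exist until a request arrives carrying the right query parameter:

```python
# Stands in for the real product database.
PRODUCTS = {"1234": "Widget Deluxe"}

def handle_request(query_string):
    """Assemble a product page on the fly from a ?id=... query string."""
    params = dict(p.split("=", 1) for p in query_string.split("&") if "=" in p)
    product = PRODUCTS.get(params.get("id", ""))
    if product is None:
        return "<html><body>No such product</body></html>"
    return f"<html><body><h1>{product}</h1></body></html>"

# A browser triggers this by requesting /products.htm?id=1234.
# A spider that never submits the query never causes the page
# to be generated at all.
print(handle_request("id=1234"))
```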
Symbols in your URL
Sometimes the problem is not just that a page is dynamically delivered; the page name generated in the URL may also contain characters and symbols which confuse the search engine and flag what might be a spider trap.
Such characters include '?', which is very common on ASP sites, as well as '&' and '%' signs. Search engines are also known to reject pages that reference a cgi-bin.
The resulting URL is called a "query string" URL, and it works just fine for web viewing purposes. The problem is that many spiders cannot read beyond the question mark (?) in such a URL.
So whereas your URL might read http://www.domain.com/products.htm?id=1234&type=services, a search engine would only read up to http://www.domain.com/products.htm. This may produce a dead link, or take the spider to a page where only part of the content is accessible; it may choose not to index the link at all.
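You can see the split for yourself with Python's standard urllib.parse, using the hypothetical URL from the example above. Everything a '?'-shy spider will request is the path; the part that actually selects the content sits in the query:

```python
from urllib.parse import urlsplit

url = "http://www.domain.com/products.htm?id=1234&type=services"
parts = urlsplit(url)

print(parts.path)   # /products.htm         <- all a cautious spider requests
print(parts.query)  # id=1234&type=services <- the part that selects content
```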
Why refuse to index the link?
These symbols, as mentioned above, clearly indicate to the spider that it is about to crawl a database. Sometimes a spider may fall into a scenario whereby the CGI script or database feeds it an infinite number of URLs, forcing it to continue crawling until it brings both the host and the robot down.
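A crawler therefore has to defend itself. Here's a minimal sketch of the kind of guard logic involved (the names and the cap are hypothetical): without something like the per-host limit, a database that mints a fresh session ID into every link would feed the crawler new URLs forever.

```python
from urllib.parse import urlsplit

MAX_URLS_PER_HOST = 10_000

seen = set()         # URLs already fetched
per_host_count = {}  # how many URLs fetched from each host

def should_crawl(url):
    """Decide whether a discovered URL is safe and worthwhile to fetch."""
    host = urlsplit(url).hostname
    if url in seen:
        return False  # exact duplicate, skip
    if per_host_count.get(host, 0) >= MAX_URLS_PER_HOST:
        return False  # host keeps producing URLs: likely a spider trap
    seen.add(url)
    per_host_count[host] = per_host_count.get(host, 0) + 1
    return True
```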
Search engines are getting better at it
Some search engines have clearly realised that if they are to stay ahead of the game in indexing the web, they must stay ahead of developments in content production and management technologies, which will continue to make up an ever-increasing part of the web's content.
Google has made particular strides in this field, adding many new formats to what it is able to index, perhaps more so than others, and has recently started spidering some dynamic URLs with such characters in them. For now, search engines venturing into this territory are covering themselves by indexing only the specific page they arrive at, without following its links, presumably to continue avoiding spider traps.
Optimising dynamic sites for search engines
Despite the obstacles dynamic delivery creates, there are search engine optimisation remedies for such sites, although care should be taken before conducting wholesale optimisation: if the site is still not readable to search engines, it will all have been in vain. Greenlight usually tackles these obstacles at layer 1 of its multi-layered optimisation programme, with what it refers to as the sub-site layer. This deals with the technology on which a website is hosted and, in essence, sets a foundation for subsequent optimisation.
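By way of illustration, one widely used remedy of this kind is URL rewriting: exposing static-looking paths that the server quietly translates back into the query strings the database expects, so spiders never meet a '?'. The path scheme and function below are a hypothetical sketch, not a description of Greenlight's own programme:

```python
import re

def rewrite(path):
    """Map a spider-friendly static-looking path onto the query string
    the underlying database-driven page actually expects."""
    m = re.match(r"^/products/(\d+)\.htm$", path)
    if m:
        return "/products.htm?id=" + m.group(1)
    return path  # anything else passes through unchanged

# /products/1234.htm is indexable by a spider; internally it still
# runs exactly the same database query as the dynamic URL.
print(rewrite("/products/1234.htm"))  # -> /products.htm?id=1234
```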