IBM's WebSphere Update Will Let Spiders Crawl More Pages
In WebSphere Commerce 5.6.1, IBM will deliver URL mapping that allows search engines to index dynamic pages.
Dynamic pages have long been a stumbling block for spidering technology, also known as Web crawler technology. Spiders jump from link to link, crawling throughout the Web and indexing pages so that engines such as Google or Yahoo can return them in search results.
Dynamic pages are generated by many e-commerce sites. They're created by piecing together data regarding who a visitor is, what consumer segment the visitor falls into, what purchases have been made in the past, and everything about the customer's preferences and behaviors that retailers need in order to personalize the site with recommendations and content.
In assembling that sort of dynamic page, URLs are created that use characters such as ampersands, equal signs and question marks, all characters that stop spiders, which don't understand what they are.
Instead of burrowing down through a site, from category to subcategory to product to item, getting more detail as it goes and coming up with a large number of pages to return to Yahoo or Google, a spider will instead stumble and retreat.
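The stumbling behavior described above can be illustrated with a minimal sketch. It assumes a deliberately naive crawler that refuses any link containing a question mark, ampersand or equal sign; the names `STOP_CHARS`, `has_stop_chars` and `crawlable_links` are hypothetical and are not taken from any search engine or from WebSphere Commerce.

```python
# Characters the article says stop spiders. This set is an assumption
# for illustration, not a published rule of any search engine.
STOP_CHARS = frozenset("?&=")


def has_stop_chars(url: str) -> bool:
    """Return True if the URL contains any character a naive spider rejects."""
    return any(ch in STOP_CHARS for ch in url)


def crawlable_links(links: list[str]) -> list[str]:
    """Keep only the links that a naive spider would be willing to follow."""
    return [url for url in links if not has_stop_chars(url)]


# Hypothetical links found on a retailer's home page.
links = [
    "/shop/about.html",
    "/shop/CategoryDisplay?catalogId=10&categoryId=7",
    "/shop/ProductDisplay?catalogId=10&productId=42",
]
followed = crawlable_links(links)
```

In this example only the static `about.html` link survives, so the category and product pages behind the dynamic URLs would never be reached.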
According to Craig Stevenson, manager of strategy and planning for e-commerce and multichannel retailing solutions, customers have been asking for better natural search results from major search engines.
It's hardly surprising, given recent research such as an E-Tailing Group Inc. study showing that some 46 percent of consumers use search engines to begin their online shopping processes.
"What this is saying is search engines are very important when people go online," Stevenson said. "Not just for researching or finding information, but when people want to buy something. A lot of people don't know to go to a retailer URL and are using search engines instead. Retailers are saying, 'We're losing out on a ton of customers using search engines but not finding their way to our pages, because the pages aren't being indexed.'"
The URL mapping capability in the next version of WebSphere Commerce automatically removes the stop characters generated by dynamic pages, allowing spiders to drill down through retailers' sites and index a much fuller set of pages, Stevenson said.
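IBM has not described how the mapping works internally. As a rough sketch of the general idea only, the following assumes that each query parameter is folded into the path as a name segment followed by a value segment, so the resulting URL contains no question marks, ampersands or equal signs. The function name `to_static_path` and the sample URL are hypothetical.

```python
from urllib.parse import parse_qsl, quote, urlsplit


def to_static_path(dynamic_url: str) -> str:
    """Fold query parameters into the path so no ?, & or = remains.

    Assumed scheme (not IBM's documented one): /page?a=1&b=2 becomes
    /page/a/1/b/2. Only the path portion of the URL is returned.
    """
    parts = urlsplit(dynamic_url)
    segments = [parts.path.rstrip("/")]
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        # Percent-encode each piece so a stray stop character inside a
        # name or value cannot leak back into the path.
        segments.append(quote(name, safe=""))
        segments.append(quote(value, safe=""))
    return "/".join(segments)


mapped = to_static_path("/shop/ProductDisplay?catalogId=10&productId=42")
# mapped == "/shop/ProductDisplay/catalogId/10/productId/42"
```

A real implementation would also need the reverse mapping, so the server can turn the static-looking path back into the parameters the dynamic page expects.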
As it is, many retailers are turning to third-party vendors to get that capability. Such vendors often create proxy sites that are merely lists of static pages hosted by a third party.
By taking out the stop characters, WebSphere Commerce avoids the use of such static proxy sites. As a retailer makes changes to pages, adds different categories, changes content or adds more pages, spiders will be able to index those pages to automatically retrieve an up-to-the-minute search return.
IBM, in working with customers to test the capability, managed to improve a customer's search engine optimization to the point that a Google search return that initially netted two pages was improved to a return of 288 indexed pages, Stevenson said.
It doesn't stop there. Once this type of crawling becomes able to index pages, it will continue to do so over time. As more Googles and Yahoos and other search engines hit sites, more and more pages will be indexed.
WebSphere Commerce 5.6.1 will also gain site map capability, which provides an entry point for the search engine crawler to easily follow links within Web pages. Thus, if a spider hits a site map, it's easier to drill down from category to subcategory to product to item.
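The article does not say what format the WebSphere Commerce site map takes. As an illustration of the concept only, the sketch below assumes the catalog is a nested dictionary and renders a plain HTML list with one link per level, from category down to item. The names `iter_paths` and `render_site_map`, and the sample catalog, are hypothetical.

```python
from html import escape


def iter_paths(tree: dict, prefix: str = ""):
    """Yield one path per node, parents before children."""
    for name, children in tree.items():
        path = f"{prefix}/{name}"
        yield path
        yield from iter_paths(children, path)


def render_site_map(tree: dict) -> str:
    """Render every catalog path as a link in a flat HTML list."""
    items = [
        f'<li><a href="{escape(path, quote=True)}">{escape(path)}</a></li>'
        for path in iter_paths(tree)
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical catalog: category -> subcategory -> product.
catalog = {"apparel": {"shirts": {"oxford": {}}, "shoes": {}}}
paths = list(iter_paths(catalog))
page = render_site_map(catalog)
```

Because every level appears as an ordinary link on a single page, a spider that lands on the site map can reach each category, subcategory and product without having to discover them through dynamic navigation.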
Finally, once the product is out, IBM intends to post best practices information on its Web site information center, regarding how to construct metadata and instructing users on what they should name pages in order to make it easier for spiders to index those pages.
IBM WebSphere Commerce Business Edition V5.6.1 costs $125,000 per processor. The Professional edition is $80,000 per processor, and the Commerce-Express edition is $20,000 per processor license. Prices include one year of maintenance, a staging server and a development license.