IBM's WebSphere Update Will Let Spiders Crawl More Pages

 
 
By Lisa Vaas  |  Posted 2005-07-25

The coming URL-mapping upgrade to WebSphere Commerce will allow spidering technology to reach and index dynamic Web pages, a type often used by e-commerce sites.

IBM is enabling spidering technology to reach and index more of WebSphere Commerce users' Web pages in its coming upgrade, due out next week. In WebSphere Commerce 5.6.1, IBM will deliver URL mapping that allows search engines to index dynamic pages. Dynamic pages have long been a stumbling block for spidering technology, also known as Web crawler technology. Spiders jump from link to link, crawl throughout the Web and index Web pages to return search results to engines such as Google or Yahoo.
Dynamic pages are generated by many e-commerce sites. They're created by piecing together data regarding who a visitor is, what consumer segment the visitor falls into, what purchases have been made in the past, and everything about the customer's preferences and behaviors that retailers need in order to personalize the site with recommendations and content.
In assembling that sort of dynamic page, URLs are created that use characters such as ampersands, equal signs and question marks. All of these characters stop spiders, which don't understand what they are. Instead of burrowing down through a site, from category to subcategory to product to item, getting more detail as it goes and coming up with a large number of pages to return to Yahoo or Google, a spider will instead stumble and retreat.
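To make the problem concrete, here is a minimal Python sketch of the general idea behind URL mapping: detect the stop characters named above and rewrite a query string as plain path segments. It is an illustration only, not IBM's implementation, and the host name, page name and parameter names (shop.example.com, ProductDisplay, storeId, productId) are invented for the example.

```python
from urllib.parse import parse_qsl, quote, urlsplit

# The characters the article says cause spiders to stumble.
STOP_CHARACTERS = "?&="


def has_stop_characters(url: str) -> bool:
    """Return True if the URL contains a question mark, ampersand or equal sign."""
    return any(ch in url for ch in STOP_CHARACTERS)


def to_static_style(url: str) -> str:
    """Rewrite query parameters as path segments: ?a=1&b=2 becomes /a/1/b/2.

    This is one possible mapping scheme, chosen for illustration only.
    Parameters with empty values are dropped by parse_qsl.
    """
    parts = urlsplit(url)
    segments = []
    for name, value in parse_qsl(parts.query):
        # Percent-encode each piece so it is safe to use as a path segment.
        segments.extend([quote(name, safe=""), quote(value, safe="")])
    path = parts.path.rstrip("/")
    if segments:
        path = path + "/" + "/".join(segments)
    return f"{parts.scheme}://{parts.netloc}{path}"


if __name__ == "__main__":
    dynamic = "http://shop.example.com/ProductDisplay?storeId=10001&productId=42"
    print(to_static_style(dynamic))
    # prints http://shop.example.com/ProductDisplay/storeId/10001/productId/42
```

A real commerce server would also need the reverse mapping, so that a request for the static-looking path is translated back into the original parameters before the page is generated.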
According to Craig Stevenson, manager of strategy and planning for e-commerce and multichannel retailing solutions, customers have been asking for better natural search results from major search engines. It's hardly surprising, given recent research such as an E-Tailing Group Inc. study showing that some 46 percent of consumers use search engines to begin their online shopping processes.
"What this is saying is search engines are very important when people go online," Stevenson said. "Not just for researching or finding information, but when people want to buy something. A lot of people don't know to go to a retailer URL and are using search engines instead. Retailers are saying, 'We're losing out on a ton of customers using search engines but not finding their way to our pages, because the pages aren't being indexed.'"
The URL mapping capability in the next version of WebSphere Commerce automatically removes the stop characters generated by dynamic pages, allowing spiders to drill down through retailers' sites and index a much fuller set of pages, Stevenson said.
As it is, many retailers are turning to third-party vendors to get that capability. Such vendors often create proxy sites that are merely lists of static pages hosted by a third party. By taking out the stop characters, WebSphere Commerce avoids the use of such static proxy sites. As a retailer makes changes to pages, adds different categories, changes content or adds more pages, spiders will be able to index those pages to automatically retrieve an up-to-the-minute search return.
In working with customers to test the capability, IBM improved one customer's search engine optimization to the point that a Google search that initially returned two pages came back with 288 indexed pages, Stevenson said. It doesn't stop there. Once this type of crawling becomes able to index pages, it will continue to do so over time.
As more Googles and Yahoos and other search engines hit sites, more and more pages will be indexed.
WebSphere Commerce 5.6.1 will also gain site map capability, which provides an entry point for the search engine crawler to easily follow links within Web pages. Thus, if a spider hits a site map, it's easier to drill down from category to subcategory to product to item.
Finally, once the product is out, IBM intends to post best-practices information on its Web site information center, covering how to construct metadata and what users should name pages to make it easier for spiders to index them.
IBM WebSphere Commerce Business Edition V5.6.1 costs $125,000 per processor. The Professional edition is $80,000 per processor, and the Commerce-Express edition is $20,000 per processor license. Prices include one year of maintenance, a staging server and a development license.

 
 
 
 
Lisa Vaas is News Editor/Operations for eWEEK.com and also serves as editor of the Database topic center. Since 1995, she has also been a Webcast news show anchorperson and a reporter covering the IT industry. She has focused on customer relationship management technology, IT salaries and careers, effects of the H1-B visa on the technology workforce, wireless technology, security, and, most recently, databases and the technologies that touch upon them. Her articles have appeared in eWEEK's print edition, on eWEEK.com, and in the startup IT magazine PC Connection. Prior to becoming a journalist, Vaas experienced an array of eye-opening careers, including driving a cab in Boston, photographing cranky babies in shopping malls, selling cameras, typography and computer training. She stopped a hair short of finishing an M.A. in English at the University of Massachusetts in Boston. She earned a B.S. in Communications from Emerson College. She runs two open-mic reading series in Boston and currently keeps bees in her home in Mashpee, Mass.
 
 
 
 
 
 
 
