NEW YORK—One company has an idea for how search engines can catalog the Web more completely. Another believes it can better divine what a searcher wants. Yet another is trying to sync all that with how the human brain works.
Startups and leading tech companies, including search exemplar Google, are tinkering with new ways of culling and presenting information—ones that could prompt the next revolution in search.
“Because information is exploding, [the Internet] is going to become increasingly difficult to use if we don’t get it right,” said Liesl Capper, chief executive of Australian search startup Mooter.
Current technology troubles users like private investigator Cynthia Hetherington. When she recently suspected an Australian company of possible fraud, Hetherington turned first to Google. But then she went to the Australian Securities and Investments Commission, LexisNexis and Dun & Bradstreet.
Users who consider Google exhaustive are only fooling themselves, experts say. Today’s search engines may be capturing as little as 1 percent of the Web, largely because of how they find and index online resources.
“It’s very frustrating,” said Hetherington, who runs a Haskell, N.J., company. “It’s like going to a library and only pulling one book off the shelf.”
Search analyst Danny Sullivan sees promise in developments to address such flaws, and he believes tomorrow’s search engines are likely to blend the best of them.
But he also cautioned that the Internet is littered with search innovations that failed to draw investors or market share.
Currently, all search engines fail to capture the bulk of the “invisible Web”: resources locked up in databases and inaccessible to the engines’ indexing crawlers. These include regulatory filings at the U.S. Securities and Exchange Commission, detailed reports on charities at GuideStar and complete archives of most newspapers.
Sometimes, accessing an “invisible” database requires payment. Search engines can’t let you know about a document’s availability for purchase if they can’t scan it in the first place.
But even when a database is free, a site may require registration, prohibit search crawlers or use incompatible formats.
In particular, crawlers are stymied by dynamic Web pages, which are customized as users choose various options, such as car color at Cars.com.
To counter that, Chicago-based Dipsie Inc. is developing software that promises to fill out Cars.com’s simple online forms, which are based on multiple choice, though not the complex ones for the government’s patent and trademark databases, which require typing in keywords. A public test version is expected by summer.
Other companies are working to capture sound and video files that have troubled text-based crawlers.
StreamSage Inc. uses speech-recognition technology to transcribe feeds, so a search engine can pull out relevant portions of a long presentation. Company president Seth Murray said Harvard’s medical school and NASA already use the technology, but engineers still must speed it up for broader use.
Yahoo Inc. is going a less technical, more controversial route: Businesses can pay to ensure that their “invisible Web” pages get indexed.
But indexing more of the Web only brings up another challenge—identifying the most relevant among the billions of documents available. So some search developers are focused on personalizing and organizing searches.
Eurekster Inc., a startup launched in January, is marrying search with social networking, in which friends, your friends’ friends and their friends form online circles. Eurekster guesses what you’re seeking based on what others in your circle have found relevant.
“At the moment, when you search on Google, everyone gets the same results for the same keywords,” said Shaun Ryan, vice president of business development for Eurekster in New Zealand. “We try to personalize those results.”
So, a search for “casting” might produce sites on movies if your circle is heavily in entertainment, fly fishing if members enjoy weekend outings.
Major Search Engines Pursue
The major search engines, meanwhile, are trying to localize results, with Yahoo! and America Online holding an advantage over Google because they already have billing or registration information on many users.
And sites like SuperPages.com are tagging data, so customers can search not only by city but by store hours or credit cards accepted. Adding “Saturday” to a Google search might get you a store that’s closed Saturday, or it might indicate Saturday’s hours.
Tags also help Factiva personalize its archives of 9,000 news sources, so an engineering team gets tech-heavy results, while the marketing department gets consumer-friendly documents.
“People don’t want to be spending time searching and looking for things,” said Clare Hart, Factiva’s chief executive. “They want to be spending the time analyzing the information.”
At Microsoft Corp., researchers are exploring ways to return specific facts rather than entire documents. A search for “Marilyn Monroe’s birthday” would return an answer, “June 1, 1926,” instead of sites on her famous “Happy Birthday, Mr. President” performance.
“We still have this library metaphor of ‘Let me give you back a bunch of books that might help you …’ rather than ‘Let me go through the books for you and figure out what you’re looking for,’” said Eric Brill, a senior researcher with Microsoft’s AskMSR project.
Mooter tries to mimic the brain’s organization methods by identifying underlying themes and grouping sites accordingly. A search on travel in Spain, for example, might separate hotels from warnings about terrorism. Mooter also attempts to refine results based on links a user visits.
Building the technology is expensive, and some experts believe the best tools may be developed by and reserved for pay services such as Factiva and ChoicePoint Inc., which aggregates personal, financial and legal data from a variety of government and corporate sources.
But don’t count Google out. It has hundreds of engineers in California, New York, India and soon Switzerland working to make searching better, most recently with localized searching.
Google’s director of technology, Craig Silverstein, said the industry leader must keep innovating because search is bound to morph into something completely different within a decade.
“It will be something that we haven’t even thought of yet,” Silverstein said. He offered few details, but the Google Labs site offers a peek.
One project, Google WebQuotes, returns listings with comments from other sites to help you evaluate a site’s credibility and reputation.