Page Two

By Matthew Hicks  |  Posted 2004-06-10

Microsoft has said it plans to launch its own algorithmic search engine for MSN Search as early as the end of this year. MSN currently uses search results from Yahoo Inc.'s engine.
The Web page spam research is based on two different crawls of the Web conducted almost two years ago, Najork said. Using the results from the crawl of 150 million Web pages conducted over the course of 11 weeks, researchers found that 8.1 percent of the pages were spam and that various statistical techniques could identify about 75 percent of those spam pages.
The statistical techniques look for such anomalies as a high number of host names being associated with the same IP address, a large number of characters or words being used in a host name, and an unusual distribution of links.
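To make the host-name anomalies concrete, here is a minimal Python sketch of how two of them might be checked: many host names sharing one IP address, and unusually long or wordy host names. It is an illustration only, not the researchers' method. The function names and every threshold are assumptions chosen for the example, and the link-distribution check is left out.

```python
from collections import defaultdict
import re

# All thresholds are illustrative assumptions, not values from the research.
MAX_HOSTS_PER_IP = 3
MAX_HOSTNAME_CHARS = 40
MAX_HOSTNAME_WORDS = 6


def hosts_per_ip(host_to_ip):
    """Group host names by the IP address they resolve to."""
    groups = defaultdict(set)
    for host, ip in host_to_ip.items():
        groups[ip].add(host)
    return groups


def hostname_word_count(host):
    """Count the dot- or hyphen-separated tokens in a host name."""
    return len([token for token in re.split(r"[.\-]", host) if token])


def flag_suspicious(host_to_ip):
    """Return a dict mapping each flagged host to its list of reasons."""
    groups = hosts_per_ip(host_to_ip)
    flagged = {}
    for host, ip in host_to_ip.items():
        reasons = []
        if len(groups[ip]) > MAX_HOSTS_PER_IP:
            reasons.append("many hosts on one IP")
        if len(host) > MAX_HOSTNAME_CHARS:
            reasons.append("long host name")
        if hostname_word_count(host) > MAX_HOSTNAME_WORDS:
            reasons.append("many words in host name")
        if reasons:
            flagged[host] = reasons
    return flagged
```

Calling `flag_suspicious` with a mapping of host names to IP addresses returns only the hosts that trip at least one check. A flag here is a signal for further review, not proof that a page is spam.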
The Microsoft Research team plans to present its findings in a paper called "Spam, Damn Spam, and Statistics" during a Paris workshop next week. Next up is analyzing Web page content and words to weed out spamlike patterns, Najork said.
Researchers also are working to use natural language processing to automatically write summaries of news stories and items in a newsbot application. The ability of a computer to generate a summary could be important as more search sites attempt to crawl and sort news sources. MSN, for instance, is planning to launch a new news search service later this year.
To thwart Internet worms, researchers are proposing a line of defense in the network stack that could prevent the spread of worms even before software patches are available or deployed. Called Shield, the project uses network filters to monitor the incoming and outgoing traffic of vulnerable applications in order to stop traffic using an exploit.

Matthew Hicks
As an online reporter for eWEEK.com, Matt Hicks covers the fast-changing developments in Internet technologies. His coverage includes the growing field of Web conferencing software and services. With eight years as a business and technology journalist, Matt has gained insight into the market strategies of IT vendors as well as the needs of enterprise IT managers. He joined Ziff Davis in 1999 as a staff writer for the former Strategies section of eWEEK, where he wrote in-depth features about corporate strategies for e-business and enterprise software. In 2002, he moved to the News department at the magazine as a senior writer specializing in coverage of database software and enterprise networking. Later that year Matt started a yearlong fellowship in Washington, DC, after being awarded an American Political Science Association Congressional Fellowship for Journalists. As a fellow, he spent nine months working on policy issues, including technology policy, for a member of the U.S. House of Representatives. He rejoined Ziff Davis in August 2003 as a reporter dedicated to online coverage for eWEEK.com. Along with Web conferencing, he follows search engines, Web browsers, speech technology and the Internet domain-naming system.
