Microsoft researchers are working on an ambitious new project to hunt down and neutralize large-scale search engine spammers.
The Redmond, Wash., company's Cybersecurity and Systems Management Research Group on July 13 unveiled Strider Search Defender, an experimental project that automates the discovery of search spammers through noncontent analysis.
The project integrates technology from Strider HoneyMonkey and Strider URL Tracer. It promises a new, context-based approach that uses URL-redirection analysis to pinpoint spammers in order to remove junk results from search engine queries.
"[Successful spammers] have to post millions of fake comments on message boards and blogs. ... If we can find a way to pinpoint them before they get indexed by search engines, the problem is solved," said Yi-Min Wang, the researcher heading the project.
The problem is tied to spam blogs, or splogs, which are created to earn money from pay-per-click advertising programs. Content on fake blogs often contains text stolen from legitimate Web sites and includes an unusually high number of links to sites associated with the splog creator.
Wang discovered early on that large-scale spammers create a huge number of "doorway pages" on reputable domains to trick search engine users into clicking on a fake site.
Doorway pages are spammed to millions of forums, blog comments and archived newsgroups, pushing the page up the search engine results for target keywords. A user clicking on a doorway-page link in search listings gets redirected to a page controlled by the spammer.
Microsoft Research is proposing to treat each spam page as a dynamic program rather than a static page and to use a "monkey program" to analyze the traffic resulting from visiting each page with an actual browser.
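To make the idea of treating a page as a program concrete, here is a minimal Python sketch that records the chain of URLs a visit passes through. It is not Microsoft's monkey program: a real tool would drive an actual browser and watch its traffic, whereas this sketch assumes the redirects have already been captured in a lookup table. The function names, the `REDIRECTS` table and all `.example` URLs are hypothetical.

```python
from urllib.parse import urlparse


def trace_redirects(start_url, redirect_table, max_hops=10):
    """Follow redirects from start_url and return every URL visited, in order.

    redirect_table is a stand-in for traffic a real browser would generate:
    it maps a URL to the URL it redirects to.
    """
    chain = [start_url]
    seen = {start_url}
    current = start_url
    for _ in range(max_hops):
        next_url = redirect_table.get(current)
        # Stop when there is no further redirect or the chain loops back.
        if next_url is None or next_url in seen:
            break
        chain.append(next_url)
        seen.add(next_url)
        current = next_url
    return chain


def final_domain(chain):
    """Return the host name of the last URL in a redirect chain."""
    return urlparse(chain[-1]).hostname


# Hypothetical captured redirects: two doorway pages, one destination.
REDIRECTS = {
    "http://blog-host.example/doorway-1": "http://hop.example/r?id=1",
    "http://hop.example/r?id=1": "http://spam-ads.example/landing",
    "http://blog-host.example/doorway-2": "http://spam-ads.example/landing",
}

chain = trace_redirects("http://blog-host.example/doorway-1", REDIRECTS)
print(chain)
print(final_domain(chain))
```

In this toy data, both doorway pages end up on the same destination domain, which is the kind of signal that redirection analysis can surface even when the doorway pages themselves look unrelated.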
Strider Search Defender starts with a seed list of confirmed spam URLs and uses a homegrown tool called Spam Hunter to run link queries on search engines. This automated process pinpoints the forums and guest books on which known spam URLs were posted. Additional links are then scraped from these pages to automatically generate a list of potential spam URLs. To filter out false positives, Microsoft feeds that list to the Strider URL Tracer, a tool that helps trademark owners find typo-squatting domains of their Web sites.
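The workflow described above can be sketched roughly as follows. This is an illustration only, not Spam Hunter or Strider URL Tracer: the link queries, the scraped pages and the tracer's output are replaced by small in-memory dictionaries, and every name and `.example` URL is hypothetical. The filtering rule, which keeps a candidate only if it ends up on a known spam domain, is an assumption; the article does not say how the URL Tracer separates spam from false positives.

```python
def expand_seed_list(seed_urls, link_index, page_links):
    """Return candidate spam URLs found on pages that link to any seed URL.

    link_index stands in for search engine link queries (URL -> pages that
    link to it); page_links stands in for scraping (page -> links on it).
    """
    seeds = set(seed_urls)
    candidates = set()
    for seed in seed_urls:
        for page in link_index.get(seed, []):
            for link in page_links.get(page, []):
                if link not in seeds:
                    candidates.add(link)
    return candidates


def filter_candidates(candidates, final_domain_of, spam_domains):
    """Keep candidates whose traced destination is a known spam domain.

    final_domain_of stands in for the tracer's output (URL -> final domain).
    Assumed rule: anything not ending on a spam domain is a false positive.
    """
    return sorted(u for u in candidates if final_domain_of.get(u) in spam_domains)


# Hypothetical data standing in for real queries, scraping and tracing.
SEEDS = ["http://host.example/spam-a"]
LINK_INDEX = {
    "http://host.example/spam-a": [
        "http://forum.example/thread/1",
        "http://guestbook.example/page/7",
    ],
}
PAGE_LINKS = {
    "http://forum.example/thread/1": [
        "http://host.example/spam-a",
        "http://host.example/spam-b",
        "http://news.example/story",
    ],
    "http://guestbook.example/page/7": [
        "http://host.example/spam-a",
        "http://other.example/spam-c",
    ],
}
FINAL_DOMAIN = {
    "http://host.example/spam-b": "spam-ads.example",
    "http://other.example/spam-c": "spam-ads.example",
    "http://news.example/story": "news.example",
}
SPAM_DOMAINS = {"spam-ads.example"}

candidates = expand_seed_list(SEEDS, LINK_INDEX, PAGE_LINKS)
confirmed = filter_candidates(candidates, FINAL_DOMAIN, SPAM_DOMAINS)
print(confirmed)
```

In the sample data, the legitimate news link posted alongside the spam is scraped as a candidate but dropped by the filter, which is the role the article assigns to the false-positive check.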