Microsoft Research Automates Hunt for Search Engine Spam

Researchers at Microsoft are working on an ambitious new project to hunt down and neutralize large-scale search engine and blog comment spammers.

The Redmond, Wash., software giant's Cybersecurity and Systems Management Research Group has taken the wraps off Strider Search Defender, an experimental project that automates the discovery of search spammers through non-content analysis.

The project integrates technology from two previous Microsoft Research prototypes—Strider HoneyMonkey and Strider URL Tracer—and promises a new approach to removing junk results from search engine queries.

"The Web is so badly spammed, you can find a spam site on just about every search query," said Yi-Min Wang, the researcher heading up the project at Microsoft, in an interview with eWEEK. "We think this approach can pinpoint the big spammers and use their own tactic against them."

According to data from Automattic's Akismet, a tool that helps bloggers thwart comment spammers, a whopping 93 percent of all blog comments are spam. With Strider Search Defender, Wang's team is taking a context-based approach that uses URL-redirection analysis to pinpoint spammers.

"For the spammers to be successful, they have to post millions of fake comments on message boards and blogs. That's the only way to get picked up by search engines. If we can find a way to pinpoint them before they get indexed by search engines, the problem is solved," Wang said.

"They want to be found by search engines, that's why they're spamming. Well, now we're finding you," he added.

The problem is tied to the use of spam blogs, or splogs, to earn money from pay-per-click advertising programs offered by Google, Yahoo and MSN. Content on fake blogs often contains text stolen from legitimate Web sites and includes an unusually high number of links to sites associated with the splog creator. The sole purpose is to boost the search engine rank of the affiliated sites and cash in on ad impressions from unsuspecting surfers.

During the early stages of the Microsoft research, Wang discovered that successful large-scale spammers create a huge number of "doorway pages" on reputable domains to trick search engine users into clicking on a fake site. It is well known that Google's BlogSpot, Yahoo's GeoCities and AOL's Hometown services are all used by spammers to create doorway pages.

The doorway pages are then spammed to millions of forums, blog comments and archived newsgroups, pushing the pages up the search engine results for certain target keywords. A user clicking on a doorway-page link in search listings gets redirected to a target page controlled by the spammer or, in some cases, Wang explained, the browser is instructed to either redirect to or fetch ad listings operated by the spammer.
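
The redirection pattern described above lends itself to a simple illustration. The Python sketch below is not Microsoft's implementation; it only assumes that each doorway URL has already been visited and its final redirect target recorded. The function names, the `min_doorways` threshold and the sample URLs are all hypothetical. It groups doorway pages by the host they end up on, so that a single target fed by many doorways stands out.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_doorways_by_target(redirects):
    """Map each final-target hostname to the doorway URLs that lead to it.

    `redirects` is an iterable of (doorway_url, target_url) pairs that are
    assumed to have been observed beforehand, e.g. by an instrumented browser.
    """
    groups = defaultdict(list)
    for doorway_url, target_url in redirects:
        target_host = urlsplit(target_url).hostname or ""
        groups[target_host].append(doorway_url)
    return dict(groups)


def suspicious_targets(redirects, min_doorways=2):
    """Return target hosts reached from at least `min_doorways` doorway pages.

    The threshold is an illustrative assumption, not a published value.
    """
    groups = group_doorways_by_target(redirects)
    return {host: urls for host, urls in groups.items() if len(urls) >= min_doorways}


# Hypothetical observations: (doorway page, page the browser ended up on).
observed = [
    ("http://blog-a.example.com/cheap-widgets", "http://ads.spammer.example/landing?k=widgets"),
    ("http://blog-b.example.com/best-widgets", "http://ads.spammer.example/landing?k=best"),
    ("http://blog-c.example.com/recipes", "http://blog-c.example.com/recipes"),
]

flagged = suspicious_targets(observed)
for host, doorways in flagged.items():
    print(host, len(doorways))
```

In this made-up data set, two unrelated-looking doorway pages resolve to the same host, which is the kind of convergence that redirection analysis is meant to expose; the third page, which stays on its own host, is not flagged.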
