The Microsoft Research team is now proposing to treat each spam page as a dynamic program rather than a static page, and to use a "monkey program" to analyze the traffic that results from visiting each page with an actual browser. "By identifying those domains that serve target pages for a large number of doorway pages, we can catch major spammers' domains together with all their doorway pages and doorway domains," Wang explained.
To filter out false positives, Microsoft feeds the list of potential spam URLs to the Strider URL Tracer, a tool Microsoft released earlier this year to help trademark owners find typo-squatting domains of their Web sites. Using the URL Tracer, Wang's team can launch an actual browser to visit each URL and record all secondary URLs visited as a result. At the end of that automated scan, the researchers can figure out which target-page domains are associated with a large number of doorway-page URLs.
In one scenario, Wang said, the Spam Hunter collected more than 17,000 BlogSpot URLs and fed them into the URL Tracer. The group was able to identify the top 25 target-page domains behind the Google-hosted splogs. The top six are particularly active, Wang said, identifying them as s-e-arch.com, speedsearcher.net, abcsearcher.com, eash.info, paysefeed.net and veryfastsearch.com, which collectively were responsible for approximately 45 percent of the BlogSpot URLs.
Wang said the Strider Search Defender project has already helped to remove junk results from MSN Search. "The more widely spammed a URL is, the easier it is for the Spam Hunter to find it. Once a spammed forum is identified, it becomes a HoneyForum that can be used to capture new spam URLs in new comment postings," he said.
"Ideally, since there is a delay between spamming and its effect on search engine results, our spam hunter should be able to identify new spam URLs and notify the search engine before the URLs enter top search results."
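The doorway-to-target analysis Wang describes can be pictured as a simple grouping step: for every doorway URL, take the secondary URLs a browser visit produced, and count how many distinct doorway URLs lead to each outside domain. The Python sketch below is only an illustration under stated assumptions. It is not Microsoft's code; the `traces` data, the `.example` hostnames and the function names are hypothetical, and using the URL's host as its "domain" is a simplification.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lowercase host of a URL (a simplified stand-in for 'domain')."""
    return (urlsplit(url).hostname or "").lower()


def rank_target_domains(traces):
    """Group doorway URLs by the outside domains their visits led to.

    `traces` maps each doorway URL to the secondary URLs recorded when a
    browser visited it (hypothetical input format). Returns a list of
    (target_domain, set_of_doorway_urls), largest set first.
    """
    doorways_by_target = defaultdict(set)
    for doorway, secondary_urls in traces.items():
        own_domain = domain_of(doorway)
        for url in secondary_urls:
            target = domain_of(url)
            # Skip requests that stay on the doorway's own host.
            if target and target != own_domain:
                doorways_by_target[target].add(doorway)
    return sorted(doorways_by_target.items(),
                  key=lambda item: (-len(item[1]), item[0]))


# Hypothetical trace data, invented for illustration only.
traces = {
    "http://a.blogspot.example/": ["http://target-one.example/land?x=1"],
    "http://b.blogspot.example/": ["http://target-one.example/land?x=2",
                                   "http://target-two.example/"],
    "http://c.blogspot.example/": ["http://c.blogspot.example/style.css"],
}

ranking = rank_target_domains(traces)
for target, doorways in ranking:
    print(target, len(doorways))
```

A target domain that appears behind many otherwise unrelated doorway URLs is the kind of signal the researchers describe; a real system would also need to exclude legitimate third-party hosts that many ordinary pages load.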
Strider Search Defender starts with a seed list of confirmed spam URLs and uses a homegrown tool called Spam Hunter to run link queries on search engines. This is an automated process that pinpoints the forums and guest books on which the known spam URLs were posted. On these pages, additional spam links are scraped to automatically generate a list of spam URLs.
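One round of that seed-expansion process might look like the following Python sketch. It is a rough illustration, not the Spam Hunter itself: the `link_query` and `scrape_links` callables are hypothetical stand-ins for a search-engine link query and a page scraper, and the in-memory dictionaries replace real network calls so the example runs on its own.

```python
def expand_spam_list(seed_urls, link_query, scrape_links):
    """Run one round of seed expansion.

    link_query(url)    -> pages (forums, guest books) that link to `url`
    scrape_links(page) -> outbound links found on `page`
    Both callables are assumptions supplied by the caller.
    Returns (spammed_pages, candidate_urls); candidates exclude the seeds.
    """
    seeds = set(seed_urls)

    # Step 1: find the pages on which known spam URLs were posted.
    spammed_pages = set()
    for url in seeds:
        spammed_pages.update(link_query(url))

    # Step 2: collect the other links posted on those pages.
    candidates = set()
    for page in spammed_pages:
        candidates.update(scrape_links(page))

    return spammed_pages, candidates - seeds


# Hypothetical data standing in for search results and page contents.
LINK_INDEX = {
    "http://spam-a.example/": ["http://forum.example/thread/1",
                               "http://guestbook.example/"],
}
PAGE_LINKS = {
    "http://forum.example/thread/1": ["http://spam-a.example/",
                                      "http://spam-b.example/"],
    "http://guestbook.example/": ["http://spam-c.example/",
                                  "http://spam-b.example/"],
}

pages, candidates = expand_spam_list(
    ["http://spam-a.example/"],
    lambda url: LINK_INDEX.get(url, []),
    lambda page: PAGE_LINKS.get(page, []),
)
print(sorted(pages))
print(sorted(candidates))
```

The resulting candidates are only potential spam URLs, since a spammed forum can also carry legitimate links, which is why the list is then checked with the URL Tracer.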