The Google API uses the SOAP (Simple Object Access Protocol) and WSDL (Web Services Description Language) standards to offer developers an easy way to run search queries outside of the browser. Because of the way the search engine indexes executables, Websense was able to create code to look for strings associated with malware packers.
Dan Hubbard, senior director of security and technology research at the San Diego-based Web filtering software firm, said the use of the Google API started as an experiment after bloggers noticed that some Google search queries were returning .exe files.
When Google indexes an executable file, Hubbard's research team found, the search engine parses the PE (Portable Executable) file format of the Windows executable. This means that queries can be written to extract items from the internals of the binary.
Hubbard said Websense created code to query "unique identifiers" within the PE file format that would indicate that the file was potentially malicious.
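Websense has not published its queries, so the following is only a minimal sketch of the kind of PE internals such an identifier could key on. It reads the section table of a Windows executable and flags section names left behind by a packer. The function names and the watch list are hypothetical; the list holds only the "UPX0" and "UPX1" names that the UPX packer is known to use.

```python
import struct

# Hypothetical watch list for illustration. UPX is a real packer that
# names its sections "UPX0" and "UPX1"; other packers are not covered.
PACKER_SECTION_NAMES = {"UPX0", "UPX1"}


def pe_section_names(data: bytes) -> list[str]:
    """Return the section names of a PE image, or [] if data is not a PE file."""
    # A PE file starts with a DOS header whose first two bytes are "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return []
    # The 4-byte field at offset 0x3C points to the "PE\0\0" signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return []
    # The 20-byte COFF file header follows the signature.
    coff = pe_offset + 4
    if len(data) < coff + 20:
        return []
    (num_sections,) = struct.unpack_from("<H", data, coff + 2)
    (optional_size,) = struct.unpack_from("<H", data, coff + 16)
    # The section table follows the optional header; each entry is 40 bytes
    # and begins with an 8-byte, NUL-padded name.
    table = coff + 20 + optional_size
    names = []
    for index in range(num_sections):
        entry = table + index * 40
        raw = data[entry:entry + 8]
        if len(raw) < 8:
            break  # truncated file
        names.append(raw.rstrip(b"\x00").decode("ascii", errors="replace"))
    return names


def looks_packed(data: bytes) -> bool:
    """True if any section name matches the hypothetical packer watch list."""
    return any(name in PACKER_SECTION_NAMES for name in pe_section_names(data))
```

A match on a section name is only a hint that a file was packed, not proof that it is malicious, which is consistent with Hubbard's description of the results as "potentially malicious."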
"We're finding literally thousands of sites with malicious code executables. From hacker forums, newsgroups to mailing list archives, they're all full of executables that Google is indexing," Hubbard said in an interview with eWEEK.
About 15 percent of the results came back from legitimate Web sites hijacked by malicious hackers and seeded with executables. "We were able to find a lot of compromised sites distributing malware, most likely without the knowledge of the site owner," Hubbard said.
The queries also turned up pieces of spyware on popular online gaming sites and variants of the virulent Bagle and Mytob worms.
"While we do not believe that the fact that Google is indexing binary file contents is a large threat, this is further evidence of a rise in Web sites being used as a method of storing and distributing malicious code," Websense said in a research note announcing the experiment.
"If you know what to search for within binaries, it could be a really good research tool," Hubbard added.
Hubbard said he plans to publish the full results of the experiment and the actual code used in the API queries on private security mailing lists to help other researchers automate the process of finding malicious Web sites.
"At Websense, we're mining almost 80 million Web sites every 24 hours to look for threats. The big issue is that you can't anymore wait for people to send you malware samples. You have to go out and proactively look for stuff," he said.
Researchers from Microsoft's anti-malware engineering team are also working on an automated way to classify malware families and variants attacking Windows computers. Microsoft is proposing the use of distance-measure and machine learning technologies to come up with automatic classification of viruses, Trojans, spyware, rootkits and other malicious software programs.
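Microsoft's actual measure is not described here, so the sketch below only illustrates the general idea of distance-based classification. It assumes a Jaccard distance over byte 4-grams and a nearest-neighbor rule. The function names, the n-gram size and the family labels are all hypothetical.

```python
def ngrams(data: bytes, n: int = 4) -> set[bytes]:
    """Return the set of overlapping n-byte substrings in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_distance(a: bytes, b: bytes, n: int = 4) -> float:
    """Distance in [0, 1]: 0 for identical n-gram sets, 1 for disjoint sets."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0  # both inputs are too short to yield any n-gram
    return 1.0 - len(grams_a & grams_b) / len(union)


def nearest_family(sample: bytes, labeled: dict[str, bytes]) -> str:
    """Assign the sample to the family whose reference is closest (assumed rule)."""
    return min(labeled, key=lambda family: jaccard_distance(sample, labeled[family]))
```

In this sketch, a new variant that shares most of its bytes with a known family's reference sample would land close to that family and be labeled accordingly; a production system would need a far more robust feature set than raw byte n-grams.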