DARPA Project Uses Big Data to Find, Fix Software Security Flaws

Academic and industry researchers are applying machine learning techniques to find security flaws in terabytes of software code.



A non-profit research lab working with Stanford University is developing a machine learning system that will analyze terabytes of software code to find security flaws and fix them.

Draper Laboratory, formerly part of the Massachusetts Institute of Technology, is building the system in collaboration with a group at Stanford University led by machine learning pioneer Andrew Ng.

Dubbed DeepCode, the system has already been used to detect security vulnerabilities such as the Heartbleed Bug in OpenSSL, Brad Gaynor, associate director for Cyber Systems at Draper, told eWEEK in an email interview.

The lab is currently increasing the volume of data on which DeepCode bases its decisions by a factor of 1,000, he said.

“DeepCode is a fundamentally new approach to cyber security,” Gaynor said. “The system collects and ingests massive amounts of software, makes this software searchable, indexes the known bugs and security vulnerabilities, and identifies—in new or existing code—matches to any previously identified flaws.”
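The workflow Gaynor describes (index known flaws, then look for matches in new or existing code) can be illustrated with a toy example. The Python sketch below is not DeepCode's method, which the article does not detail. It assumes a simple stand-in technique: each snippet is broken into overlapping three-token windows, and two snippets are compared by the Jaccard similarity of those windows. The names `FlawIndex`, `shingles` and `jaccard`, the sample flaw label and the 0.5 threshold are all hypothetical.

```python
import re


def shingles(code, k=3):
    """Return the set of overlapping k-token windows found in a code snippet."""
    # Tokens are identifiers, numbers, or single punctuation characters.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class FlawIndex:
    """Toy index of known-flawed snippets (hypothetical, for illustration only)."""

    def __init__(self):
        self._entries = []  # list of (label, shingle set)

    def add(self, label, snippet):
        """Record a known-flawed snippet under a descriptive label."""
        self._entries.append((label, shingles(snippet)))

    def match(self, code, threshold=0.5):
        """Return (label, score) pairs at or above the threshold, best first."""
        query = shingles(code)
        scored = [(label, jaccard(query, known)) for label, known in self._entries]
        hits = [(label, score) for label, score in scored if score >= threshold]
        return sorted(hits, key=lambda hit: -hit[1])


index = FlawIndex()
index.add("unchecked-memcpy", "memcpy(dst, src, len);")
print(index.match("memcpy(dst, src, len);"))
print(index.match("return 0;"))
```

A production system would need far more robust code representations and a much larger corpus; the sketch only shows the index-then-match shape of the idea.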

Researchers have worked for decades to build systems to warn of potential vulnerabilities in software. Commercial systems typically focus on static analysis, where source code is analyzed for known bad patterns, or dynamic analysis, where software execution is observed for signs of defects.

However, such approaches tend to find only known classes of software vulnerabilities and produce a high proportion of false positives.
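A minimal sketch of pattern-based static analysis shows both the idea and its limits. The Python snippet below assumes a deliberately naive scanner that flags three C library calls widely regarded as unsafe because they do not bound their writes. The `scan` function and the `RISKY_CALLS` table are hypothetical and do not represent any commercial tool.

```python
import re

# Hypothetical table of C calls that write without a bounds check.
RISKY_CALLS = {
    "gets": "reads input with no length limit",
    "strcpy": "copies without checking destination size",
    "sprintf": "formats without checking destination size",
}

# Match one of the names as a whole word, followed by an opening parenthesis.
CALL_RE = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def scan(source):
    """Return (line number, function name, reason) for each risky call found."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for m in CALL_RE.finditer(line):
            name = m.group(1)
            findings.append((lineno, name, RISKY_CALLS[name]))
    return findings


sample = "char buf[8];\nstrcpy(buf, user_input);\n// never call gets() here\n"
for finding in scan(sample):
    print(finding)
```

The scanner reports the `strcpy` call on line 2, but it also reports the comment on line 3, which is a false positive, and it cannot flag any flaw that is missing from its table.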

By using machine learning and pattern analysis techniques, two fundamental areas of artificial intelligence research, researchers hope that DeepCode will learn what good code and bad code look like, according to Draper. Once the system is trained to recognize vulnerabilities, the researchers will use the system to identify flawed code and recommend repairs.

“Ultimately, the goal of DeepCode is to find all instances of all known software bugs,” Gaynor told eWEEK. “We quantitatively measure the accuracy of our analytics, and will share statistically-meaningful accuracy data as we roll out the initial platform features over the coming months.”

Previously, the team working on DeepCode said it used the same technology to identify subtle attacks in progress by analyzing large volumes of network traffic. In an academic paper published in November, industry and academic researchers were able to use a similar machine-learning system to identify otherwise undetected command-and-control traffic within an enterprise environment.

Ng, with whom Draper is working, is an associate professor at Stanford who also co-founded Coursera, the online learning platform. Ng created Coursera’s popular machine learning course.

Ng worked with Google to create the “Google Brain” project, which used machine learning and thousands of clustered computers to attempt to mimic some aspects of the human mind. Ng is currently chief scientist at Chinese search firm Baidu.

The DeepCode project is funded by the U.S. Air Force Research Laboratory and the Defense Advanced Research Projects Agency (DARPA) as part of the agency’s Mining and Understanding Software Enclaves (MUSE) program.

Draper Laboratory has other contracts with the U.S. government, including acting as the attackers, or Red Team, in various simulated cyber-attack exercises to assess federal agencies' system defenses.

Robert Lemos


Robert Lemos is an award-winning freelance journalist who has covered information security, cybercrime and technology's impact on society for almost two decades. A former research engineer, he's...