Automated systems paired with the ability to sift through massive amounts of data have changed numerous industries over the past decade, from delivering search results to identifying sales trends and optimizing business processes.
Now, a combination of Big Data and cognitive computing is being used to ferret out security flaws in software.
Keeping security vulnerabilities out of today’s software is a complex, multi-pronged effort, requiring developer training, expert systems that can spot certain classes of software bugs, and iterative quality control processes. Yet computer scientists are now looking for ways to eliminate many of the headaches and tedium of software development, not only finding flaws in programs but fixing them.
Researchers at the Massachusetts Institute of Technology, for example, created a system called Code Phage that can automatically patch software found to contain certain classes of flaws by searching for similar functionality in other programs and grafting it into the recipient software. The system mimics the biological process of horizontal gene transfer, but instead of moving genetic material between cells, Code Phage moves snippets of code between a donor program and the recipient with the vulnerability.
In a paper presented at the Association for Computing Machinery’s Programming Language Design and Implementation conference in June, the team of researchers reported that their system fixed 10 errors in seven programs, taking from two to 10 minutes for each repair.
“What we are looking for here is an automated way to very quickly patch the bug,” Martin Rinard, a professor of computer science and engineering at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), told eWEEK. “If the needed functionality exists in the world, then we have a good chance to help you out and fix those bugs.”
In March 2015, not-for-profit research and development organization Draper Laboratory announced its effort, DeepCode, which uses big-data analytics to learn the difference between flawed code and good code. The researchers, who teamed with Stanford University on the project, are building on an earlier effort that mimicked processes in the human brain to detect sophisticated threats in network traffic.
Both projects aim to tackle a critical problem in software development: An increasing number of developers—many with little experience with secure programming—are creating the applications on which the world relies, resulting in flawed code.
“Application security is not going away, in fact, it is a huge and growing problem,” Jothy Rosenberg, associate director of the Cyber Systems Group at Draper Laboratory, told eWEEK in an e-mail interview. “Until we change the fundamental model of computing to address security from the ground up, these problems will persist, and automated tools to identify and eliminate vulnerabilities will be required to mitigate the problem.”
Researchers Look to Bots, Big Data to Fix Software Flaws
In the case of Code Phage, the program operates by taking vulnerabilities identified by a second project, known as DIODE, and then seeking out potential donor code that could fix the issue. Code Phage first identifies potential donor programs by using two inputs—one that triggers an error and one that does not—and attempts the same inputs on a library of other programs.
After further checking the potential donor, Code Phage performs digital surgery, grafting the needed code from the donor to the recipient program. Finally, it validates that the patched code works as expected.
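The donor search and the final validation step can be illustrated with a toy sketch in Python. This is not Code Phage itself, which works on real binaries; the function names, the stand-in "programs" and the size limit below are all hypothetical, and the sketch assumes that a suitable donor is one that handles both the benign and the error-triggering input without failing. The graft is written by hand here simply to show what validation would check.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a program: takes an input string, returns a result
# string, and raises an exception when it hits an error.
Program = Callable[[str], str]


def runs_cleanly(program: Program, data: str) -> bool:
    """Return True if the program handles the input without raising."""
    try:
        program(data)
    except Exception:
        return False
    return True


def find_donors(library: Dict[str, Program], benign: str, triggering: str) -> List[str]:
    """Assumed donor criterion: the candidate survives both inputs."""
    return sorted(
        name
        for name, program in library.items()
        if runs_cleanly(program, benign) and runs_cleanly(program, triggering)
    )


def validate_patch(patched: Program, original: Program, benign: str, triggering: str) -> bool:
    """The patch must survive the triggering input and keep benign behavior unchanged."""
    return runs_cleanly(patched, triggering) and patched(benign) == original(benign)


def parse_size(data: str) -> tuple:
    """Parse a toy 'WIDTHxHEIGHT' image header."""
    width, height = (int(part) for part in data.split("x"))
    return width, height


def recipient(data: str) -> str:
    """Toy recipient: fails on very large images (simulated flaw)."""
    width, height = parse_size(data)
    if width * height > 1_000_000:
        raise MemoryError("simulated overflow")
    return f"decoded {width}x{height}"


def careful_viewer(data: str) -> str:
    """Toy donor candidate: contains a size check the recipient lacks."""
    width, height = parse_size(data)
    if width * height > 1_000_000:
        return "rejected oversized image"
    return f"decoded {width}x{height}"


def fragile_viewer(data: str) -> str:
    """Toy donor candidate that also fails on the triggering input."""
    width, height = parse_size(data)
    if width > 1000:
        raise ValueError("simulated crash")
    return f"decoded {width}x{height}"


def patched_recipient(data: str) -> str:
    """Hand-written stand-in for the graft: the donor's check runs first."""
    width, height = parse_size(data)
    if width * height > 1_000_000:
        return "rejected oversized image"
    return recipient(data)


BENIGN = "640x480"
TRIGGERING = "70000x70000"
LIBRARY = {"careful_viewer": careful_viewer, "fragile_viewer": fragile_viewer}

donors = find_donors(LIBRARY, BENIGN, TRIGGERING)
patch_ok = validate_patch(patched_recipient, recipient, BENIGN, TRIGGERING)
```

In this toy run, only the candidate that survives both inputs is kept as a donor, and the patched recipient passes validation because it no longer fails on the triggering input while still producing the original output for the benign one.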
The project currently focuses on fixing code that processes image formats, such as JPEG and PNG files. The short-term goal is to build up a system that has a large population of donors handling the same formats and use them to fix other programs. After that, developers could use the system to seek out the best way to write a piece of code.
“The long-term vision is that you don’t ever have to write a piece of code that someone else has written, because we will find it and integrate them all together,” MIT’s Rinard said.
Numerous companies—including security firms Cigital, Coverity, HP Fortify and Veracode—have technology to analyze code and provide developers with a list of possible software flaws. Because the task of finding vulnerabilities in complex software is so difficult, such systems often create false positives, issuing alerts for potential vulnerabilities that may not be a danger.
To reduce the number of false positives, systems often focus on a single class, or a few classes, of software vulnerabilities. MIT’s DIODE project, for example, identifies memory overflow errors that could lead to security issues. With such complexity, a fully automated system to both find and fix vulnerabilities is a tall order, Daniel Miessler, a practice principal with HP Fortify, told eWEEK.
“It could bear fruit there, but they have to worry about garbage-in, garbage-out,” he said. “If you have good inputs, you will get good outputs. But if you submit the wisdom of the crowds, they could potentially make the output not as pure.”
In addition, both DeepCode and Code Phage search other software for solutions to vulnerabilities and that, in and of itself, could pose a problem. Code Phage, for example, does not need access to source code, so the program could use any available binary. But copying code from other, possibly copyrighted, programs will likely cause legal problems.
Still, without automating software-security analysis and the fixing of code, developers may never get a handle on the burgeoning problem of software flaws, said Draper’s Rosenberg.
“Today’s systems for finding vulnerabilities are mostly manual and there is almost no automation for vulnerability repair in use today,” he said. “Automation can help keep vulnerabilities out of software.”