Researcher Proposes Using Machine Learning to Improve Network Defense
"These kinds of algorithms are all around us, being used for sales or marketing, to serve us ads, to suggest us products that are similar to the ones we or our friends have bought," he said. "It is my belief that we can use machine learning to parameterize the 'somethings' and the Xs I mentioned before with very little effort on the human side. "We can either use what is called 'unsupervised learning' to try to find patterns in data that could generate new rules and relationships or 'supervised learning', where the humans provide examples of good and bad behavior in their networks and the algorithm can suggest other IPs that are likely to be relevant in the same way." To test and develop new machine learning algorithms, Pinto created the MLSec Project. "The way it is set up is that individuals and companies submit logs extracted from their SIEMs and network equipment and they receive daily automated reports from the algorithms that help them to pinpoint anomalous behavior on their networks," he said. "For now, we are able to report on some specific behaviors on network firewalls, IPS and other perimeter facing security tools, but additional insights will be added as the project evolves."Pinto said that the algorithm he is presenting in his talk extrapolates the potential of a specific event on a log file to be significant or from a potential malicious source based on previous occurrences of malicious activity on the "same neighborhood (netblocks and ASN) of the Internet." "But instead of 'let's block off country X because we know they are bad', we can have much more granularity, and the rules evolve as we see the malicious behavior changing origins over time. This specific implementation could be compared to a blacklist that changes and tunes itself automatically as the threat landscape changes. 
"It has been trained initially on OSINT [open source intelligence] sources available on the Internet, most prominently the SANS Technology Institute, which kindly let me use their data in bulk," he said. "Without considering companies that already contribute to the service, the algorithm is being fed on average 1.2 million relevant events summarized from over 30 million log entries per day." The conference will take place at Caesars Palace between July 27 and Aug. 1. Pinto's talk is scheduled for Aug. 1.
The service is completely free, but has only been demonstrated to a few companies and individuals so far.
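The "same neighborhood" idea Pinto describes can be illustrated with a minimal sketch. This is not the MLSec Project's algorithm, which the article does not detail; it is a plain frequency count standing in for a learned model, and it is only meant to show how past malicious activity in a netblock could inform the score of a new event. The function names, the /24 prefix length, and the sample addresses (taken from reserved documentation ranges) are all assumptions for illustration. ASN grouping is left out because it would require external routing data.

```python
import ipaddress
from collections import Counter


def netblock(ip, prefix=24):
    """Return the enclosing IPv4 netblock of `ip` as a CIDR string.

    The /24 default is an illustrative assumption, not a value from the article.
    """
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def build_neighborhood_counts(malicious_ips, prefix=24):
    """Count previously observed malicious events per netblock."""
    return Counter(netblock(ip, prefix) for ip in malicious_ips)


def neighborhood_score(ip, counts, prefix=24):
    """Share of all past malicious events that fell in the netblock of `ip`."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts[netblock(ip, prefix)] / total


# Hypothetical history of malicious sources (documentation-only addresses).
history = ["203.0.113.5", "203.0.113.77", "203.0.113.200", "198.51.100.9"]
counts = build_neighborhood_counts(history)

print(neighborhood_score("203.0.113.42", counts))  # shares a /24 with three past events
print(neighborhood_score("192.0.2.1", counts))     # no past events in its /24
```

Rebuilding the counts from a rolling window of recent events would let the scores shift as malicious activity changes origin, which is the self-tuning blacklist behavior Pinto compares his implementation to.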