IBM's SystemML Moves Forward as Apache Incubator Project

By Darryl K. Taft  |  Posted 2015-11-25

"SystemML not only scales for big data analytics with high performance optimizer technology, but also empowers users to write customized machine learning algorithms using simple domain specific language without learning complicated distributed programming. It is a great extensible complement framework of Spark MLlib. I'm looking forward to seeing this become part of Apache Spark ecosystem," said D.B. Tsai, an Apache Spark and Apache SystemML committer, in a statement.

IBM’s Thomas said data scientists are the primary target audience for SystemML.

“For instance, SAS is a big player in data science and analytics, but you have to bring everything into SAS and you have to know SAS,” he said. “With SystemML, if you know Spark and you know Java or Scala, suddenly you can be proficient in building algorithms at scale using SystemML. This is very different because Spark doesn't limit you to just a few points of data; it can leverage data in Hadoop, or data in a mainframe or data in a warehouse.”

SystemML thus empowers a new set of data scientists in organizations, who don't have to know a specific tool; they just need to know something common, such as Python, Java or Spark. It is targeted at data scientists who want to build more effective algorithms to automate the analytics process.

“It's actually pretty rare to go from open-sourced in June, to available on GitHub in August, to being an incubator project in November,” Thomas said. “That's incredibly fast. I think that's just a demonstration that the community sees the value here of how this will do something different.”

SystemML changes the process of analytics from a manual, build-once mentality to an ongoing way to run machine learning at scale, he said.

“So I believe this will change the face of machine learning,” Thomas said. “I believe that's why it's gotten the momentum that it has, because it gives you scale. You can write once and as new data comes in you can continue to update. It really simplifies the process.”

Meanwhile, there is a good deal of machine learning activity in the open-source software world, said Dave Schubmehl, research director for Cognitive Systems and Content Analytics at IDC. Google recently released its second-generation machine learning library, TensorFlow, to open source. Other open-source machine learning libraries include Caffe from the University of California, Berkeley; Theano from the University of Montreal; and Torch, recently popularized by Facebook. There are also a number of others, including Apache Spark MLlib, which is designed to make machine learning easy and useful inside the popular Apache Spark framework for cluster computing. In addition, Microsoft recently open-sourced its distributed machine learning library, DMTK, under an MIT license, he said.

