IBM's SystemML Moves Forward as Apache Incubator Project

SystemML, a machine learning algorithm translator from IBM Research, has been accepted as an Apache incubator project.

IBM announced that its machine learning technology, SystemML, has been accepted by the Apache Software Foundation as an Apache Incubator open source project.

SystemML, which came out of IBM Research and is now used in IBM's BigInsights data analytics platform, is a machine learning algorithm translator. With SystemML, developers can build a machine learning model one time and keep reusing it to analyze and make predictions on data in a nearly infinite number of industry-specific scenarios, said Rob Thomas, vice president of development for IBM Analytics.

“This is significant because for the first time we're bringing to this vast community a way to automate the process of analytics,” Thomas told eWEEK. “Analytics is very manual in organizations today. And with SystemML it becomes something that can be automated and run at scale.”

In June, IBM announced a major commitment to Apache Spark and that it was open-sourcing its SystemML machine learning technology to the Spark open source ecosystem.

“In the next several years, all businesses will rely almost exclusively on applications that learn,” Thomas said in a statement. “For developers that are not expert in machine learning, the availability of SystemML as open source technology will help scale learning and widespread development of applications that truly sense, learn, reason and interact with people in new ways. IBM developed SystemML to provide the ability to scale data analysis from a small laptop to large clusters without the need to rewrite the entire codebase. This allows for domain or industry-specific machine learning, providing developers what they need from a base code to customize applications for their enterprise’s need.”

Data scientists today face time-consuming and difficult challenges when porting their algorithms to production environments. Apache SystemML addresses these challenges by dynamically compiling and optimizing machine learning algorithms in the environments familiar to the data scientist, and by automatically porting these algorithms to production environments. By contributing SystemML to the open source community, IBM is helping data scientists iterate faster as the needs of the business change, and helping data engineers by removing the need to rewrite algorithms for different environments. As a result, more app developers will be able to build deep intelligence into everything from mobile applications to large mainframe processes.

“The best analogy for this would be to think of it as like a universal translator for languages,” Thomas said. “If you were to go to one country and speak your native language, and no matter what you said it was translated on the fly, and therefore understood by the locals you were talking to, that's essentially what SystemML does for machine learning algorithms.”

The Apache SystemML project has achieved a number of early milestones, including more than 320 patches covering APIs, data ingestion, optimizations, language and runtime operators, additional algorithms, testing, and documentation. There also have been more than 90 contributions to the Apache Spark project from more than 25 engineers at the IBM Spark Technology Center in San Francisco, both to make machine learning accessible to the fastest-growing community of data science professionals and to improve various other components of Apache Spark. Additionally, more than 15 contributors from a number of organizations have worked to enhance the capabilities of the core SystemML engine.