Splice Machine, which makes and markets a Hadoop- and Spark-powered database, announced it is moving to open source to increase both adoption of its software and the quality of the product.
Using in-memory technology from Spark and scale-out capabilities from Hadoop, the Splice Machine relational database management system (RDBMS) powers new applications and offloads existing Oracle and Teradata databases with 10 to 20 times the performance at one-fourth the cost, said Monte Zweben, co-founder and CEO of Splice Machine. The Splice Machine RDBMS runs analytical and operational workloads simultaneously, enabling companies to unlock insights in their big data to make better decisions.
“The reason to open source is clearly a function of today’s marketplace in terms of adoption,” Zweben told eWEEK. “If you’re a proprietary software company, you’re going to be adding a lot more friction to the adoption of your software than any open-source project. So to maximize adoption of our platform, we’re going to go open source.”
However, Zweben noted that the question he asked himself was not whether to go open source, but when. Why now?
“The reason is we’re at that critical tipping point where our product is of sufficient stability, performance and quality, where it’s okay to broaden the group of people contributing to it and building it,” he said.
Zweben noted that moving to an open-source model also is good for customers because open-sourcing provides an insurance policy for large-scale enterprises. That insurance policy means there isn’t a single vendor that they might get locked into, and there isn’t a single company that might get purchased or change direction. In addition, open-sourcing creates a vibrant community of people with skill sets in the platform that customers can turn to for help and even custom development.
“It’s a way to disperse the risk they took in the former model,” he said. Moreover, opening up a software product means there will be people looking at it, improving it and fixing bugs. “It just becomes a better, higher-quality, more secure, more performant platform,” Zweben added. “It’s safety in numbers.”
The move to open source will greatly ease the process of adopting the Splice Machine database. Up until now, whenever somebody showed interest and downloaded the stand-alone edition, they’d have to talk to a Splice Machine salesperson to get a cluster version of the system to test it at scale.
“This puts them more in the mainstream for emerging database platforms, as the expectation for developers is that the code is open, and for IT, that there is less chance of vendor lock-in,” Tony Baer, principal analyst at Ovum, told eWEEK. “That said, the fact that a product is open source doesn’t mean that developers will monkey with the source code—most won’t. But the expectation is that the platform is out in the open for those to inspect if they want to.”
Splice Machine will offer a free, open-source community edition of its database as well as an enterprise edition. Certain features, primarily operational ones, will be available only in the enterprise edition. However, the community edition will not be limited in terms of performance or functionality.
If you want to test the database at scale under heavy loads, you can do so with the community edition; you can even go live on it. However, a number of operational features, such as backup, security and authentication, will be held back for the enterprise edition. These are mostly features that operations staff in the organization, not developers, care about. They relate chiefly to governance and reliability, and Splice Machine will charge a per-node price for them.
Meanwhile, the company is actively seeking contributors and thought leaders in database architecture and distributed systems to help guide and support the transition, develop best practices and help shape next-generation features that best suit the open-source community, Zweben said.
“The evolution of Splice Machine from being the first transactional RDBMS on Hadoop, to incorporating Apache Spark as an analytical engine, has been amazing to watch as a member of their Advisory Board,” said Mike Franklin, former chair of the Computer Science Division at the University of California, Berkeley, and incoming chair of Computer Science at the University of Chicago. “Our AMPLab at Berkeley has initiated many open-source projects, including Apache Spark, Apache Mesos and Alluxio (formerly Tachyon). I applaud Splice Machine in taking the leap and joining the open-source community.”
“We are making the transition to open source to build a larger community around Splice Machine,” Zweben said in a statement. “Our whole team is eagerly anticipating the contributions that going open source can enable. We also look forward to being more active within the open-source communities beyond our participation around HBase and Spark.”
Apache Spark is an open-source cluster computing framework used for processing and analyzing big data. Apache HBase, an open-source distributed database modeled after Google’s BigTable, is one of the key building blocks of Splice Machine and is what enables real-time updates on top of the Hadoop Distributed File System (HDFS).
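For readers unfamiliar with the Spark side of that pairing, the following is a minimal sketch of the distributed processing model described above: records are loaded from HDFS and aggregated in parallel across the cluster. The file path, class name and column names are illustrative assumptions, not drawn from Splice Machine itself.

```java
// A minimal sketch of Spark's cluster analytics model: load data from HDFS,
// then run an aggregation that Spark distributes across the cluster.
// The HDFS path and column names are hypothetical.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("analytics-sketch")
                .getOrCreate();

        // Read JSON records stored on HDFS (illustrative path).
        Dataset<Row> orders = spark.read().json("hdfs:///data/orders.json");

        // Group and count in parallel, then print the result.
        orders.groupBy("region").count().show();

        spark.stop();
    }
}
```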
Splice Machine announced its plans to go open source at the Spark Summit 2016 conference, where its Spark version is in beta. Zweben said the Spark version sends an important message to users: Splice Machine is a dual-engine database that allows users to run both operational and analytical workloads.
“At the Spark Summit, almost everyone there is thinking Spark is just for analytics,” Zweben said. “But now you’ll be able to run an application on Spark, meaning you’ll be able to change data on the fly in real time using standard data management capabilities with transactional integrity and immediately be able to analyze that using Spark.”
Until now, he added, you could not build that kind of application on Spark. “All you could do was analyze the data you had in your repository,” he said. “So this is an important contribution to the Spark community.”
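As a rough illustration of the dual-engine pattern Zweben describes, the sketch below changes a row through standard SQL with transactional integrity and then immediately runs an analytical aggregation over the same table. The JDBC URL, credentials, table and columns are hypothetical and not taken from Splice Machine's documentation.

```java
// A hedged sketch of the dual-engine pattern: an operational write followed
// by an analytical read over the same data. Connection details, table and
// column names are illustrative assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;

public class DualEngineSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string for a Splice Machine-style RDBMS.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:splice://localhost:1527/splicedb", "app", "secret")) {
            conn.setAutoCommit(false);

            // Operational write: change data on the fly with transactional integrity.
            try (PreparedStatement update = conn.prepareStatement(
                    "UPDATE orders SET status = ? WHERE order_id = ?")) {
                update.setString(1, "SHIPPED");
                update.setLong(2, 42L);
                update.executeUpdate();
            }
            conn.commit();

            // Analytical read: an aggregate scan the database can hand to its
            // analytics engine, immediately reflecting the committed write.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT status, COUNT(*) FROM orders GROUP BY status")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + ": " + rs.getLong(2));
                }
            }
        }
    }
}
```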