"Cloudera's not the first and probably not even the best with this kind of technology," said Konstantin Boudnik, director of big data distribution at WANdisco. "There are other technologies such as Spark and Shark that are better. They are out of UC Berkeley and completely open. There is much more interest in the Berkeley projects because anybody can come and participate. Impala is open and on GitHub, but it is dominated by Cloudera engineers. So there is open source and there is open source."
The Spark and Shark technologies caught the attention of Monte Zweben, co-founder and CEO of Splice Machine, maker of the Splice SQL Engine, a SQL-compliant database designed for big data applications.
Zweben said the explosion of data being generated by apps, sites, devices and users has overwhelmed traditional relational database management systems (RDBMSes). In response, many companies have turned to big data or NoSQL solutions that are highly scalable on commodity hardware. However, these databases come at a big cost: they have very limited SQL support, which often forces rewrites of existing apps or BI reports.
Built on the Hadoop stack, the Splice SQL Engine is designed to let application developers build personalized Web, mobile and social applications that scale while drawing on the SQL tools and skill sets already widespread in the marketplace. The Splice SQL Engine also scales to handle business intelligence and analysis, and works out of the box with tools like MicroStrategy and Tableau.
Zweben told eWEEK he considered integrating the Spark and Shark technologies into his solution, which is now in beta.
"The NoSQL community threw out the baby with the bath water," Zweben said. "They got it right with flexible schemas and distributed, auto-sharded architectures, but it was a mistake to discard SQL. The Splice SQL Engine enables companies to get the cost-effective scalability, flexibility and availability their big data, mobile and Web applications require—while capitalizing on the prevalence of the proven SQL tools and experts that are ubiquitous in the industry."
For his part, Ravi Chandran, CTO and co-founder of XtremeData, maker of XtremeData dbX, a massively scalable DBMS for big data warehouses, said dbX provides an economical SQL solution that complements Hadoop. That, he said, accelerates the adoption of massively parallel processing (MPP) SQL solutions and makes it easier for organizations to roll out large-scale data environments and high-performance analytics.
Because it runs on commodity hardware, dbX is less expensive than MPP options like Teradata, Exadata and Netezza, but it is not as cost-effective as Hadoop.
"What is better depends on the application," said Blue Badge's Brust. "Hadoop runs on cheap commodity hardware and uses commodity storage. MPP databases tend to run on expensive appliances and use expensive enterprise storage. Hadoop can keep scaling as needed, whereas many appliance-based MPP solutions can only scale to what's inside the cabinet. But, SQL/relational—and therefore MPP—has much more ecosystem and skill set availability."