Hadoop Drives Down Costs, Drives Up Usability With SQL Convergence
Boicey said SQL integration was a consideration with Saritor, as the intent in designing the system was to not disturb the current IT environment. "Our EMR sits on an MS SQL database and Sqoop, an Apache Hadoop tool, allows us to extract data from the relational database into HDFS." "As crappy as it is, SQL is a language everybody uses and you're not going to change that," Jim Kaskades, CEO of Infochimps, told eWEEK. Infochimps delivers a cloud service solution for big data that eliminates the struggle to master all the new big data technologies. Infochimps claims its solutions make it faster, easier and less complex to build and manage big data applications and quickly deliver actionable insights. Infochimps offers an elastic Hadoop cloud for massively parallel data analysis. "So much of the world understands SQL and BI. … For crying out loud, you can connect Excel to SQL," said Fred Gallagher, general manager of Vectorwise, which provides an analytical SQL for big data from Actian. Actian enables organizations to transform big data into business value with data management solutions to connect, analyze and take automated action across their business operations. The Vectorwise 3.0 analytic database offers companies high-performance integration for Hadoop with the Vectorwise Hadoop Connector. "The trend toward SQL is the ability for non-engineering-level users to access data from Hadoop," said Lloyd Tabb, CEO and co-founder of Looker, a BI software company with a SQL-based solution. "Traditionally, to access Hadoop you had to be a programmer. But SQL is exactly what it is defined as, a Structured Query Language, and Hadoop vendors are moving toward SQL to make Hadoop more accessible. This lowers the barrier to entry."Last year, Ovum analyst Tony Baer began taking notice of the convergence between Hadoop and SQL. In a blog post from October 2012, Baer wrote, "The Hadoop and SQL worlds are converging, and you're going to be able to perform interactive BI analytics on it. "SQL convergence is the next major battleground for Hadoop," he added. "Hadoop has not until now been for the SQL-minded," Baer said. "The initial path was, find someone to do data exploration inside Hadoop, but once you're ready to do repeatable analysis, ETL [extract, transform, load] it into a SQL data warehouse." Cloudera's Impala came out of customer demand, said Amr Awadallah, founder, CTO and vice president of engineering of the Hadoop distributor. With Impala, you can query data, whether stored in HDFS or Apache HBase—including SELECT, JOIN and aggregate functions—in real time, said Justin Erickson, senior product manager at Cloudera. Furthermore, it uses the same metadata, SQL syntax (Hive SQL), ODBC driver and user interface (Hue Beeswax) as Apache Hive, providing a familiar and unified platform for batch-oriented or real-time queries, Erickson said in a blog post co-authored with Marcel Kornacher, the architect of Cloudera Impala. "We see a high majority of our customers using Impala," which is currently in beta, Erickson told eWEEK. Hadoop is valuable for three primary purposes: scaling systems, cost efficiency and flexibility, he said, adding that providing SQL support enhances the flexibility.
Steve Hillion, chief data officer at Alpine Data Labs, which enables end-to-end analytics on combined data from Hadoop and relational databases, said he understands the need for SQL on Hadoop, as "MapReduce is not the friendliest technology." Alpine, however, looks to get the best performance, "so we get down to the lowest level interface, which is MapReduce," he said.