Cloudera, an enterprise software company that provides Apache Hadoop-based software, support and services, announced the Oct. 24 launch of Impala, a real-time query engine for Hadoop to help companies make use of big data.
Cloudera's platform allows batch and real-time operations to be performed on any type of data, structured or unstructured, within one scalable system. Hadoop, an open-source software framework that supports data-intensive distributed applications, allows applications to work with thousands of computationally independent computers and petabytes of data.
Impala is an Apache-licensed query engine for data stored in the Hadoop Distributed File System (HDFS), a scalable, portable file system written in Java for the Hadoop framework, and in HBase, a nonrelational, distributed database currently serving several data-driven Websites, such as Facebook’s Messaging Platform. Cloudera Enterprise Real-Time Query (RTQ) provides the management and support capabilities needed for Impala.
“Mainstream enterprise adoption of Hadoop will inevitably raise expectations,” Ovum principal analyst Tony Baer said in a statement. “Enterprises have grown accustomed to interactive querying and on-the-spot analytics with their existing data warehousing and BI infrastructures and will expect no less of Hadoop. With a real-time query capability powered by its new Impala engine, Cloudera is striving to level the playing field in performance and accessibility with massively parallel SQL platforms.”
According to a recent Cloudera survey of more than 100 customers, more than 70 percent of the enterprises surveyed said they are actively exploring how to extract value from big data and consider it a chief business imperative. The survey identified operational IT efficiency and competitive advantage as the main business drivers for adopting the Hadoop platform. However, 78 percent of customers said they need faster queries on Hadoop.
Impala is designed around a flexible data model, so it can work with more complex data than a data warehouse can handle, and it offers interactive queries expressed in industry-standard SQL. Data and IT analysts can use the platform to query a range of data types and data volumes stored in HDFS or HBase. The travel search site Expedia, for example, manages more than 4 petabytes of data using Cloudera Enterprise, and the addition of Impala allows the company to work on a single platform for big data, rather than on disparate systems for archiving; extract, transform and load (ETL); and analytics.
“We have already seen high levels of interest in, and adoption of, Hadoop by enterprises for low-cost storage and transformational processing of large volumes of data, but have argued that for Hadoop to gain more adoption for analytic workloads, we need to see analytic tools taking full advantage of Hadoop’s scalable parallel processing architecture,” Matt Aslett, 451 Research’s data management and analytics specialist, said in a statement. “Enterprise RTQ and Impala look to be a significant step in enabling enterprises to take advantage of existing SQL skills and tools to realize the potential of real-time analytics against large volumes of structured and unstructured data stored in Hadoop.”