MapR Technologies, provider of a popular distribution for Apache Hadoop, this week announced the availability of Apache Drill 1.2 in its distribution as well as a new Data Exploration Quick Start Solution.
The addition of Drill enables users to more quickly glean business insights from all their data in Hadoop and other sources. In addition, MapR released a comprehensive SQL-based test framework to the open-source community.
Interest in and adoption of Drill have continued to grow since its general availability earlier this year. Thousands of users have downloaded Drill, and numerous organizations have it in production, interactively analyzing up to petabytes of data, MapR officials said. In addition, more than 4,000 analysts, business intelligence (BI) architects and developers have completed Drill training courses provided by the free Hadoop On-Demand Training program from MapR, the company said.
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Drill is the open-source version of Google’s Dremel system, which is available as an infrastructure service called Google BigQuery.
“Capturing and analyzing digital and social media data continues to be highly valuable when engaging with customers,” said Donna Belanger, head of partner tools at marketing services provider Harte Hanks, in a statement. “However, data formats vary greatly and the amount of information grows at an extremely fast pace. Without the need for complex data modeling, Apache Drill simplifies the process and enables us to shorten the time it takes to explore these semi-structured and structured data sources for our clients and help them rapidly identify actionable insights.”
Version 1.2 of Apache Drill, which is now available in the MapR distribution, offers extended SQL analytics functionality, superior performance, deeper Hive integration and improvements in overall enterprise manageability. Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis.
Drill 1.2 helps organizations reuse existing investments in business intelligence and analytics tools, with the addition of SQL-compliant analytical and window functions. New functions include LEAD, LAG, FIRST_VALUE and LAST_VALUE, in addition to the ranking functions and a variety of aggregate window functions delivered in Drill 1.1.
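For readers unfamiliar with the four new window functions named above, the following is a minimal sketch of what they compute. It uses Python's built-in sqlite3 module as a stand-in SQL engine rather than Drill itself, and it assumes an SQLite build of version 3.25 or later. The `sales` table, its columns and the `window_demo` function are hypothetical, invented for illustration. Drill's own syntax and window-frame support may differ.

```python
import sqlite3


def window_demo():
    """Return one row per sale with LAG/LEAD/FIRST_VALUE/LAST_VALUE columns."""
    # In-memory SQLite database used as a stand-in engine (not Drill).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("east", 1, 100),
            ("east", 2, 150),
            ("east", 3, 120),
            ("west", 1, 80),
            ("west", 2, 90),
        ],
    )
    rows = conn.execute(
        """
        SELECT region, month, amount,
               -- value from the previous row in the same region (NULL if none)
               LAG(amount) OVER (PARTITION BY region ORDER BY month),
               -- value from the next row in the same region (NULL if none)
               LEAD(amount) OVER (PARTITION BY region ORDER BY month),
               -- first value in the region's ordered window
               FIRST_VALUE(amount) OVER (PARTITION BY region ORDER BY month),
               -- last value; the explicit frame spans the whole partition
               LAST_VALUE(amount) OVER (
                   PARTITION BY region ORDER BY month
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               )
        FROM sales
        ORDER BY region, month
        """
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in window_demo():
        print(row)
```

Each output row carries the current month's amount alongside the previous, next, first and last amounts for its region, which is the kind of comparison BI tools typically generate when they issue window-function SQL.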
Drill 1.2 also delivers better performance and scale for interactive workloads. New capabilities include a metadata cache mechanism that speeds up queries against thousands of files, and enhanced pushdown features for a variety of data types that enable faster queries on HBase and MapR-DB.
Additionally, Drill 1.2 extends its compatibility and performance with Hive. With the deeper integration, companies can run Hive for extract, transform and load (ETL) jobs and Drill for interactive queries simultaneously in the same cluster, leveraging their existing investments in Hive.
“AnswerRocket empowers users with search-driven data discovery and analytics,” said Alon Goren, CEO of AnswerRocket, in a statement. “When we heard about Drill’s ability to interact with NoSQL file systems, we integrated AnswerRocket with Drill and were truly impressed with the SQL Implementation, speed, and scalability provided by Drill. When combined with AnswerRocket’s ability to translate natural language questions into SQL, Apache Drill makes self-service analytics much more easily accessible to enterprises leveraging big data.”
MapR also announced a new Data Exploration Quick Start Solution, which enables companies to rapidly deploy self-service analytics on big data and discover new business insights faster.
Meanwhile, MapR released a new SQL test framework to the open-source community. The framework has had more than 10,000 tests performed on it over the past several months, MapR said, and is now available so that developers in the community can help maintain the enterprise quality of the Apache Drill project and accelerate community-driven innovation.
“Releasing the test frameworks demonstrates our continued commitment in building a strong community to drive the innovation and quality of the Apache Drill OSS project,” said Neeraja Rentachintala, director of product management at MapR Technologies. “Drill users are getting value from their relational structured data in Hadoop as well as enabling a broader set of users in an organization to leverage new types of semi-structured data sources such as JSON. As the only schema-free SQL engine for big data, Drill brings unprecedented flexibility and performance, rapid time to insights, granular security, scale in all dimensions and integration with existing tools.”