Today: Dremio (Data-as-a-Service Platform)
Company Description:
Dremio, a privately held company based in Santa Clara, Calif., offers a data-as-a-service platform that helps companies get more value from their data, faster. Dremio’s open source platform helps analysts and data scientists work together to discover, curate and collaborate on data for diverse analytical use cases. It builds on Apache Arrow to accelerate queries on a range of data sources, from S3, ADLS, and HDFS to NoSQL and relational databases. With Dremio, data can be analyzed with tools and languages including Looker, Power BI, Python, Qlik, Spark, SQL, and Tableau, among others. Dremio makes data engineers more productive and data consumers more self-sufficient.
Dremio solves the challenge of making data fast and self-service for data consumers, eliminating the creation, management and governance risk associated with unnecessary data copies. Dremio’s DaaS platform includes advanced features for demanding enterprise workloads, including extensive security controls, data lineage, advanced data acceleration features, an elastic deployment model, and connectors for popular data sources.
Also part of the Dremio platform is the Gandiva Initiative for Apache Arrow. This execution kernel provides up to 100x greater efficiency on many types of queries and operations. This improved efficiency translates into lower operational costs, a better user experience, and the ability to support more workloads with existing hardware. Dremio continues to add support for more popular data sources deployed in customer data centers and cloud services. With version 3.0, Dremio now supports Azure Data Lake Store, Elasticsearch 6, AWS S3 GovCloud, and Teradata.
Dremio was founded in 2017 and has raised $45 million in funding. Tomer Shiran is co-founder and CEO.
Markets:
Data analytics, cloud, business intelligence, data virtualization, big data, financial services, internet of things
International Operations: Canada, India
Product and Services:
Dremio is a data-as-a-service platform. It overlaps with some traditional technologies in use today. By integrating disparate offerings in a single, scalable, self-service platform, Dremio creates new types of capabilities that are impossible to build with separate products. Dremio is the only platform that helps to deliver data as a service packaged as an open source, self-service solution.
Dremio’s open source data-as-a-service platform helps analysts and data scientists work together to discover, curate and collaborate for diverse analytical use cases. It builds on Apache Arrow to accelerate queries on a range of data sources, from S3, ADLS, and HDFS to NoSQL and relational databases.
Dremio 3.0 includes advanced features for demanding enterprise workloads, including extensive security controls, data lineage, advanced data acceleration features, a new elastic deployment model, and connectors for popular data sources. Dremio solves the challenge of making data fast and self-service for data consumers, eliminating the creation, management, and governance risk associated with unnecessary data copies.
Key Features:
With these features, Dremio helps companies ensure governed and secure access to data from any source, at the speed of thought, through a self-service experience.
- Best-in-class SQL performance on massive datasets, based on Apache Arrow. Dremio makes extensive use of patented Data Reflections to accelerate queries on any data source at any scale.
- Built-in data catalog. Dremio’s data catalog provides a powerful and intuitive way for data consumers to discover, organize, describe, and self-serve data from virtually any data source in a governed and secure model. Data stewards can describe and tag datasets. Data consumers can utilize the Google-like search interface to find the data they need and then immediately start curating, blending or analyzing it.
- Advanced security controls. Dremio integrates natively with Apache Ranger for centralized access control, building on the system’s powerful row- and column-level access controls that work with any data source and across multiple data sources. In addition, Dremio supports end-to-end TLS encryption. For AWS deployments, Dremio also supports EC2 instance profiles for secure access to S3.
- Multi-tenant workload controls. The new multi-tenant features allow data engineering teams to manage and optimize cluster resources across a variety of workloads and users. Workload management policies can be used to precisely control resource allocation based on user, group membership, time of day, data source, query type, and many other runtime factors.
- Elastic deployments using Kubernetes. Dremio provides an official Docker image and templates for elastic, highly available deployments using the popular Kubernetes orchestration framework. Companies can simplify the management of their deployments, both on premises and on popular cloud services such as Amazon EKS and Azure AKS, by using Dremio’s Helm charts to provision and elastically scale clusters of 1,000 or more nodes.
- Advanced engine for relational push-downs. A declarative engine for relational database sources increases the sophistication of push-downs of SQL expressions, resulting in more efficient processing on popular systems such as Postgres, SQL Server, Oracle, and Teradata.
- New data sources. Dremio continues to add support for more popular data sources deployed in customer data centers and cloud services including Azure Data Lake Store, Elasticsearch 6, AWS S3 GovCloud, and Teradata.
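The workload management item above describes policies that allocate resources based on runtime factors such as user, group membership, time of day, and query type. The following is a minimal sketch of that idea as first-match rule routing in plain Python. It is not Dremio's API or configuration format; the `Query`, `Rule`, and `route` names, the queue names, and the query-type labels are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, FrozenSet, List


@dataclass
class Query:
    """Runtime facts about a submitted query (hypothetical shape)."""
    user: str
    groups: FrozenSet[str]
    query_type: str       # hypothetical labels such as "ad_hoc" or "batch"
    submitted_at: time


@dataclass
class Rule:
    """A named predicate that sends matching queries to a queue."""
    name: str
    matches: Callable[[Query], bool]
    queue: str


def route(query: Query, rules: List[Rule], default_queue: str = "low_priority") -> str:
    """Return the queue of the first matching rule, else the default queue."""
    for rule in rules:
        if rule.matches(query):
            return rule.queue
    return default_queue


# Hypothetical policy: rules are evaluated in order, first match wins.
RULES = [
    Rule("executives first",
         lambda q: "executives" in q.groups,
         "high_priority"),
    Rule("business-hours ad hoc",
         lambda q: q.query_type == "ad_hoc" and time(9) <= q.submitted_at < time(17),
         "interactive"),
]

dashboard_query = Query("ana", frozenset({"analysts"}), "ad_hoc", time(10, 30))
nightly_query = Query("etl", frozenset({"engineering"}), "batch", time(2, 0))

print(route(dashboard_query, RULES))
print(route(nightly_query, RULES))
```

In this sketch the ordering of the rules carries the priority: a query from a member of the hypothetical `executives` group is routed before the time-of-day rule is ever consulted.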
Product Analysis:
There is very little product analysis of the Dremio solution available at this time from reviewers such as Gartner Insights, IT Central Station and G2 Crowd. eWEEK will update this as soon as possible.
Analyst and Forbes columnist Dan Woods wrote a deep report in July 2018 about Dremio: “My view is that Dremio and any other product that combines a data catalog, self-service and high speed queries against heterogeneous data sources can pick up where data lakes floundered. Data-as-a-Service systems will essentially implement a data lake 2.0 vision that actually works,” Woods concluded.
Some other highlights of Woods’ research:
- Before getting started, it’s important to note that Dremio isn’t a new repository for your data. It runs between the data sources and the tools that access your data. You do not move your data into Dremio to benefit from its capabilities.
- Dremio has optimized push-down of SQL-based queries across many different sources, even those that don’t support SQL (e.g., Elasticsearch, MongoDB, S3).
- Dremio maintains a data catalog of all these sources, making it easy for users to search and find datasets, no matter where they reside physically.
- The catalog contains both physical datasets and virtual datasets that were derived from the physical datasets using SQL queries.
- The catalog also contains the data lineage of all virtual datasets, which documents where the data comes from and how it was transformed along the way.
- Dremio supports security by allowing users and groups of users to be defined in LDAP or Active Directory.
- Row- and column-level access and data masking rules can then be defined so that users can only see authorized data based on their LDAP/AD group membership. Dremio provides this even when the source (such as file systems or NoSQL) doesn’t support row- and column-level controls.
- Access to data is tracked at a user level.
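The row- and column-level rules in the list above can be illustrated with a small sketch: a layer that sits between the source and the consumer, filters rows by group membership, and masks sensitive columns for everyone outside an authorized group. This is only an assumption-laden illustration in plain Python, not how Dremio implements or configures these controls; the group names, the `ssn` column, and the `visible_rows` and `mask` helpers are hypothetical.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, str]

# Sample data standing in for a source with no access controls of its own.
ROWS: List[Row] = [
    {"customer": "Acme", "region": "EU", "ssn": "123-45-6789"},
    {"customer": "Globex", "region": "US", "ssn": "987-65-4321"},
]

# Hypothetical row-level policy: each group maps to a predicate on rows.
ROW_FILTERS: Dict[str, Callable[[Row], bool]] = {
    "eu_analysts": lambda row: row["region"] == "EU",
    "us_analysts": lambda row: row["region"] == "US",
}

# Hypothetical column-level policy: groups allowed to see a column unmasked.
UNMASKED_GROUPS: Dict[str, Set[str]] = {"ssn": {"compliance"}}


def mask(value: str) -> str:
    """Hide all but the last four characters of a sensitive value."""
    return "***-**-" + value[-4:]


def visible_rows(rows: List[Row], groups: Set[str]) -> List[Row]:
    """Apply row filters, then column masking, for a user's groups."""
    result = []
    for row in rows:
        # Row level: keep the row only if one of the user's groups permits it.
        if not any(g in ROW_FILTERS and ROW_FILTERS[g](row) for g in groups):
            continue
        shaped = {}
        for column, value in row.items():
            allowed = UNMASKED_GROUPS.get(column)
            # Column level: mask protected columns unless a group is authorized.
            if allowed is not None and not (allowed & groups):
                shaped[column] = mask(value)
            else:
                shaped[column] = value
        result.append(shaped)
    return result


print(visible_rows(ROWS, {"eu_analysts"}))
print(visible_rows(ROWS, {"eu_analysts", "compliance"}))
```

Because the policy is enforced in the intermediate layer, the same rules apply whether the underlying source is a relational database, a file system, or a NoSQL store.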
“Dremio is all about expanding the number of people who can get access to data and transform it into the shape needed for analysis,” Woods wrote. “Where Dremio breaks new ground, and makes Data-as-a-Service a meaningful new idea, is in taking responsibility for delivering high-performance queries on heterogeneous sources, in addition to these other capabilities. In some other products, the whole idea is to use the system to allow the right extract of data to be retrieved and used in some way by a single analytics tool or an application. With Dremio, high performance is available to all tools.”
List of current customers: Diageo, Microsoft, Standard Chartered, TransUnion, UBS, Virgin Orbit
Delivery: Software (cloud, on-premises)
Pricing: Dremio is available in an open source Community edition as well as a commercial Enterprise Edition. Enterprise Edition subscriptions are priced based on the number of nodes to which Dremio is deployed.
Other key players in this market: Arcadia Data, Alation, AtScale, Denodo, Snowflake.
Contact information for potential customers:
Website: www.dremio.com
Phone: 650-383-6805
Email: contact@dremio.com