Over the past decade, the assumption that the data warehouse must sit at the center of an enterprise data system has started to break down. The reasons are numerous, and they include unwanted results such as increasing complexity, loss of speed and agility, and rising costs.
As a result, instead of analytics becoming increasingly self-service-oriented, for the first time the world of analytics was actually going backward, away from the self-service ideal.
Progress hasn’t been completely undone, but compared to a few years ago, effective and easy-to-navigate self-service has become much more difficult to achieve, and analysts are more dependent on IT than ever before.
This eWEEK Data Point article features industry information from Kelly Stirman, Vice-President of Strategy at Dremio, who thinks it’s worth considering why this backward trend has occurred so that enterprises can understand the best way to use self-service analytics going forward. He gives eWEEK readers the following data points.
Data Point Reason 1: The complexity of building, maintaining, and distributing extracts and data marts.
Just like standard extract, transform, load (ETL) processes, data prep products create another copy of the data. It’s not feasible for IT to let every user create a new copy of the data every time they need to analyze it, so data prep tools end up far removed from being self-service, and managing the resulting complexity and redundancy becomes overwhelming.
Data Point Reason 2: The performance of desktop tools on server-size datasets.
Another reason for the backward trend in self-service analytics is that many self-service tools currently in use were built to run on desktops or laptops. When analysts need to run queries on large datasets, they have to let the queries run on their desktops for hours or submit them as batch jobs. Either option bogs down exploration and analysis, which is an inherently iterative process.
Data Point Reason 3: Data lineage and governance concerns.
Data lineage is the ability to track data from its creation throughout its history, incorporating all the transformations it has undergone over its lifetime. One of the main reasons it has become a requirement for businesses today is the increase in regulations that protect user privacy, especially the EU’s General Data Protection Regulation (GDPR). Protecting data has its merits, but it also makes data more difficult for analysts to access. If nothing changes, access will become even more difficult.
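As a rough illustration of what tracking lineage means in practice, the following minimal Python sketch keeps a log of every transformation next to the data itself. It is not taken from any product mentioned here; the `TrackedDataset` class, the `crm_export` source name and the sample records are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedDataset:
    """A list of records plus a log of every transformation applied to it."""
    rows: list
    lineage: list = field(default_factory=list)

    def apply(self, step_name, func):
        """Run func over the rows and return a new dataset with the step logged."""
        return TrackedDataset(rows=func(self.rows),
                              lineage=self.lineage + [step_name])


# Hypothetical starting point: two customer records from an assumed CRM export.
source = TrackedDataset(
    rows=[
        {"email": "a@example.com", "spend": 120},
        {"email": "b@example.com", "spend": 30},
    ],
    lineage=["loaded from crm_export"],
)

# Each step returns a new dataset, so the original and its history stay intact.
masked = source.apply(
    "masked email",
    lambda rows: [{**r, "email": "***"} for r in rows],
)
big_spenders = masked.apply(
    "filtered spend >= 100",
    lambda rows: [r for r in rows if r["spend"] >= 100],
)

print(big_spenders.rows)
print(big_spenders.lineage)
```

In this sketch the `lineage` list answers the question a regulator or auditor would ask: where the data came from and what was done to it along the way.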
Data Point Reason 4: The rise of the data lake and use of big data.
With the rise of big data, data lakes have become widespread, but they have caused problems for analytics because of the scale of data they house, their lack of interactive performance, and their missing or variable schemas. Particularly onerous is the fact that data lakes are generally accessible only by IT. One alternative is to run a SQL engine directly on the data lake, but that approach is too slow. To work around the slowness, IT moves subsets of the data into SQL engines and cubes, which rules out iterative exploration and analysis, a fundamental requirement for self-service analytics.
Data Point Reason 5: The rise of JSON as an important repository for business information.
JavaScript Object Notation (JSON) is a popular open-standard file format that uses human-readable text to transmit data objects consisting of attribute–value pairs and array data types. In the context of this article, JSON has been a direct cause of the step back in self-service progress. Tools that were designed for SQL simply don’t handle JSON well. It is very challenging to blend JSON data with other data. Organizations have dealt with JSON by converting it to relational format or by having developers create custom dashboards using web frameworks instead of BI tools. Both strategies reduce self-service by relying on IT.
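To show why converting JSON to relational format takes developer effort, here is a minimal Python sketch that uses only the standard library. The order document, its field names and the `flatten_order` helper are hypothetical and serve only as an illustration.

```python
import json

# Hypothetical order document; all field names are illustrative only.
raw = """
{
  "order_id": 1001,
  "customer": {"name": "Acme", "region": "EU"},
  "items": [
    {"sku": "A-1", "qty": 2},
    {"sku": "B-7", "qty": 1}
  ]
}
"""


def flatten_order(doc):
    """Turn one nested order document into flat rows, one per line item."""
    rows = []
    for item in doc["items"]:
        rows.append({
            "order_id": doc["order_id"],
            "customer_name": doc["customer"]["name"],
            "customer_region": doc["customer"]["region"],
            "sku": item["sku"],
            "qty": item["qty"],
        })
    return rows


rows = flatten_order(json.loads(raw))
for row in rows:
    print(row)
```

One nested document becomes two flat rows, with the customer fields repeated on each. Someone has to write and maintain a mapping like this for every document shape before a SQL-oriented tool can query the data.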
Data Point Reason 6: The spike in microservices.
Microservices allow for a highly customized user experience. But on the other end is the analyst, who faces a lot of hurdles. It’s a Humpty Dumpty-like situation: Deconstructing data into smaller pieces means that eventually someone has to put it back together before the business user can put it to work, so integration becomes a crucial challenge. The spread of microservices has radically increased the number of repositories that analysts have to access and has made self-service analytics more difficult, because data consumers have to wait on IT to put the data together before it can be analyzed.
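The put-it-back-together step can be sketched in a few lines of Python. The two services, their record layouts and the `revenue_by_segment` function below are hypothetical; a real integration would also have to deal with access, scale and mismatched formats.

```python
# Hypothetical outputs of two separate microservices, each with its own store.
orders_service = [
    {"customer_id": 1, "total": 250},
    {"customer_id": 2, "total": 90},
    {"customer_id": 1, "total": 40},
]
profile_service = [
    {"customer_id": 1, "segment": "enterprise"},
    {"customer_id": 2, "segment": "smb"},
]


def revenue_by_segment(orders, profiles):
    """Join orders to customer profiles on customer_id and sum totals per segment."""
    segment_of = {p["customer_id"]: p["segment"] for p in profiles}
    totals = {}
    for order in orders:
        # Orders with no matching profile are grouped under "unknown".
        segment = segment_of.get(order["customer_id"], "unknown")
        totals[segment] = totals.get(segment, 0) + order["total"]
    return totals


result = revenue_by_segment(orders_service, profile_service)
print(result)
```

Even a simple question such as revenue per customer segment requires data from both services, which is why each additional repository adds to the integration work that stands between the analyst and an answer.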
Data Point No. 7: Summary: What’s next?
So given all of this complexity, how do we turn the tide toward self-service analytics? Stirman asserts:
- Companies need a platform that allows them to integrate a variety of data sources, with data in multiple forms, and bring all of that information together in a way that accelerates analytics.
- This type of system also needs to avoid creating extracts of data so that copies don’t proliferate.
- This process has to be operable by analysts themselves, rather than intermediated by IT, empowering analysts to explore and interact with all desired data and iterate on their findings.