A data mesh is a decentralized approach to data management, in which the data itself remains within the business domain that has collected it. Yet data mesh technology enables this data to be available to qualified users in disparate locations, without needing to move or otherwise download the data from its current location.
Data mesh is clearly not a silo. Indeed, it is central to digital transformation’s effort to distribute data widely. On top of this privately owned, coherent business data sits a distributed query engine, which accesses and unifies the data for interoperability rather than storing it centrally. SQL clients from across the organization can query the data through that engine.
In other words, a data mesh democratizes the data. It creates “datasets as a product,” a standardized offering, available for anyone with permission. It’s secure, in compliance with local regulations, and suddenly considerably more scalable.
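To make “datasets as a product” a little more concrete, here is a minimal Python sketch. It assumes a product is nothing more than rows, an advertised schema, an owning domain, and a list of permitted roles. The `DataProduct` class, its fields, and the `orders` example are hypothetical names invented for illustration, not part of any data mesh product.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical dataset published by a domain, with a schema and access list."""
    name: str
    owner_domain: str                 # the business domain that owns the data
    schema: dict                      # column name -> expected Python type
    allowed_roles: set = field(default_factory=set)
    _rows: list = field(default_factory=list, repr=False)

    def publish(self, row: dict) -> None:
        """Accept a row only if it matches the advertised schema."""
        if set(row) != set(self.schema):
            raise ValueError(f"columns {sorted(row)} do not match schema")
        for column, expected in self.schema.items():
            if not isinstance(row[column], expected):
                raise TypeError(f"{column!r} should be {expected.__name__}")
        self._rows.append(dict(row))

    def read(self, role: str) -> list:
        """Return copies of the rows to a permitted role; refuse anyone else."""
        if role not in self.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {self.name!r}")
        return [dict(r) for r in self._rows]


# The sales domain owns the data and decides who may read it.
orders = DataProduct(
    name="orders",
    owner_domain="sales",
    schema={"order_id": int, "amount": float},
    allowed_roles={"marketing_analyst"},
)
orders.publish({"order_id": 1, "amount": 19.99})
print(orders.read("marketing_analyst"))
```

The point of the sketch is the division of labor: the owning domain sets the schema and the permissions, and everyone else consumes the same standardized offering.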
In short, with data mesh architecture, the business domain user rises to the top of the priority list. This empowers them to own the decisions about what data can and cannot be shared, freeing them from the costly infrastructure constraints that block the organization from accessing the accumulated wisdom of all its data.
Data Mesh Challenges and Potential
Here’s a prediction: by 2025, those of us who live in the ever-churning world of data aggregation, transportation, ETL, storage, business intelligence, and accessibility will look at data mesh much as we look at cloud computing today: as a strategy that simultaneously shrank overhead (time and money), reduced grunt work (maintenance, upgrades, backups), and provided end-user abilities that didn’t exist before.
In short, data mesh is a tech evolution whose technical and business advantages make it both obvious and inevitable. What challenges does this evolving paradigm aim to solve?
Too Much Data from Too Many Sources
The tsunami of data pouring in as businesses embrace full digital transformation is staggering. Data points flow in dynamically, on a global scale, at a level of granularity never before contemplated.
And while historical financial/operational data has always been used as an analytical tool to drive business decisions by management, we’re now seeing BI providing game-changing insights driven by always-on transactional data for the marketing, sales, and product development teams as well. These customer-facing teams can finally instantly know what’s working and what’s not, based on every single action taken by customers.
It’s an extraordinary power to have, but the amount of data they have to work with is hard to collect, store, query, and manage.
Nobody will argue that siloed data is a good thing; hundreds of startup companies have emerged, offering solutions to break open those silos. But while the goal over the past decade has been the unification of data sources into a single repository to yield a “single source of truth,” that repository suddenly – amazingly – feels like yesterday’s strategy. Why? Because it introduces several limitations while that immense single source swells day by day.
A data mesh helps address this by lessening the closed-off quality of a silo: it makes data available to experts throughout the organization.
Large-Scale Enterprise Data Management
Clearly, large-scale enterprise data management is messy. In particular, it’s a challenge to integrate live, flowing data into static or historical data.
Moving data in and out of the data lake from edge sources, and managing its storage once it arrives, is time-consuming, resource-intensive, and very expensive. The bottlenecks get more frequent, and business agility declines.
A single, aggregated collection of data cannot easily comply with data residency and privacy regulations that vary from country to country; data governance is geographically diverse, whereas the hardware is not.
Finally, and often the most painful feature of a bloated data lake, query overhead doesn’t scale. As more and more users need to query the same database, add sources, or manipulate what’s there, response times slow. This assumes, of course, that the data lake incorporates true data virtualization to seamlessly allow anyone with permission to connect to any data source or platform, an important concern according to our recent survey.
In short, putting all your eggs in one basket has some appeal, but that’s going to be one heavy basket that’s hard to carry … or to locate the right egg. Enter the data mesh. It maintains the benefits of a centralized, standardized data lake while introducing the scalability and access that the lake currently lacks. You can think of it as a “distributed data lake.”
How Does Data Mesh Empower Users?
A data mesh offers automated, comprehensive, instant analytics at scale. Data scientists – and data consumers with less expertise and training – will now be able to access business data, to conduct their own analysis focused on their own business needs.
This self-service strategy, with its single point of access control, represents for the first time a people-centric plan for data management: a faster and more effective way to get answers without taxing the DevOps team or waiting on its availability. This is a major benefit for data teams.
Zhamak Dehghani, Director of Emerging Technologies at Thoughtworks, who is credited with creating this paradigm in 2019 at an O’Reilly conference (she named it later, when she literally wrote the book on the subject), refers to it as a hybrid: “a decentralized sociotechnical approach — concerned with organizational design and technical architecture.”
Access Drives Insights
The data mesh is also, in a sense, the next phase in the “anyone/anywhere” model that we’ve come to expect from cloud computing and data virtualization.
A business domain’s own applications and access tools are usually designed for its own users and their specific needs. And in an ideal situation, its data is local, so latency is minimal.
But if members of one business unit seek data from another, they are limited by their own frameworks. If they do gain access to that centralized data lake, its remote location (and size, most of which isn’t the business unit’s own data) increases latency.
With a data mesh, it is easier than ever to have systems interact, share their on-site data, and make the results available to a diverse group of business users. These may be completely independent teams (say, HR and R&D) or cross-functional teams with the same goals and often the same data (QA working with Product Management, or Sales working with Marketing). This new effortless transparency promises new levels of productivity.
What are the Three Types of Data Mesh?
As this approach takes hold, keep an eye out for three types, or “flavors,” of data mesh. Most companies will use a combination of these:
- File-based: The data is compiled, packaged, and simply provided as a static file. This is the closest to today’s simple cloud-storage approach, but will exist under the new universal peer-to-peer sharing model.
- Event-driven: No matter what business unit or department, consumers can “sign up” for alerts when data changes in a way that may be meaningful to them. Again, this isn’t rocket science, but it’s only available once this previously siloed data is exposed and accessible across the organization.
- Query-enabled: Clearly the most powerful flavor. Any user can submit federated queries spanning multiple databases, creating insights only possible when combining results. This is the Holy Grail that gives end-users new capabilities, and data scientists some stored-up vacation time.
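
The three flavors above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: CSV text stands in for the packaged file, an in-process callback registry stands in for a real event platform, and SQLite's `ATTACH DATABASE` stands in for a distributed query engine spanning separately owned stores. The names `export_csv`, `ChangeFeed`, and `federated_revenue_by_team` are hypothetical, as are the HR and Sales tables.

```python
import csv
import io
import sqlite3
from collections import defaultdict


# File-based: package a dataset as a static file (CSV text here).
def export_csv(rows, columns):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Event-driven: consumers "sign up" and are notified when data changes.
class ChangeFeed:
    def __init__(self):
        self._subscribers = defaultdict(list)   # dataset name -> callbacks

    def subscribe(self, dataset, callback):
        self._subscribers[dataset].append(callback)

    def publish_change(self, dataset, change):
        for callback in self._subscribers[dataset]:
            callback(change)


# Query-enabled: one query joins two separately owned databases.
def federated_revenue_by_team():
    conn = sqlite3.connect(":memory:")                    # stand-in query engine
    conn.execute("ATTACH DATABASE ':memory:' AS hr")      # HR domain's store
    conn.execute("ATTACH DATABASE ':memory:' AS sales")   # Sales domain's store
    conn.execute("CREATE TABLE hr.employees (id INTEGER, team TEXT)")
    conn.execute("CREATE TABLE sales.deals (employee_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO hr.employees VALUES (?, ?)",
                     [(1, "east"), (2, "west")])
    conn.executemany("INSERT INTO sales.deals VALUES (?, ?)",
                     [(1, 100.0), (1, 50.0), (2, 75.0)])
    rows = conn.execute(
        """
        SELECT e.team, SUM(d.amount)
        FROM hr.employees AS e
        JOIN sales.deals AS d ON d.employee_id = e.id
        GROUP BY e.team
        ORDER BY e.team
        """
    ).fetchall()
    conn.close()
    return rows


feed = ChangeFeed()
received = []
feed.subscribe("orders", received.append)
feed.publish_change("orders", {"order_id": 1, "status": "shipped"})

print(export_csv([{"order_id": 1, "amount": 19.99}], ["order_id", "amount"]))
print(received)
print(federated_revenue_by_team())
```

In a real mesh the query engine would reach out to each domain's own store over the network. The sketch only shows the shape of the idea: the join happens at query time, and neither table is copied into a central repository first.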
Be aware that the data mesh isn’t entirely about the corporate business employee. The end user is going to feel the speed of a response when the data is coming from an optimized, dedicated, distributed source, rather than a massive, multi-purpose one. In return, the clickstream and web data these users provide along the way can be instantly absorbed and processed as a pure feedback loop to improve performance, product features, and ultimately, profits.
Empowering Data Democratization
While widespread implementation and adoption aren’t going to happen overnight, many organizations are embracing data mesh architecture to democratize and scale their data.
This move puts responsibility on data teams to become truly autonomous: they will need to ingest and clean data themselves, create ETL pipelines (and maintain them), and manage access control. At the same time, the more they invest in these fully-owned steps, the better results they can expect. And yes, that means a new “warm and fuzzy” era of mutually beneficial sharing as each domain helps the others by simply transforming and offering their data to their community.
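As a rough illustration of what those fully-owned steps might look like, here is a small Python sketch of a domain-run pipeline that ingests raw lines, cleans them, and transforms them into the shape the domain chooses to offer. The `customer,amount` input format and the function names `ingest`, `clean`, and `transform` are assumptions made for the example, not a prescribed design.

```python
def ingest(raw_lines):
    """Parse 'customer,amount' lines from a (hypothetical) domain source."""
    for line in raw_lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) == 2:          # skip lines that are not two fields
            yield {"customer": parts[0], "amount": parts[1]}


def clean(records):
    """Drop rows with a missing customer or a non-numeric amount."""
    for record in records:
        try:
            amount = float(record["amount"])
        except ValueError:
            continue
        if record["customer"]:
            yield {"customer": record["customer"].lower(), "amount": amount}


def transform(records):
    """Aggregate spend per customer, the shape this domain publishes."""
    totals = {}
    for record in records:
        name = record["customer"]
        totals[name] = totals.get(name, 0.0) + record["amount"]
    return totals


raw = ["Acme, 10.5", "acme,4.5", "Globex,oops", ",3", "broken line"]
published = transform(clean(ingest(raw)))
print(published)
```

Each step is owned by the same team that understands the data, which is where the "better results" come from: bad rows are dropped by people who know what a bad row looks like.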
I’ll close with a parting thought from Dehghani, as she describes the overarching value of distributed data mesh architecture, with domain-owned data under a centralized access system: “Over the last decades, the technologies that have exceeded in their operational scale have one thing in common: they have minimized the need for coordination and synchronization.”
About the Author:
Ori Reshef is the VP of Products at Varada.