Knowingly or not, every company in the 21st century is a data-driven company, whether that means a corporation with data centers full of documents, logs and images, or a small business keeping a simple spreadsheet of customers, suppliers and partners on a laptop.
For enterprises that range across industries and geographies, the ability to use data as a core asset is crucial to enable continuous innovation, to avoid falling behind and to establish market leadership.
Yet despite data’s indisputable importance, many organizations continue to struggle with even the basics. Most have not yet worked out how data and services ought to be delivered to unlock new insights, technologies and operational efficiencies. Even in Fortune 500 companies, it often takes days, weeks or even months to move data to the environments where it can be used. On top of that, those organizations must also account for the security, privacy and regulatory compliance requirements that apply to that data.
Why data management needs an agile development approach
To overcome these challenges and remain innovative, data management must become as important to master as software development has been. After all, in the digital era, agile data has the power to fuel competitive advantages and better inform decisions, drive new application features and improve customers’ experiences.
DataOps, an approach created about five years ago, is an agile development practice that brings existing DevOps teams together with data engineers and data scientists to support data-focused companies. DataOps can provide organizations with real-time data insights, allowing every team to work toward a common goal. It also puts formerly disparate teams on the same page, so that work isn’t duplicated and precious time isn’t lost.
Several companies now offer top-flight DataOps management platforms, including Delphix (often credited with creating the category), IBM, Hitachi Vantara and Atlan.
This eWEEK Data Points article features industry information from Delphix explainer Matthew Yeh, who offers nine best practices to overcome the cost, complexity and risk of managing data to meet the demands of modern business in the new, data-driven world.
Data Point No. 1: Automate your data-related processes.
Automation has become a core tenet of the modern technology playbook. However, in most enterprises automation hasn’t been extended to the data tier, resulting in a significant lag. Common data workflows, such as provisioning a new copy of data to developers, business analysts or data scientists, are still highly manual and often involve multiple handoffs between people and teams.
DataOps can automate data provisioning through self-service, with tools such as automated data masking and obfuscation, so that data is fully integrated into workflows with added protection and without sacrificing speed.
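To make the idea of automated masking concrete, here is a minimal Python sketch. It is not how Delphix or any other DataOps platform implements masking; the key, the field names and the `mask_value` and `mask_record` helpers are all hypothetical. It assumes that a keyed hash is an acceptable way to replace sensitive values with consistent tokens, so the same input always maps to the same masked output and joins across tables still line up.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real one would come from a secrets manager.
MASKING_KEY = b"example-key-not-for-production"


def mask_value(value: str, length: int = 12) -> str:
    """Return a deterministic token that stands in for a sensitive value."""
    digest = hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_record(record: dict, sensitive_fields: set) -> dict:
    """Copy a record, replacing only the sensitive fields with masked tokens."""
    return {
        key: mask_value(str(val)) if key in sensitive_fields else val
        for key, val in record.items()
    }


# Hypothetical customer row as it might look in a production table.
customer = {"id": 42, "name": "Ada Example", "email": "ada@example.com", "plan": "pro"}
masked = mask_record(customer, {"name", "email"})
```

In this sketch the `id` and `plan` fields pass through untouched while `name` and `email` become 12-character tokens, so a masked copy can be handed to developers without anyone editing rows by hand.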
Data Point No. 2: Data is for more than just analytics.
Data can and must be used to drive business insights and improve decision-making, but it’s also the raw fuel that teams need to develop and test new applications. Most businesses today need fast, repeatable ways to get high-quality data from “point A” to “point B” in a way that is secure.
In particular, application development drives new, unique requirements around data. In order for data to be effective for development, developers must be able to “treat data as code.” This includes developer-friendly semantics and workflows such as secure, self-service access and provisioning of data in its original form (as opposed to transformed or subsetted). Data for app dev must also work within the modern software development lifecycle (or SDLC). It must be integrated into the toolchains in all the contexts in which the modern SDLC operates—including across clouds, data sources, compliance and regulatory policies and more.
Data Point No. 3: Don’t settle for less than production.
Data is the lifeblood of software development. Yet, too often, companies, developers and analysts use datasets that don’t match production data, which can be counterproductive. Without access to strong, representative or up-to-date data, users must make do with stale, partial or synthetic data, which leads to mistakes and errors.
It can be both nerve-wracking and dangerous to leave production-grade data sitting unmasked in a database. So how do companies gain access to production-quality data fast without opening themselves up to security risk? DataOps can automate the delivery of masked data, making it more secure and available through managed self-service in minutes.
It isn’t always necessary to use production data for application development or testing, but speed shouldn’t be the determining factor in the type of dataset chosen. Strong or high-fidelity data that matches production leads to better insights and makes for higher-quality test data, and that should always be the first concern.
Data Point No. 4: Mind your non-production environments (they’re 10x the size of production).
When enterprise teams think about data, they usually consider those production environments where data is used to build and test applications. For every copy of production data, however, the typical enterprise has at least 10 copies used for analytics, reporting, development, testing and backups sitting in non-production environments. This non-production data consumes much more storage and IT resources, and although many more people can touch or access it, it’s often less scrutinized from a security perspective.
Processes and technology must account for the volume and importance of non-production data environments in your system. That includes ways to catalog and keep track of all this non-production data, ways to govern access to it through a standard process and the ability to identify the sensitive information that resides within it, so these environments can be brought into compliance from a policy perspective. DataOps technologies are able to help IT teams solve the challenges associated with non-production environments.
Data Point No. 5: Understand your sensitive and personal data exposure.
As breaches continue to make headlines amid a rising sea of stringent data privacy laws, the cybersecurity and user privacy landscapes are becoming more complex to navigate. Growing calls from consumers and regulators alike demand that teams understand what kind of sensitive data is being maintained, who can access it and where it resides (much of it is often located in non-production environments). IT teams today require an enterprisewide view of this exposure that spans both where the data is located (on-premises or in the cloud) and the different types of data sources.
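One small piece of that enterprisewide view is knowing which columns in which environments hold sensitive values. The Python sketch below illustrates the idea under loud assumptions: the two regular expressions are illustrative and far from complete, and the environment name `qa-db`, the sample rows and the `scan_rows` helper are hypothetical. Real discovery tools use much broader and more carefully validated rules.

```python
import re

# Illustrative patterns only; real detection needs broader, validated rules.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def scan_rows(environment: str, rows: list) -> list:
    """Return sorted (environment, column, kind) findings for matching values."""
    findings = set()
    for row in rows:
        for column, value in row.items():
            for kind, pattern in PATTERNS.items():
                if pattern.search(str(value)):
                    findings.add((environment, column, kind))
    return sorted(findings)


# Hypothetical rows sampled from a non-production test database.
test_env_rows = [
    {"id": 1, "contact": "ada@example.com", "note": "SSN 123-45-6789 on file"},
    {"id": 2, "contact": "n/a", "note": "none"},
]
findings = scan_rows("qa-db", test_env_rows)
```

Running the same scan against each environment and collecting the findings in one place gives a rough inventory of where sensitive data sits, which is the starting point for deciding what to mask or restrict.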
Data Point No. 6: Data is useless if it’s not integrated.
All the data an organization needs rarely lives in one place. Enterprises today need an approach to harmonize and bring together data across many sources. For example, if a bank is building a new mobile app for customers or has teams running analytics on customer trends, the teams likely will need to pull together data across many departments (such as the member portal, mortgage systems and credit card info). This data must be integrated because it resides across different systems, locations and clouds.
In addition to compiling data from disparate sources, one aspect of the data integration process that often goes overlooked is synchronizing data from a “temporal” perspective, meaning whether the data originates from the same point in time. Doing so is also an essential ingredient for driving more reliable and accurate insights and improved application testing.
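The temporal check described above can be sketched in a few lines of Python. This is only an illustration: the source names echo the bank example, while the timestamps, the five-minute tolerance and the helper functions are hypothetical. It assumes each source can report when its copy of the data was captured.

```python
from datetime import datetime, timedelta


def is_temporally_consistent(snapshot_times: dict, tolerance: timedelta) -> bool:
    """True if all source snapshots fall within `tolerance` of each other."""
    times = list(snapshot_times.values())
    return max(times) - min(times) <= tolerance


def stale_sources(snapshot_times: dict, tolerance: timedelta) -> list:
    """Names of sources whose snapshot lags the newest one by more than `tolerance`."""
    newest = max(snapshot_times.values())
    return sorted(
        name for name, ts in snapshot_times.items() if newest - ts > tolerance
    )


# Hypothetical capture times for three source systems.
snapshots = {
    "member_portal": datetime(2020, 3, 1, 12, 0),
    "mortgage_system": datetime(2020, 3, 1, 12, 2),
    "credit_cards": datetime(2020, 2, 27, 23, 0),
}
tolerance = timedelta(minutes=5)

consistent = is_temporally_consistent(snapshots, tolerance)
lagging = stale_sources(snapshots, tolerance)
```

Here the credit card snapshot is more than two days older than the others, so the set is flagged as inconsistent and `credit_cards` is reported as the source to refresh before the combined data is used for testing or analytics.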
Data Point No. 7: Consider a platform approach to managing data.
Given the array of different sources, locations, deployment patterns and capabilities enterprises need to manage and consume data, consider a data platform over individual point solutions for further integration and improved efficiency. The variety of data sources (SQL, NoSQL, PaaS and others) is only increasing as enterprises need to manage data across new locations (such as on-prem, cloud, hybrid and multicloud environments) and new capabilities (data provisioning, security, integration and access control). This all adds up to data being more massive, complex and difficult to manage.
DataOps platforms provide the flexibility to manage the breadth of data that enterprises routinely maintain with a common, efficient approach as opposed to having to adopt many point solutions that address a narrow set of challenges.
Data Point No. 8: Think hybrid cloud.
In practice, the vast majority of enterprises today operate in a “hybrid” fashion, which means they’re using the cloud, but some critical apps will still remain in on-premises systems. Even the most conservative, slow-moving enterprises today likely have cloud-centric capabilities.
Modern enterprises must be able to bridge these worlds to fluidly move data across environments and develop solutions that are agnostic to where the data lives. The right DataOps technology can deliver secure data from any database in both cloud and hybrid environments to eliminate test data wait times and create continuity across data sources.
Data Point No. 9: Data distribution speed is a differentiator.
In most enterprises today, it takes days, weeks or sometimes months to deliver data to developers, testers, analysts and data scientists. This means cutting delivery times down to minutes can be a true differentiator and drive business value. With improved data competency and speed, enterprises can react faster to market changes and customer behaviors, use fresh data for more accurate insights and eliminate an important bottleneck for data-hungry development teams that are building the next wave of innovative apps.
If you have a suggestion for an eWEEK Data Points article, email cpreimesberger@eweek.com.