Companies are forging ahead with digital transformation at an unprecedented rate. A recent Gartner Research survey found that 49 percent of CIOs report that their enterprises have already changed their business models to scale their digital endeavors or are in the process of doing so.
As companies forge ahead with these transformations, they are infusing data science and machine learning into various business functions. This is not an easy task. A typical enterprise data science project is highly complex and requires an interdisciplinary team of data engineers, developers, data scientists, subject matter experts and individuals with other special skills and knowledge.
Moreover, this talent is scarce and costly. In fact, only a small number of companies have succeeded in building an experienced data science practice. And, while building this team takes time and resources, there is an even larger problem faced by many of these companies: more than 85 percent of big data projects fail.
A number of factors contribute to these failures, including human factors and challenges with time, skill and impact. In this eWEEK Data Points article, Ryohei Fujimaki, Ph.D., founder and CEO of dotData, a Silicon Valley tech startup focused on data science automation for the enterprise, discusses five key factors that contribute to these failures and one approach to addressing them.
Data Point No. 1: Lack of Resources to Execute Data Science Projects
Data science is an interdisciplinary approach that involves mathematicians, statisticians, data engineers, software engineers and, importantly, subject matter experts. Depending on the size and scope of the project, companies might deploy numerous data engineers, a solution architect, a domain expert, a data scientist (or several), business analysts and perhaps additional resources. Many companies do not have, or cannot afford to deploy, sufficient resources, both because hiring such talent is becoming increasingly challenging and because companies often have many data science projects to execute, all of which take months to complete.
Data Point No. 2: Long Turnaround Time and Upfront Effort Without Visibility into the Potential Value
One of the biggest challenges of data science projects is the large upfront effort required, despite a lack of visibility into the eventual outcome and its business value. The traditional data science process takes months to complete before the outcome can be evaluated. In particular, the data and feature engineering process, which transforms business data into a machine learning-ready format, takes a large amount of iterative effort. The long turnaround time and substantial upfront effort associated with this approach often result in project failure after months of investment. As a result, business executives are hesitant to commit more resources.
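To make the feature engineering step more concrete, here is a minimal sketch in Python of turning raw business records into a machine learning-ready table. The purchase data, the field names and the `build_features` helper are all hypothetical and chosen only for illustration; real projects typically involve many source tables and many rounds of trial and error over which features to build.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw business data: one row per purchase.
transactions = [
    {"customer_id": "c1", "amount": 20.0, "day": date(2019, 1, 5)},
    {"customer_id": "c1", "amount": 35.0, "day": date(2019, 2, 10)},
    {"customer_id": "c2", "amount": 12.5, "day": date(2019, 2, 1)},
]


def build_features(rows, as_of):
    """Aggregate purchase rows into one feature row per customer."""
    # Group the raw rows by customer.
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["customer_id"]].append(row)

    # Derive a few hand-picked features for each customer.
    features = {}
    for customer_id, items in grouped.items():
        amounts = [r["amount"] for r in items]
        last_day = max(r["day"] for r in items)
        features[customer_id] = {
            "purchase_count": len(amounts),
            "total_spend": sum(amounts),
            "avg_spend": sum(amounts) / len(amounts),
            "days_since_last_purchase": (as_of - last_day).days,
        }
    return features


feature_table = build_features(transactions, as_of=date(2019, 3, 1))
for customer_id, row in sorted(feature_table.items()):
    print(customer_id, row)
```

Each feature here (purchase count, total spend, average spend, recency) is a manual design decision. Repeating that choice across many tables and candidate features is where much of the iterative effort goes.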
Data Point No. 3: Misalignment of Technical and Business Expectations
Most data science projects are undertaken to provide important insights to the business team. However, a project often starts without clear alignment between the business and data science teams on its expectations and goals. As a result, the data science team focuses mainly on model accuracy, while the business team is more interested in metrics such as financial benefits, business insights or model interpretability. In the end, the business team does not accept the outcomes from the data science team.
Data Point No. 4: Lack of Architectural Consideration for Production, Operationalization
Many data science projects start without consideration for how the developed pipelines will be deployed in production. This occurs because the business pipeline is often managed by the IT team, which doesn’t have insight into the data science process, while the data science team is focused on validating its hypotheses and doesn’t have an architectural view into production and solution integration. As a result, rather than getting integrated into the pipeline, many data science projects end up as one-time, proof-of-concept exercises that fail to deliver real business impact or that incur significant additional cost to bring into production.
Data Point No. 5: Heavy Dependency on Skills, Experiences of Particular Individuals
Traditional data science relies heavily on the skills, experience and intuition of experienced individuals. In particular, the data and feature engineering process today is based mostly on the manual effort and intuition of domain experts and data scientists. Although such talented individuals are valuable, practices that rely on them are not sustainable for enterprise companies, given how hard it is to hire such experienced talent. As such, companies need to seek solutions that help democratize data science, enabling more participants with different skill levels to execute projects effectively.
Data Point No. 6: End-to-end Data Science Automation is a Solution
The pressure to achieve greater ROI from artificial intelligence (AI) and machine learning (ML) initiatives has pushed more business leaders to seek innovative solutions for their data science pipeline, such as machine learning automation. Choosing the right solution, one that delivers end-to-end automation of the data science process, including automated data and feature engineering, is the key to success for a data-driven company. Data science automation makes it possible to execute data science processes faster, often in days instead of months, with more transparency, and to deliver minimum viable pipelines that can be improved continuously. As a result, companies can rapidly scale their AI/ML initiatives to drive transformative business changes.