Automated machine learning, or AutoML, was built to address some of the biggest challenges of data science: automating the laborious, iterative steps required in building machine-learning models, eliminating human errors and reducing the time it takes to build production-ready models.
Today, AutoML tools appeal to a broad range of users, from data scientists who value the increased productivity to business intelligence and data professionals who want to build models without a background in machine learning.
As with every new technology, a lot of confusion and ambiguity surrounds the introduction of AutoML. In this eWEEK Data Points article, Ryohei Fujimaki, Ph.D., founder and CEO of dotData, shares what he considers the top five misconceptions about AutoML.
Data Point/Misconception No. 1: Defining AutoML too narrowly.
Traditional AutoML works by selecting the algorithms and building ML models automatically. In the early days of AutoML, the focus was on building and validating models. But the next-generation AutoML 2.0 platforms include end-to-end automation and can do much more, from data preparation and feature engineering to building and deploying models in production. These new platforms are helping development teams reduce the time required to build and deploy ML models from months to days. AutoML 2.0 platforms address hundreds of use cases and dramatically accelerate enterprise AI initiatives by making AI/ML development accessible to BI developers and data engineers, while also accelerating the work of data scientists.
Data Point/Misconception No. 2: Confusing feature generation with feature selection.
Feature engineering (FE) can mean different things, from selecting among features that were built manually to actually extracting new features from the data. In the true sense, FE involves exploring the data, generating candidate features and selecting the best ones, using relational, transactional, temporal, geo-locational or text data across multiple tables. Traditional AutoML platforms require data science teams to generate features manually, a very time-consuming process that requires a lot of domain knowledge. AutoML 2.0 platforms provide AI-powered FE that enables any user to automatically build the right features, test hypotheses and iterate rapidly. FE automation solves the biggest pain point in data science.
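To make the distinction concrete, here is a minimal Python sketch of the two steps. It is an illustration only, not how any AutoML product works internally: the toy tables, the column names and the helpers generate_features and select_features are all hypothetical, and ranking by absolute Pearson correlation is just one simple stand-in for a selection criterion.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: one row per customer, many rows per transaction.
customers = [
    {"customer_id": 1, "churned": 0},
    {"customer_id": 2, "churned": 1},
    {"customer_id": 3, "churned": 0},
    {"customer_id": 4, "churned": 1},
]
transactions = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 2, "amount": 15.0},
    {"customer_id": 3, "amount": 60.0},
    {"customer_id": 3, "amount": 90.0},
    {"customer_id": 4, "amount": 20.0},
    {"customer_id": 4, "amount": 10.0},
]


def generate_features(customers, transactions):
    """Feature generation: build new columns by aggregating a related table."""
    amounts = defaultdict(list)
    for t in transactions:
        amounts[t["customer_id"]].append(t["amount"])
    rows = []
    for c in customers:
        a = amounts.get(c["customer_id"], [])
        rows.append({
            "customer_id": c["customer_id"],
            "txn_count": float(len(a)),
            "txn_total": sum(a),
            "txn_mean": sum(a) / len(a) if a else 0.0,
            "txn_max": max(a) if a else 0.0,
        })
    return rows


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(rows, target, k):
    """Feature selection: keep the k existing columns most correlated with the target."""
    names = [name for name in rows[0] if name != "customer_id"]
    scores = {name: abs(pearson([r[name] for r in rows], target)) for name in names}
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


features = generate_features(customers, transactions)
target = [float(c["churned"]) for c in customers]
selected = select_features(features, target, k=2)
print(selected)
```

Generation creates columns that did not exist in either table; selection only chooses among columns that are already there. A tool that automates only the second step still leaves the first, more laborious one to the data science team.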
Data Point/Misconception No. 3: Believing raw data can be directly used to build ML models.
Traditional AutoML platforms cannot ingest raw data from enterprise data sources to build ML pipelines. A typical enterprise data architecture includes master data preparation tools designed for data cleansing, formatting and standardization before the data is stored in data lakes and data marts for further analysis. This processed data requires further manipulation that is specific to AI/ML pipelines, including additional table joining and further data prep and cleansing. Traditional AutoML platforms require data engineers to write SQL code and perform manual joins to complete these remaining tasks. AutoML 2.0 platforms, on the other hand, perform automatic data pre-processing to help with profiling, cleansing, missing value imputation and outlier filtering, and help discover complex relationships between tables, producing a single flat table that is ready for ML consumption.
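The pre-processing steps named above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any vendor's pipeline: the accounts and usage tables and the helper names are hypothetical, imputation uses the column median, and outliers are filtered with a median-absolute-deviation rule chosen only for brevity.

```python
from statistics import median

# Hypothetical toy tables: one row per account, many usage rows per account.
accounts = [
    {"account_id": "A", "region": "east"},
    {"account_id": "B", "region": "west"},
    {"account_id": "C", "region": "east"},
]
usage = [
    {"account_id": "A", "monthly_spend": 100.0},
    {"account_id": "A", "monthly_spend": None},    # missing value
    {"account_id": "A", "monthly_spend": 110.0},
    {"account_id": "B", "monthly_spend": 90.0},
    {"account_id": "B", "monthly_spend": 5000.0},  # likely outlier
    {"account_id": "C", "monthly_spend": 105.0},
    {"account_id": "C", "monthly_spend": 95.0},
]


def impute_median(rows, column):
    """Missing value imputation: replace None with the median of observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def filter_outliers(rows, column, k=3.0):
    """Outlier filtering: drop rows more than k median absolute deviations from the median."""
    values = [r[column] for r in rows]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return list(rows)
    return [r for r in rows if abs(r[column] - med) <= k * mad]


def flatten(accounts, usage):
    """Join the two tables into one flat table with one row per account."""
    spend_by_account = {}
    for r in usage:
        spend_by_account.setdefault(r["account_id"], []).append(r["monthly_spend"])
    flat = []
    for a in accounts:
        spends = spend_by_account.get(a["account_id"], [])
        flat.append({
            **a,
            "spend_count": len(spends),
            "spend_mean": sum(spends) / len(spends) if spends else None,
        })
    return flat


imputed = impute_median(usage, "monthly_spend")
clean = filter_outliers(imputed, "monthly_spend")
flat_table = flatten(accounts, clean)
for row in flat_table:
    print(row)
```

Each function here stands in for work that, on a traditional platform, a data engineer would typically hand-code in SQL or a scripting language before any model could be trained.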
Data Point/Misconception No. 4: Too much focus on model accuracy.
There is a perception that model accuracy is more important than feature transparency and explainability. In practice, this depends on the use case, and there needs to be a balance between accuracy and interpretability. Many ML platforms and data scientists create complex features that are based on non-linear mathematical transformations. These features, however, cannot be logically explained. Incorporating these types of features leads to a lack of trust and resistance from business stakeholders and, ultimately, project failure. In heavily regulated industries such as financial services, insurance and health care, feature explainability is critical.
Data Point/Misconception No. 5: Assuming AutoML requires a data science background.
Many business intelligence and data professionals think that AutoML is not meant for BI teams and requires a background in algorithms and ML. First-generation AutoML platforms were cumbersome, lacked a user experience suited to BI developers and provided challenging workflows. Even today, many AutoML platforms are geared toward data scientists and require a strong ML background. AutoML 2.0 has unleashed a revolution by empowering citizen data scientists (BI analysts, data engineers and business users) to embark on data science projects without requiring data scientists.
AutoML 2.0 is the secret weapon the BI community can use to build powerful predictive analytics solutions in days, instead of the months typically associated with Augmented Analytics.