Data science has become a powerhouse for organizations, turning mountains of data into actionable insights that touch every part of the business: customer experience, revenue, operations, risk management and more. Data science has the potential to dramatically accelerate digital transformation initiatives, delivering greater performance and a competitive advantage.
However, not all data science platforms and methodologies are created equal. Using data science to make predictions and decisions that optimize business outcomes requires transparency and accountability. Several underlying factors contribute, such as trust, confidence in the prediction and an understanding of how the technology works, but fundamentally it comes down to whether the platform takes a black-box or a white-box model approach.
Black-box testing or processing is a method in which the internal structure, design and implementation of the item being tested are unknown to the tester; white-box testing or processing is a method in which they are known to the tester.
Once the industry standard, black-box-type machine-learning projects tended to offer high degrees of accuracy, but they also generated minimal actionable insights and resulted in a lack of accountability in the data-driven decision-making process.
On the other hand, white-box models offer accuracy while also clearly explaining how they behave, how they produce predictions and what the influencing variables are. White-box models are preferred in many enterprise use cases because of their transparent “inner-working” modeling process and easily interpretable behavior.
Today, with the advent of AutoML 2.0 platforms, a white-box model approach is becoming the trend for data science projects. In this eWEEK Data Points article, Ryohei Fujimaki, Ph.D., founder and CEO of dotData, discusses five key reasons why white-box data science models are superior to black-box models for deriving business value from data science. DotData is a provider of full-cycle data science automation.
Data Point No. 1: The machine-learning modeling process must be transparent.
It is important for both analytics and business teams to understand the varying levels of transparency and their relevance to the machine-learning process. Linear and decision/regression tree models are fairly transparent in how they generate predictions. However, deep learning (deep neural networks), boosting and random forest models are highly non-linear and far more difficult to explain, which makes them black-box models. While black-box models can have a slight edge in accuracy scores, white-box models offer far more business insight, which is critical for enterprise data science projects. White-box transparency means that the exact logic and behavior needed to arrive at a final outcome are easily determined and understood.
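To make this concrete, here is a minimal Python sketch of what white-box transparency looks like: a tiny hand-coded decision tree for churn risk whose exact logic is readable in the source. The feature names and thresholds are hypothetical, not drawn from any real model.

```python
def churn_risk(support_calls: int, usage_drop_pct: float) -> str:
    """A white-box decision tree: every rule behind the outcome is visible."""
    if support_calls >= 5:          # frequent support contact is a warning sign
        if usage_drop_pct >= 20.0:  # combined with falling usage: high risk
            return "high"
        return "medium"
    return "low"

print(churn_risk(6, 25.0))  # high
```

A black-box equivalent (e.g., a random forest over the same inputs) might score slightly better, but its decision path could not be read off and audited this way.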
Data Point No. 2: Features have to be interpretable.
Data scientists are math-oriented and tend to create complex features that may be highly correlated with the prediction target. For example, consider the following feature for customer analytics: “log(age) * square-root(family income) / exp(height).” Its logical meaning cannot easily be explained in terms of customer behavior. In addition, deep learning (neural networks) generates features computationally, and such deep non-linear transformations cannot be understood directly; incorporating this type of feature turns the model into a black box. In today’s regulatory environment, being able to explain the key variables driving business decisions is essential. White-box models fulfill this need and thus are gaining in popularity.
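The contrast can be sketched in Python. The first function computes the opaque feature from the article's example; the second is a directly interpretable one. All input values and names (e.g., height in meters, support calls per month) are hypothetical.

```python
import math

def opaque_feature(age: float, family_income: float, height_m: float) -> float:
    # Mathematically well-defined and possibly correlated with the target,
    # but its meaning in customer-behavior terms is hard to articulate.
    return math.log(age) * math.sqrt(family_income) / math.exp(height_m)

def interpretable_feature(support_calls_last_month: int) -> int:
    # Maps one-to-one to an observable customer behavior.
    return support_calls_last_month

print(round(opaque_feature(35, 50_000, 1.7), 2))
print(interpretable_feature(5))
```

Both features may feed a model equally well; only the second can be explained to a regulator or a business stakeholder without hand-waving.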
Data Point No. 3: Insights must be actionable for model consumers and business users.
Model consumers use ML models daily and need to understand how and why a model made a particular prediction in order to plan how to respond to it. Understanding how a score was derived and which features contributed lets consumers optimize their operations. For example, a black-box model may indicate that “Customer A is likely to churn within 30 days with a probability of 73.5%.” Without a stated reason for the churn, a business user does not have enough information to judge whether the prediction is reasonable. In contrast, white-box models typically give a different type of answer, such as, “Customer A is likely to churn next month because Customer A contacted the customer service center five times last month and usage decreased by 25% over the past four months.” Having the specific reasoning behind the prediction makes it much easier to judge the validity of the prediction, as well as what action should be taken in response.
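One simple way a white-box model (here, a linear one) produces such reasons is by ranking each feature's contribution to the score. The weights and feature names below are hypothetical, a sketch of the idea rather than any real churn model.

```python
# Hypothetical coefficients of a linear churn model.
WEIGHTS = {"support_calls": 0.12, "usage_drop_pct": 0.02, "tenure_years": -0.05}

def explain(features: dict) -> list:
    """Return (feature, contribution) pairs, strongest drivers first."""
    contributions = {f: WEIGHTS[f] * v for f, v in features.items()}
    # Sort by absolute impact so the dominant reasons lead the explanation.
    return sorted(contributions.items(), key=lambda kv: -abs(kv[1]))

customer = {"support_calls": 5, "usage_drop_pct": 25.0, "tenure_years": 1.0}
for name, impact in explain(customer):
    print(f"{name}: {impact:+.2f}")
```

The ranked contributions translate directly into a sentence like the one above ("because the customer contacted support five times..."), which a black-box score alone cannot provide.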
Data Point No. 4: Models must be explainable.
In enterprise data-science projects, data scientists and model developers have to explain how their models behave, how stable they are, and which key variables drive the predictions. Explainability is therefore absolutely critical for model acceptance. White-box models produce prediction results alongside the influencing variables, making predictions fully explainable. This is especially important when a model is used to support a high-profile business decision or to replace an existing model, since model developers must defend their models and justify model-based decisions to other business stakeholders.
Data Point No. 5: Accountability is critical.
As more organizations adopt data science into their business processes, concerns are growing about accountability for decisions made on the basis of personal information, which can sometimes be interpreted as discriminatory. By providing greater transparency and explainability, white-box models help organizations stay accountable for their data-driven decisions and remain compliant with the law and prepared for potential legal audits. Black-box models, in contrast, exacerbate these concerns, because less is known about the influencing variables that actually drive final decisions.
If you have a suggestion for an eWEEK Data Points article, email cpreimesberger@eweek.com.