IT Science Case Study: Creating a Data-Driven Enterprise

eWEEK IT SCIENCE RESOURCE PAGE: Red Hat needed to identify a platform that would accelerate collaboration across the global organization -- spanning diverse functions and geographies -- and that would allow users to leverage and share large-scale compute resources.

Here is the latest article in an eWEEK feature series called IT Science, in which we look at what actually happens at the intersection of new-gen IT and legacy systems.

Unless they are brand new and right off the assembly line, the servers, storage and networking gear inside every IT system can be considered “legacy,” because hardware and software products are being iterated faster all the time. It’s not unusual for an app-maker, for example, to update an application or patch it for security a few times a month, or even a few times a week. Some apps are updated daily! Hardware moves a little slower, but manufacturing cycles are also speeding up.

These articles describe new-gen industry solutions. The idea is to look at real-world examples of how new-gen IT products and services are making a difference in production each day. Most of them are success stories, but there will also be others about projects that blew up. We’ll have IT integrators, system consultants, analysts and other experts helping us with these as needed.

Today’s Topic: Finding a New Platform to Create a Data-Driven Business

Name the problem to be solved: In 2017, Red Hat embarked on an initiative to scale data science as an organizational capability that would empower Red Hat to truly become a data-driven business. The best data science outputs come from collaboration, and Red Hat lacked a central environment with enough horsepower to perform model building and data exploration at scale. Not having a platform for collaboration also resulted in modeling projects that weren’t actionable.

Describe the strategy that went into finding the solution: Working in partnership with the data science community, the team set out to identify a platform that would accelerate collaboration across the global Red Hat organization -- spanning diverse functions and geographies -- and that would allow users to leverage and share large-scale compute resources. Domino’s feature set, capabilities and usability met Red Hat’s needs: a central, easily accessible platform that data scientists could learn quickly.

List the key components in the solution: Red Hat built a shared environment on the Domino Data Lab data science platform that the team named DAVE (Data Analytics Virtualization Environment). Domino Data Lab lets data scientists spin compute resources in AWS up and down, share notebooks and code, and perform parallel processing.

As Heidi Lanford, Red Hat’s vice president of Enterprise Data and Analytics, explained: “I think of it as a really awesome playground on steroids for data scientists. It gives them a safe space to share ideas, techniques, data sources, as well as their own subject matter knowledge of business questions that they’re trying to solve.”

Describe how the deployment went, how long it took, and if it came off as planned: Our deployment was not as "standard" as we had expected and required redeployment. Domino Data Lab uses Ubuntu as its standard operating system for deployment, but of course we use RHEL (Red Hat Enterprise Linux). After some initial challenges with deploying the virtual private cloud (VPC) correctly and with resource availability, the redeployment went smoothly, and things have been running in production as expected. Customer support has continued to be readily available and able to answer questions.

Describe the result, new efficiencies gained, and what was learned from the project: With Domino Data Lab underpinning Red Hat’s DAVE platform, the internal data science community is collaborating -- both with each other and with business stakeholders -- better and faster than it could before. As a result, data scientists build and deploy more models, of higher quality, that drive the business.

By instrumenting its business with models across multiple functional areas, Red Hat is seeing the following kinds of results:

  • more efficient revenue forecasting;
  • more effective sales analytics, for both direct and channel sales; and
  • more opportunities for data scientists, who are dispersed across the organization, to work with their peers (birds of a feather), along with a stronger sense of community.

Describe ROI, carbon footprint savings, and staff time savings, if any: Providing an enterprise-grade environment for data science is reducing the time data scientists once spent administering their own data exploration and staging areas. It also gives us more visibility into the work going on across the company, so we can find connections. “Our investment in Domino has really paid off; it’s led to a return of around 10x in terms of efficiency for our data science community,” Lanford said.

If you have a suggestion for an eWEEK IT Science article, email [email protected].