Intel, Kaggle Use March Madness Contest to Teach About Big Data

By Jeffrey Burt  |  Posted 2014-02-27


Big data is one of the hotter trends in the tech industry. It's a term given to the massive amount of structured and unstructured data that is being generated and the need to find ways to process, store and analyze it to make business decisions based on the insights.

Business people tend to have a high-level understanding of big data, but it's not always easy for them to get a handle on how to take such huge amounts of information and leverage it to their advantage, according to Boyd Davis, vice president and general manager of Intel's Datacenter Software Division.

But what a lot of people do understand is sports, and the way statistics can be used to develop insights into how a game is going to go. It doesn't always work out the way people expect, as the recent Super Bowl showed when the favored Denver Broncos lost 43-8 to the Seattle Seahawks. However, many people understand the relationship between the stats (the data) and how a game plays out.

"A lot of people can visualize and internalize sports," Davis told eWEEK.

With that in mind, Davis and other Intel officials saw the upcoming NCAA men's basketball tournament, known as March Madness, as a sporting event in which millions of people worldwide pore over mounds of data to fill out their brackets, hoping to win their pools and perhaps some money in the process.

And in Kaggle, a big data solutions company that runs competitions on its predictive modeling platform, the chip maker found the perfect partner.

The two companies are sponsoring a competition designed to enable people to leverage large amounts of data from previous March Madness tournaments as they fill out the 64-team tournament bracket. Most people use a combination of research, hunches and guesses when negotiating the maze that is the NCAA tournament.

The March Machine Learning Mania competition is a two-step event. In the first step, players can use tools from Intel and Kaggle to build predictive analytics models and test them against the outcomes of the previous five NCAA basketball tournaments. That step began Jan. 7 and runs until March. According to the leaderboard, 150 teams are competing, each submitting anywhere from one to 61 entries.

The second step uses predictive analytics to forecast the results of this year's tournament, and it's run differently than traditional pools, where players are usually asked to pick the winner of each matchup. In March Machine Learning Mania, players need to develop algorithms that predict the probability of each team winning every possible matchup. For example, if the winner of the game between the University of North Carolina and Syracuse plays the winner of the Wichita State-Duke game, players have to assign probabilities to all four matchups that could result from those two games.

The NCAA in recent years has added four more teams to the tournament, growing it from 64 teams to 68. Intel and Kaggle have figured out a way to deal with that.

"Predicting every possible matchup for the 68 teams announced on Selection Sunday gives participants the most time to get their 2014 predictions ready in time," the companies said on an FAQ site. "There is a small 'play-in' round, sometimes called the first round, where the 68 are narrowed to 64. While you are asked to predict these games (and you may be predicting them after they occur), we will not be scoring them."
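The prediction format described above can be sketched in a few lines of Python. This is a minimal illustration, not Kaggle's actual submission format or scoring code: the team ratings, the `win_probability` logistic model and its `scale` parameter are all hypothetical assumptions. What it shows is the key difference from a traditional bracket: instead of picking one winner per game, the model outputs a probability for every pairing of teams, which for a 68-team field works out to 68 × 67 / 2 = 2,278 matchups.

```python
from itertools import combinations
import math


def win_probability(rating_a: float, rating_b: float, scale: float = 10.0) -> float:
    """Hypothetical model: a logistic curve on the rating difference.

    Returns the probability that team A beats team B. Equal ratings give 0.5.
    """
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))


def predict_all_matchups(ratings: dict[str, float]) -> dict[tuple[str, str], float]:
    """Assign a win probability to every possible pairing of teams.

    Each unordered pair appears once, keyed (team_a, team_b) with names sorted;
    the value is the probability that team_a wins.
    """
    predictions = {}
    for team_a, team_b in combinations(sorted(ratings), 2):
        predictions[(team_a, team_b)] = win_probability(ratings[team_a], ratings[team_b])
    return predictions


if __name__ == "__main__":
    # Made-up ratings for the four teams named in the article's example.
    ratings = {
        "Duke": 88.0,
        "North Carolina": 85.0,
        "Syracuse": 86.0,
        "Wichita State": 87.0,
    }
    for (a, b), p in predict_all_matchups(ratings).items():
        print(f"{a} vs {b}: P({a} wins) = {p:.3f}")

    # A full 68-team field yields one prediction per unordered pair.
    print(len(list(combinations(range(68), 2))))  # → 2278
```

Because each pair is stored once with the names sorted, the probability that the second team wins is simply 1 minus the stored value, so no matchup needs to be predicted twice.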


To make it even more enticing, the vendors are offering a $15,000 cash prize to the winner. People don't have to play the first step in the contest (predicting results against the previous five tournaments' numbers) to compete in the second (predicting the 2014 results). But they do have to make their entries by March 19.

While the upfront goal of the competition is to have fun and let at least one person win some money, Intel's Davis and Will Cukierski, a data scientist with Kaggle, said they also want to give people an idea of how big data analytics works and what can be gained by using analytics tools. The amount of data being generated will only grow as more devices connect to the Internet, thanks to such trends as the Internet of things (IoT). Officials with Cisco Systems predict that by 2020 there will be 50 billion connected systems, from mobile devices and automobiles to manufacturing machines and surveillance cameras, and all of those connected devices will generate huge amounts of data.

For businesses, it will be important for people to know how to gather, store, process and analyze that data in order to make informed business decisions, Davis said. It's also important to tech companies, given the growth potential in big data. IDC analysts predict that big data spending will grow by 30 percent this year, to more than $14 billion.

Intel and other companies need to get people using the technology at their disposal and start seeing what's possible, he said.

"It is very important for us to use sports and to show them that with the capabilities of … technologies and tools, they can do things in a different way," Davis said. "We're using sports as the platform to get people learning about the capabilities of big data."

Kaggle's Cukierski said using sports makes sense.

"People love to play with sports data, and are usually willing to put up with statistics if it deals with sports," he told eWEEK.
