Intel, Kaggle Use March Madness Contest to Teach About Big Data

The companies are hosting a competition where players can leverage data and predictive analytics to fill out their tournament brackets.

Big data is one of the hotter trends in the tech industry. It's a term given to the massive amount of structured and unstructured data that is being generated and the need to find ways to process, store and analyze it to make business decisions based on the insights.

Business people tend to have a high-level understanding of big data, but it's not always easy for them to get a handle on how to take such huge amounts of information and leverage it to their advantage, according to Boyd Davis, vice president and general manager of Intel's Datacenter Software Division.

But what a lot of people do understand is sports, and the way statistics can be used in developing insights into how a game is going to go. It doesn't always work out the way people expect—the recent Super Bowl between the Seattle Seahawks and the Denver Broncos is a good example of that. However, many people understand the relationship between the stats—the data—and how a game plays out.

"A lot of people can visualize and internalize sports," Davis told eWEEK.

With that in mind, Davis and other Intel officials saw the upcoming NCAA men's basketball March Madness tournament as an example of a sporting event in which millions of people worldwide pore over mounds of data to fill out their tournament brackets in hopes of winning whatever pools they're in, and maybe some money in the process.

And in a company called Kaggle—a big data solutions company that runs competitions on its predictive modeling platform—the chip maker found the perfect partner.

The two companies are sponsoring a competition designed to enable people to leverage large amounts of data from previous March Madness tournaments as they fill out the 64-team tournament bracket. Most people use a combination of research, hunches and guesses when negotiating the maze that is the NCAA tournament.

The March Machine Learning Mania competition is a two-step event. In the first step, players can use tools from Intel and Kaggle to create predictive analytics models and test their results against the outcomes of the previous five NCAA basketball tournaments. That stage began Jan. 7 and runs until March. It has 150 teams competing, with the number of entries ranging from one to 61, according to the leader board.
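Testing a model against past tournaments can be sketched in a few lines of Python. The article does not say how entries are scored, so the log-loss metric below is an assumption chosen only because it is a common way to grade probability forecasts. The function name and the sample numbers are hypothetical, not taken from the competition.

```python
import math

def log_loss(predictions, outcomes, eps=1e-15):
    """Average log loss of probability forecasts (lower is better).

    predictions: forecast probability that the first-listed team wins each game
    outcomes:    1 if that team actually won, 0 if it lost
    NOTE: the scoring metric is an assumption; the article does not name one.
    """
    total = 0.0
    for p, won in zip(predictions, outcomes):
        # Clamp so a forecast of exactly 0 or 1 never produces log(0).
        p = min(max(p, eps), 1 - eps)
        total += -math.log(p) if won else -math.log(1 - p)
    return total / len(predictions)

# Hypothetical forecasts for four past games and what actually happened.
predicted = [0.9, 0.6, 0.3, 0.5]
actual = [1, 1, 0, 0]
score = log_loss(predicted, actual)
print(f"log loss on past games: {score:.3f}")
```

A model that always answers 0.5 scores about 0.693 under this metric, so anything lower suggests the model is extracting some signal from the historical data.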

The second stage is using predictive analytics to forecast the results of this year's tournament, and it's run differently from traditional pools, where players usually are tasked with predicting the winner of each matchup. With March Machine Learning Mania, players need to come up with algorithms that predict the probability of each team winning in every possible matchup. If the winner of the game between the University of North Carolina and Syracuse plays the winner of the Wichita State-Duke game, then players have to make a prediction for each possible matchup resulting from those two games.

The NCAA in recent years has added four more teams to the tournament, growing it from 64 teams to 68. Intel and Kaggle have figured out a way to deal with that.

"Predicting every possible matchup for the 68 teams announced on Selection Sunday gives participants the most time to get their 2014 predictions ready in time," the companies said on an FAQ site. "There is a small 'play-in' round, sometimes called the first round, where the 68 are narrowed to 64. While you are asked to predict these games (and you may be predicting them after they occur), we will not be scoring them."
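To give a sense of scale, a field of 68 teams yields 2,278 distinct pairings. The Python sketch below enumerates them and attaches a win probability to each. The team names, the numeric ratings and the logistic formula are all illustrative assumptions; the article does not describe how any contestant's model works.

```python
import math
from itertools import combinations

def win_probability(rating_a, rating_b, scale=10.0):
    """Chance that team A beats team B, from a hypothetical rating gap.

    The logistic curve and the scale value are assumptions for illustration.
    """
    return 1.0 / (1.0 + math.exp(-(rating_a - rating_b) / scale))

def predict_all_matchups(ratings):
    """Return {(team_a, team_b): P(team_a wins)} for every unordered pair."""
    predictions = {}
    for team_a, team_b in combinations(sorted(ratings), 2):
        predictions[(team_a, team_b)] = win_probability(
            ratings[team_a], ratings[team_b]
        )
    return predictions

# Made-up field of 68 teams with made-up ratings.
ratings = {f"team_{i:02d}": float(i) for i in range(68)}
predictions = predict_all_matchups(ratings)
print(len(predictions))  # 68 * 67 / 2 = 2278 possible pairings
```

Most of those pairings will never be played, which is why forecasting every one in advance lets contestants submit before the bracket unfolds.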