Big Data Keynote Focuses on Analyzing Geolocational Data for Insights

Massive data sets improve predictions and the overall quality of data. But adding information from mobile devices and geolocations provides even more valuable insights and predictions.

Big data gives organizations a better understanding of trends and customer behavior, and the ultimate opportunity lies in the ability to use "geolocational data" to perform "space-time-travel" analyses.

Organizations are taking information they've collected and analyzing it to gain better insights into their customers' behavior, but the ultimate opportunity lies in analyzing geolocational data to figure out where people will be at a given time, said Jeff Jonas, an IBM distinguished engineer and chief scientist at IBM Entity Analytics.

Jonas presented the morning keynote for GigaOM's Structure Big Data conference in New York City on March 23. While using "space-time-travel" would be an enormous opportunity, it will unravel secrets and challenge existing notions of privacy, he said. Big data refers to data sets that are too large and awkward to collect, store and analyze using traditional database management tools. The conference featured fireside chats and panels discussing how to successfully capture the data and derive insights, such as trends and patterns.

"Surveillance society is irresistible. And you are doing it," he said during his presentation. He noted the use of location-based services such as Foursquare, free email, and social networking tools such as Twitter and Facebook.

Despite privacy concerns, Jonas was enthusiastic overall about big data, noting that large data sets let companies make more accurate predictions. There are fewer false negatives and fewer false positives, he said. The computing time required to obtain the data also decreases, meaning the enterprise has access to more data, faster, he said.

"Every two days now, we create as much information as we did from the dawn of civilization up until 2003," Jonas said during his presentation, quoting Google CEO Eric Schmidt. Now no one wants to wait to sift through huge amounts of data to get the "smart answer," he said.

Data is useless unless it is placed in context with other information in order to discover relevance, he said. Jonas compared data collection to puzzle pieces: there is no way to tell what the individual pieces mean or represent without actually trying to put them all together. The information can be incomplete or entirely unrelated. Each new piece of information is identified as being unrelated to anything else, similar to some other pieces of information, or actually connected to another data piece.

Including space and time observations removes ambiguity from collected data, because the same thing cannot be in two places at once, Jonas said. "For example, the last 10 years of address history, taken in context, can tell if a person is the same or not, when digging through billions of rows of data," he said.
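
To make the "two places at once" idea concrete, here is a minimal sketch in Python. It is not Jonas's or IBM's method; it only illustrates how space and time observations could rule out a match between two records. The function names, the observation format (hours, latitude, longitude) and the speed limit of 1,000 km/h are all assumptions made for this illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_SPEED_KMH = 1000.0  # assumed upper bound on how fast a person can travel


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def could_be_same_entity(obs_a, obs_b, max_speed_kmh=MAX_SPEED_KMH):
    """Return False if any pair of observations would require impossible travel.

    Each observation is a hypothetical (hours, lat, lon) tuple, where
    hours is a timestamp expressed in hours on a shared clock.
    """
    for t1, lat1, lon1 in obs_a:
        for t2, lat2, lon2 in obs_b:
            distance = haversine_km(lat1, lon1, lat2, lon2)
            elapsed_hours = abs(t2 - t1)
            if distance > max_speed_kmh * elapsed_hours:
                return False  # one entity cannot be in two places at once
    return True


# Hypothetical records: one seen in New York, the other in Los Angeles.
new_york = [(0.0, 40.7128, -74.0060)]
los_angeles_soon = [(0.5, 34.0522, -118.2437)]   # half an hour later
los_angeles_later = [(10.0, 34.0522, -118.2437)]  # ten hours later

print(could_be_same_entity(new_york, los_angeles_soon))
print(could_be_same_entity(new_york, los_angeles_later))
```

A passing check does not prove two records describe the same person; it only means the space and time evidence does not contradict it. A failing check, on the other hand, eliminates the match outright, which is where the reduction in ambiguity comes from.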

New observations can also reverse earlier assumptions and conclusions, he said. However, after hitting a certain amount of data, there is a "tipping point" after which confidence in what the data analysis is revealing improves while the computational effort decreases, according to Jonas.

Anyone who carries around a smartphone or any mobile device with GPS enabled is constantly broadcasting where they are, Jonas said. Cell phones are generating a "staggering amount" of geolocational data, over 600 billion transactions per day in the United States alone, Jonas said.

The data quickly reveals where people spend most of their time and who they spend it with, he said. "Deidentified" does not mean "true anonymization," especially in large data sets, Jonas said. Figuring out who is who is "somewhat trivial," he said.
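
A rough illustration of why deidentified traces are easy to unmask: the place a device sits overnight is probably a home, and the place it sits during business hours is probably a workplace. The Python sketch below is an assumption-laden example, not a technique Jonas described in detail. The ping format, the hour boundaries and the function name are hypothetical.

```python
from collections import Counter


def likely_home_and_work(pings):
    """Guess home and work locations from (hour_of_day, cell_id) pings.

    Assumes night is 22:00 to 06:00 and working hours are 09:00 to 17:00;
    both ranges are illustrative choices, not values from the keynote.
    """
    night = Counter()
    day = Counter()
    for hour, cell in pings:
        if hour < 6 or hour >= 22:
            night[cell] += 1
        elif 9 <= hour < 17:
            day[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = day.most_common(1)[0][0] if day else None
    return home, work


# Hypothetical pings from one "anonymous" device.
pings = [(23, "A"), (2, "A"), (3, "B"), (10, "C"), (11, "C"), (14, "D"), (19, "E")]
print(likely_home_and_work(pings))
```

Once a home and a workplace are paired, very few people share both, so matching the pair against public records would often be enough to attach a name to the trace.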

It is possible to predict with "87 percent certainty" where someone will be at a certain time in the future, he said. A government intelligence service could pre-empt the next mass protest in real time based on geolocation data alone, he said.

A client is experimenting with geolocation logs to track how often and for how long people visit various retail outlets. The analysis of the logs can reveal patterns, such as a decline in store traffic, long before the retailer reports its quarterly earnings, Jonas said.
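
The kind of analysis described here could look something like the Python sketch below. It is only an illustration under stated assumptions: the log format (device, store, week number), the function names and the 20 percent threshold are invented for the example, and the client's actual approach was not disclosed.

```python
from collections import defaultdict


def weekly_visitors(log):
    """Count unique devices per store per week.

    log is an iterable of hypothetical (device_id, store, week) records.
    """
    seen = defaultdict(lambda: defaultdict(set))
    for device_id, store, week in log:
        seen[store][week].add(device_id)
    return {
        store: {week: len(devices) for week, devices in weeks.items()}
        for store, weeks in seen.items()
    }


def declining_stores(counts, threshold=0.2):
    """Return stores whose traffic fell by at least `threshold` from first to last week."""
    flagged = []
    for store, weeks in counts.items():
        ordered = sorted(weeks)
        if len(ordered) < 2:
            continue  # not enough history to compare
        first, last = weeks[ordered[0]], weeks[ordered[-1]]
        if first > 0 and (first - last) / first >= threshold:
            flagged.append(store)
    return sorted(flagged)


# Hypothetical log: store "north" loses visitors, store "south" holds steady.
log = [
    ("d1", "north", 1), ("d2", "north", 1), ("d3", "north", 1), ("d4", "north", 1),
    ("d1", "north", 2), ("d2", "north", 2),
    ("d5", "south", 1), ("d6", "south", 1),
    ("d5", "south", 2), ("d6", "south", 2),
]
counts = weekly_visitors(log)
print(counts)
print(declining_stores(counts))
```

Visit duration could be aggregated per store and week in the same way. The point of the sketch is that the signal is available as soon as the logs arrive, well ahead of any quarterly report.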

"One company I met is getting 85 percent of this data in real time, and they're not a telco," he said. "The data is being deidentified, but they know where you spend your time and who you spend it with."

Privacy advocates have been saying for years that users are, inadvertently or voluntarily, giving companies large amounts of tracking data. Jonas agreed, noting that if the government were collecting the same information, users would be horrified.