Using Big Data to Discover the Truth Isn't as Easy as It Looks
After a few years of testing, Google Flu Trends was withdrawn because it wasn't accurate. Search data alone, it seems, is not a good way to track disease. Baker said it's also important to question the output once you've performed an analysis. She pointed out that many sources provide data that has already been analyzed, and that there can be mistakes in that analysis. You need to check your own analysis as well, she added, including the basics such as checking the math. One way to help make your data analysis more accurate is to diversify your sources, which is what the CDC did with its flu reporting. You also have to assume that your analysis will fail and be prepared to figure out what to do next, she said. "Data is essential and analytics certainly speed results, but don't assume results to be infallible," Baker said. She recommends running any analysis at least three more times to make sure it's correct.
Many government agencies, ranging from the CDC to the Federal Communications Commission, have stores of data that are available for analysis, often simply on request. But depending on the data, it's important to try to confirm it, just as you would with data from any other source. Just because it's from the government doesn't mean it's accurate, current or relevant.
It's also worth noting that much of the data you may need in business, or in my case in journalism, isn't in a useful form. It may be in files that must be converted to a format the available tools can readily analyze, or it may be in printed reports that must be entered manually or scanned before they're useful. If it looks like using big data may be a lot of trouble, you're right. There's nothing magical about big data, including the fact that it's big. As Baker told me, it's more important to have the right data than to have a lot of data.
The old adage of "garbage in, garbage out" holds true when it comes to data analysis. Accumulating lots of useless data still leaves you with useless data; there's just more of it. But once you do find the right data and analyze it properly, it can show you things that you can't find any other way, and that is what makes this technology so valuable to business managers and journalists.
Fortunately, much of the data you're likely to need for analysis is readily available. The government has a wealth of information, much of it very useful, and it provides a site, Data.gov, where vast amounts of that data can be found.