An ordinary tweet about a case of the flu just might become a part of scientific research. Computer scientists at Johns Hopkins University are using Twitter to track the number of flu cases in the United States.
The researchers have developed a way to use machine learning to track flu cases in real time and categorize the tweets as either simply chatter about the flu or reports of actual infections.
“Our system learns that phrases like ‘at home with the flu’ indicate a real infection while ‘tired of hearing about the flu’ is irrelevant chatter,” Michael Paul, a doctoral candidate in Johns Hopkins’ computer science department and a member of the research team, told eWEEK in an email.
The machine-learning method of categorizing the tweets more closely matches government flu data than searching for all tweets mentioning “flu,” said Paul.
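The distinction Paul describes can be illustrated with a toy text classifier. The sketch below is not the Johns Hopkins system, whose model the article does not specify; it assumes a simple naive Bayes word model written with Python's standard library, and every training phrase other than the two quoted by Paul is invented for illustration.

```python
import math
from collections import Counter

# Hypothetical labeled examples. The phrases "at home with the flu" and
# "tired of hearing about the flu" come from the article; the rest are made up.
TRAINING = [
    ("at home with the flu", "infection"),
    ("stuck in bed with the flu and a fever", "infection"),
    ("i think i caught the flu from my roommate", "infection"),
    ("tired of hearing about the flu", "chatter"),
    ("the news keeps talking about the flu season", "chatter"),
    ("everyone is tweeting about the flu today", "chatter"),
]

def train(examples):
    """Count words per label so they can be compared at classification time."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # prior for this label
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # Add-one smoothing avoids zero probabilities.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(classify("at home with the flu", model))
print(classify("tired of hearing about the flu", model))
```

Both phrases contain the word “flu,” which is why a plain keyword search cannot tell them apart; the surrounding words are what separate a report of illness from chatter.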
“We wanted to separate hype about the flu from messages from people who truly become ill,” Dr. Mark Dredze, an assistant research professor in the Department of Computer Science and a research scientist at the Johns Hopkins Human Language Technology Center of Excellence, said in a post on the school’s news site, HUB.
Johns Hopkins’ computer algorithms compile statistics on tweets using human language-processing technologies. Using Twitter, the school’s method produced real-time results faster than the reporting process of the U.S. Centers for Disease Control and Prevention, according to the post, written by Phil Sneiderman, a Johns Hopkins spokesman.
“We have built a system that relies on dozens of computers to process the terabytes of data that we collect, which is able to perform the automatic categorization of tweets and estimate the influenza rate on a daily basis,” said Paul.
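As a rough illustration of the daily estimate Paul mentions, the sketch below tallies already-categorized tweets by day. It assumes the rate is the share of a day's collected tweets labeled as infections, which is one plausible definition and not one the article confirms; the function name `daily_infection_rate`, the dates and the labels are all hypothetical.

```python
from collections import defaultdict
from datetime import date

def daily_infection_rate(records):
    """Map each day to the fraction of its tweets labeled 'infection'.

    records is an iterable of (date, label) pairs, where label is the
    output of some tweet classifier (assumed here, not taken from the article).
    """
    totals = defaultdict(int)
    infections = defaultdict(int)
    for day, label in records:
        totals[day] += 1
        if label == "infection":
            infections[day] += 1
    return {day: infections[day] / totals[day] for day in sorted(totals)}

# Invented sample data: four tweets on each of two days.
records = [
    (date(2013, 1, 18), "infection"),
    (date(2013, 1, 18), "chatter"),
    (date(2013, 1, 18), "chatter"),
    (date(2013, 1, 18), "chatter"),
    (date(2013, 1, 19), "infection"),
    (date(2013, 1, 19), "infection"),
    (date(2013, 1, 19), "chatter"),
    (date(2013, 1, 19), "chatter"),
]

rates = daily_infection_rate(records)
for day, rate in rates.items():
    print(day.isoformat(), rate)
```

Because chatter is counted separately from infection reports, a burst of news-driven tweets adds to the chatter tally without adding to the infection count.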
The CDC’s process for recording flu-related symptoms from hospital visits takes two weeks to report data, according to Sneiderman.
A CDC map for the week ending Jan. 19 showed widespread influenza activity in nearly every U.S. state.
“In late December the news media picked up on the flu epidemic, causing a somewhat spurious rise in the rate produced by our Twitter system,” said Dredze. “But our new algorithm handles this effect much better than other systems, ignoring the spurious spike in tweets.”
Researchers analyze 5,000 public tweets per minute and download 8 million tweets a day for analysis, Johns Hopkins reported.
Twitter discussion about Kobe Bryant’s flu-like symptoms wouldn’t qualify as an actual flu case, according to David Broniatowski, a School of Medicine postdoctoral fellow in the Department of Emergency Medicine’s Center for Advanced Modeling in the Social, Behavioral, and Health Sciences.
“A recent spike in Twitter flu activity was caused by discussions about basketball legend Kobe Bryant’s flu-like symptoms during a recent game,” Broniatowski told HUB. “Mr. Bryant’s health notwithstanding, such tweets do very little to help public health officials prepare our nation for the next big outbreak.”
Maps the researchers have produced show a sharp difference between last year’s flu season and that of the winter of 2012-2013. Johns Hopkins plans to share its Twitter flu-tracking method with government health agencies.
User names and gender information were removed before the data was entered into the school’s Twitter flu analysis system, Johns Hopkins reported.
By analyzing tweets, researchers were able to gain insight into how people self-medicate when they have the flu, according to Paul. Health officials could potentially use the data to see how people incorrectly take antibiotics to treat the flu, which is caused by a virus, Paul noted.
“An important part of public health is really figuring out your population, and Twitter opens up a whole new way to view that population and understand what’s going on,” Dredze said in a video produced by Twitter. “This really could change how we do public health in this country and how we get feedback from our population.”