Google Cloud to Store Petabytes of Genome Data for Health Researchers

For $25 a year, Google will store a human genome in its cloud, where big data search tools can rapidly compare genomes to assist genetic researchers.


Google continues to expand its efforts in the medical arena. The company has announced a new Genome Cloud that will be used to store thousands of human genome files.

Google will charge $25 per genome annually, which sounds fairly expensive for cloud storage. But considering each genome file contains about 100GB of data, this may seem more reasonable to potential customers.
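A quick back-of-the-envelope calculation puts the $25 figure in context. The sketch below assumes the article's round numbers (about 100GB per genome, $25 per genome per year) and decimal units (1 petabyte = 1,000,000 GB); the function and constant names are illustrative only and are not part of any Google API.

```python
# Back-of-the-envelope arithmetic using the article's round figures.
# Assumptions: ~100 GB per genome, $25 per genome per year, decimal units.

PRICE_PER_GENOME_PER_YEAR = 25.0   # USD, from the article
GENOME_SIZE_GB = 100.0             # approximate size, from the article
GB_PER_PB = 1_000_000              # decimal units (assumption)


def price_per_gb_year(price=PRICE_PER_GENOME_PER_YEAR, size_gb=GENOME_SIZE_GB):
    """Implied storage price in USD per GB per year."""
    return price / size_gb


def genomes_per_petabytes(petabytes, size_gb=GENOME_SIZE_GB):
    """Number of genome files of the assumed size that fit in the given petabytes."""
    return petabytes * GB_PER_PB / size_gb


if __name__ == "__main__":
    per_gb_year = price_per_gb_year()
    print(f"Implied price: ${per_gb_year:.2f} per GB per year")
    print(f"Roughly ${per_gb_year / 12:.4f} per GB per month")
    print(f"Genomes per petabyte: {genomes_per_petabytes(1):,.0f}")
```

Under these assumptions the price works out to about 25 cents per gigabyte per year, and a single petabyte holds on the order of 10,000 genomes.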

For the researcher or doctor, the exciting part is that not only will sizable databases of patient genomes be available, but Google’s search tools will also allow very rapid processing of the data. That is expected to lead to meaningful insights into the genetic underpinnings of cancer, for example.

But genome comparison techniques can also be useful in researching other genetically related health conditions.

Google already has some major commitments from the scientific community. The National Cancer Institute is copying 2.6 petabytes of data into the cloud, at a cost of $19 million. Google also faces competition for this genome data storage business: the National Cancer Institute intends to store a second copy of its database on an Amazon Web Services cloud.

Geneticist Dr. Stephen Scherer, director of the Centre for Applied Genomics at Toronto's Hospital for Sick Children, has teamed with the group Autism Speaks and Google in a $50 million effort to identify the possible genetic root causes of autism, which affects 1 in 68 babies born in the US. The effort has already produced useful findings: Dr. Scherer says the data show that autism is actually an umbrella term for more than one condition.

"We have new, unpublished data that shows autism is really a collection of different disorders," said Dr. Scherer in an interview with CNBC. "This is so much the case that even in families where siblings have autism, they often have different forms of the condition and therefore need to be treated in a manner specific to their sub-type."

Bob Wright, co-founder of Autism Speaks, told CNBC, "I think this will open up a whole world of autism research. Hopefully, we are going to save 25 years of research in a matter of 18 to 24 months."

Several software vendors are creating tools to help researchers in their task. These include Seven Bridges, Tute Genomics and NextCode Health.

"In as far as impact on medicine goes; the broad storage question is secondary: you can store the raw data on your servers, on ours, or on a cloud like Google's or Amazon's. What matters—and this is the challenge now—is to be able to make sense of the data efficiently and so to store it in a database model that enables you to use it efficiently," said Ed Farmer, vice president of Communications at NextCode, told eWEEK.

"You need to be able to run samples against panels of genetic variants known to be linked to diseases, essentially instantly, since that is the easiest bit," Farmer said. "In the vast majority of cases that doesn't yield causative mutations. So you need then to be able to search according to a range of inheritance patterns, with the ability to leverage all the main public reference datasets, and have algorithms that are effective for detecting de novo variation, which accounts for a large proportion of rare genetic disease," Farmer explained.

"For the common diseases (Alzheimer’s, heart disease, etc.) you need to be able to query whole-genome data from tens of thousands of patients and controls with some dispatch," he said. "Here again, our system has been behind the biggest such studies ever undertaken— in Iceland, where our technology was developed."