With the human genome now available on Amazon's cloud platform, researchers will spot patterns in genomic sequencing faster and apply them to clinical practice.
With Amazon
Web Services able to host 200 terabytes of genetic data in the cloud, medical
researchers hope to spot the sequences leading to illnesses such as breast
cancer and Parkinson's disease.
At a White
House Big Data Summit on March 29, Amazon and the National Institutes of Health
announced that they will make the
full 1000 Genomes Project available as a free public data
set on the company's Simple Storage Service (S3) and Elastic Block
Store (EBS) services. Researchers can search the data for free from Amazon's
Elastic Compute Cloud (EC2) and Elastic MapReduce (EMR) platforms.
The cloud
database will allow medical researchers to predict the risks of illnesses, such
as diabetes, heart disease, sickle cell anemia and breast cancer.
The 1000 Genomes Project is an international research effort started in 2008 that
holds anonymized genetic data for more than 1,700 people, the largest amount of
genomic material available to researchers, according to Amazon. The genomic
database will hold genomes of 2,600 people from 26 populations, the company
reported.
"The goal
was to build up the world's largest map of human genetic variation," Dr.
Matt Wood, product manager for big data and high-performance computing at
AWS, told
eWEEK.
The 200
terabytes of genetic data in the 1000 Genomes Project is comparable to 16
million file cabinets of text, or more than 30,000 standard DVDs, according to
the White House Office of Science and Technology Policy.
Amazon has been tracking the progress of the 1000 Genomes Project in its pilot
stage and has noticed faster sequencing and clearer genetic patterns that can
be traced.
The first
human genome took 13 years to sequence, but with next-generation sequencing
technology, the work can be done in weeks rather than years, Wood noted.
"This is a real quantum leap," he said.
"In
comparison to the pilot data, this data is of real biological importance,"
said Wood.
One area
researchers will look at is genetic patterns in the BRCA2 gene, which has been
linked to breast and ovarian cancer. Researchers will also search for patterns
in hypertension, vascular conditions and Parkinson's disease, said Wood.
"We're
really allowing researchers to start to look at the genetics that cause
disease," said Wood. "The
1000 Genomes Project and the data that's been made available on Amazon Web
Services are all part of this continual shift of genomics getting closer and
closer to clinical practice."
Low-cost DNA
sequencing will enable genomics to directly impact clinical outcomes, according
to Wood.
"In
addition to providing insight into disease processes for early identification
of risk factors, this new era of genomics is allowing clinicians and
informaticians to work together on individual patient cases to influence
clinical outcomes," Wood explained.
Researchers
can search the genomic data to spot geographic patterns, like those of Chinese
people who live in Denver, or those of people with Mexican or European
ancestry, said Wood.
Scientists
will not only be able to compare genomic data for populations and
subpopulations of humans but also compare the human genome with patterns found
in other species such as the gorilla or duck-billed platypus, said Wood.
In addition to
studying genes that cause disease, researchers will examine genetic information
that can help prevent illnesses and protect individuals, said Wood.
"There's
a lot of scope here to look at both classes of research," he said.
Cloud
computing is an area of growth for big data in health care. "Data sets can
be considered big data when they exceed researchers' experience in dealing with
them," Wood noted.
On Nov. 10,
2011, Dell announced a donation of cloud infrastructure for pediatric cancer
research trials, and IBM announced
Clinical Genomics, a data-analytics platform, on
March 14.