With the human genome now available on Amazon's cloud platform, researchers will spot patterns in genomic sequencing faster and apply them to clinical practice.
With Amazon Web Services able to host 200 terabytes of genetic data in the cloud, medical
researchers hope to spot the sequences leading to illnesses such as breast
cancer and Parkinson's disease.
At a White House Big Data Summit on March 29, Amazon and the National Institutes of Health
announced that they will make the full 1000 Genomes Project available as a free public data set
on the company's Simple Storage Service (S3) and Elastic Block
Store (EBS) services. Researchers can search the data for free from Amazon's
Elastic Compute Cloud (EC2) and Elastic MapReduce (EMR) platforms.
The database will allow medical researchers to predict the risks of illnesses, such
as diabetes, heart disease, sickle cell anemia and breast cancer.
The 1000 Genomes Project
is an international research effort started in 2008 that
holds anonymized genetic data for more than 1,700 people, the largest amount of
genomic material available to researchers, according to Amazon. The genomic
database will hold genomes of 2,600 people from 26 populations, the company
said. "[The aim] was to build up the world's largest map of human genetic variation," Dr.
Matt Wood, product manager for big data and high-performance computing at
AWS, told eWEEK.
The 200 terabytes of genetic data in the 1000 Genomes Project is comparable to 16
million file cabinets of text, or more than 30,000 standard DVDs, according to
the White House Office of Science and Technology Policy.
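The DVD comparison can be sanity-checked with a little arithmetic. The sketch below assumes decimal units (1 terabyte = 1,000 gigabytes) and a 4.7 GB single-layer disc; neither assumption comes from the article, and the function name is purely illustrative.

```python
# Rough check of the "more than 30,000 standard DVDs" comparison.
# Assumptions (not from the article): decimal units and a 4.7 GB
# single-layer DVD.
DATASET_TB = 200        # figure quoted in the article
DVD_CAPACITY_GB = 4.7   # assumed capacity of one single-layer DVD


def dvds_needed(dataset_tb: float, dvd_gb: float) -> float:
    """Return how many discs of dvd_gb gigabytes hold dataset_tb terabytes."""
    return dataset_tb * 1000 / dvd_gb


if __name__ == "__main__":
    print(f"{dvds_needed(DATASET_TB, DVD_CAPACITY_GB):,.0f} DVDs")
```

Under these assumptions the result is roughly 42,500 discs, which is consistent with the "more than 30,000" figure.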
been tracking the progress of the 1000 Genomes Project in a pilot stage and
noticed the difference in the speed of sequencing and the clearer patterns in
genetics that can be traced.
The first human genome took 13 years to sequence, but with next-generation sequencing
technology, the work can be done in weeks rather than years, Wood noted.
"This is a real quantum leap," he said. "In
comparison to the pilot data, this data is of real biological importance."
One thing researchers will look at is genetic patterns in the BRCA2 gene, which has been
linked to breast and ovarian cancer. Researchers will also search for patterns
in hypertension, vascular conditions and Parkinson's disease, said Wood.
"[It's] really allowing researchers to start to look at the genetics that cause
disease," said Wood. "The
1000 Genomes Project and the data that's been made available on Amazon Web
Services are all part of this continual shift of genomics getting closer and
closer to clinical practice."
Sequencing will enable genomics to directly impact clinical outcomes, according
to Wood. "In addition to providing insight into disease processes for early identification
of risk factors, this new era of genomics is allowing clinicians and
informaticians to work together on individual patient cases to influence
clinical outcomes," Wood explained.
Researchers can search the genomic data to spot geographic patterns, like those of Chinese
people who live in Denver, or those for people with Mexican or European
ancestry, said Wood.
They will not only be able to compare genomic data for populations and
subpopulations of humans but also compare the human genome with patterns found
in other species such as the gorilla or duck-billed platypus, said Wood.
In addition to
studying genes that cause disease, researchers will examine genetic information
that can help prevent illnesses and protect individuals, said Wood.
"[There's] a lot of scope here to look at both classes of research," he said.
Cloud computing is an area of growth for big data in health care. "Data sets can
be considered big data when they exceed researchers' experience in dealing with
them," Wood noted.
On Nov. 10,
2011, Dell announced a donation of cloud infrastructure for pediatric cancer
research trials, and IBM announced Clinical Genomics,
a data-analytics platform, on