    AWS Hosts Human Genetics Catalog in the Cloud

    Written by

    Darryl K. Taft
    Published March 29, 2012

      Amazon Web Services (AWS) and the U.S. National Institutes of Health (NIH) announced that the complete 1000 Genomes Project is now available on AWS as a publicly available data set.

      AWS and NIH announced the news at the White House Big Data Summit on March 29. The announcement makes the largest collection of human genetics data available to researchers worldwide, free of charge. The 1000 Genomes Project is an international research effort coordinated by a consortium of 75 companies and organizations to establish the most detailed catalog of human genetic variation, AWS officials said.

      The project has grown to 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals that researchers can now access on AWS for use in disease research. The 1000 Genomes Project aims to include the genomes of more than 2,600 individuals from 26 populations around the world, and the NIH will continue to add the remaining genome samples to the public data set this year.

      The 1000 Genomes Project started out with pilot phases in 2008 that included just a couple of terabytes of data, AWS told eWEEK. In 2010, NIH made a small portion of that data available on AWS as a public data set, and due to the positive feedback from scientists, it decided to make the 1000 Genomes Project as it stands today, at more than 200TB of data, fully accessible on AWS. The amount of data produced by the 1000 Genomes Project is unprecedented in biomedical research, NIH officials said. NIH, part of the U.S. Department of Health and Human Services, serves as one of the data coordinators for the 1000 Genomes Project.

      “Previously, researchers wanting access to public data sets such as the 1000 Genomes Project had to download them from government data centers to their own systems, or have the data physically shipped to them on discs,” said Lisa D. Brooks, Ph.D., program director for the Genetic Variation Program, National Human Genome Research Institute, a part of NIH, in a statement. “This process took a long time, and that’s assuming a lab had the bandwidth to download the data and sufficient storage and compute infrastructure to hold and analyze the data once they had it. We are happy that the 1000 Genomes Project data are on AWS to give researchers anywhere in the world a simple way to access the data so they can put the data to work in their research.”

      “Putting the data in the AWS cloud provides a tremendous opportunity for researchers around the world who want to study large-scale human genetic variation but lack the computer capability to do so,” said Richard Durbin, Ph.D., co-director of the 1000 Genomes Project and joint head of human genetics at the Wellcome Trust Sanger Institute, in Hinxton, England.

      AWS said it would take researchers weeks to months to download the complete 1000 Genomes Project to their own servers, and that’s assuming they had the bandwidth to download the data and enough hardware and storage to hold it. To do meaningful analysis on the data, researchers often needed access to very large, high-performance compute resources, which cost hundreds of thousands and sometimes millions of dollars, AWS officials said. As one of the data coordinators for the 1000 Genomes Project, the NIH wanted to remove this friction and make the data as widely accessible as possible, so researchers can immediately start analyzing and crunching the data, even if they don’t have the large budgets that are traditionally required for this level of data analytics, AWS said.
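
The "weeks to months" figure can be checked with simple arithmetic. The sketch below is illustrative only: the 200TB size comes from the article, while the link speeds and the function name `transfer_days` are assumptions chosen for the example, and real transfers would be slower because of protocol overhead and shared links.

```python
def transfer_days(size_terabytes: float, link_megabits_per_s: float) -> float:
    """Estimate the days needed to move a data set over a network link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbit = 10**6 bits) and
    assumes the link is fully and continuously used, which is optimistic.
    """
    bits = size_terabytes * 1e12 * 8            # terabytes -> bits
    seconds = bits / (link_megabits_per_s * 1e6)
    return seconds / 86_400                     # seconds -> days


if __name__ == "__main__":
    # Hypothetical link speeds; the article does not state any.
    for mbps in (100, 1_000, 10_000):
        print(f"{mbps:>6} Mbit/s: {transfer_days(200, mbps):6.1f} days")
```

Under these assumptions, a sustained 1 Gbit/s link needs roughly two and a half weeks for 200TB and a 100 Mbit/s link needs about six months, which is in line with the range AWS cited.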

      Public Data Sets on AWS provide a centralized repository of public data stored in Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Block Store (Amazon EBS). The data can then be directly accessed from AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic MapReduce (Amazon EMR), eliminating the need for organizations to move the data in-house and then procure enough technology infrastructure to analyze the data effectively, AWS said.

      For its part, AWS’ highly scalable compute resources are being used to power big data and high-performance computing applications such as those found in science and research. NASA’s Jet Propulsion Laboratory, Langone Medical Center at New York University, Unilever, Numerate, Sage Bionetworks and Ion Flux are among the organizations leveraging AWS for scientific discovery and research. AWS is storing the public data sets at no charge to the community. Researchers pay only for the additional AWS resources they need for further processing or analysis of the data.

      “It took more than 10 years and billions of dollars to sequence and publish the very first human genome. Recent advances in genome sequencing technology have enabled researchers to tackle projects like the 1000 Genomes by collecting far more data, faster,” said Deepak Singh, Ph.D., principal product manager for Amazon Web Services, in a statement.

      “This has created a growing need for powerful and instantly available technology infrastructure to analyze that data. We’re excited to help scientists gain access to this important data set by making it available to anyone with access to the Internet. This means researchers and labs of all sizes and budgets have access to the complete 1000 Genomes Project data and can immediately start analyzing and crunching the data without the investment it would normally require in hardware, facilities and personnel. Researchers can focus on advancing science, not provisioning the resources required for their research.”

      AWS said the 1000 Genomes Project is a prime example of “big data,” where data sets become so massive that few researchers have access to the compute power in their own data centers to analyze and process the data. A key point, however, is that the 1000 Genomes data will sit right next to the compute power researchers need to derive value from it. In a matter of minutes, scientists can spin up as much compute power as they need to crunch the massive data sets. Researchers will pay only for the additional AWS resources needed for further processing or analysis of the data, AWS said.

      For more information about Public Data Sets on AWS go to: http://aws.amazon.com/publicdatasets/.

      Darryl K. Taft
      Darryl K. Taft covers the development tools and developer-related issues beat from his office in Baltimore. He has more than 10 years of experience in the business and is always looking for the next scoop. Taft is a member of the Association for Computing Machinery (ACM) and was named 'one of the most active middleware reporters in the world' by The Middleware Co. He also has his own card in the 'Who's Who in Enterprise Java' deck.
