Johns Hopkins Announces Cloud-Based Platform Dubbed AnVIL
According to a news release from Johns Hopkins University, a team co-led by Michael Schatz, Bloomberg Distinguished Professor of computer science and biology at Johns Hopkins, created a cloud-based platform that gives researchers access to a large genomics database.
The release states, “Known as AnVIL (Genomic Data Science Analysis, Visualization, and Informatics Lab-space), the new platform gives any researcher with an Internet connection access to thousands of analysis tools, patient records, and more than 300,000 genomes. The work, a project of the National Human Genome Research Institute, appears in Cell Genomics.”
Further, “Typically, genomic analysis starts with researchers downloading massive amounts of data from centralized warehouses to their own data centers, a process that is not only time-consuming, inefficient, and expensive, but also makes collaborating with researchers at other institutions difficult. Genetic risk factors for ailments such as cancer or cardiovascular disease are often very subtle, so researchers must analyze thousands of patients' genomes to discover new associations. The raw data for a single human genome comprises about 40GB, so downloading thousands of genomes to conduct such research can take several days to several weeks.”
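As a rough check on the "several days to several weeks" figure, the short Python sketch below estimates the transfer time from the 40GB-per-genome number quoted in the release. The function name `download_days` and the 1 Gbps sustained link speed are assumptions made for illustration; neither comes from the release, and real-world throughput is often lower.

```python
def download_days(genomes: int, gb_per_genome: float = 40.0, link_gbps: float = 1.0) -> float:
    """Estimate the days needed to download `genomes` genomes.

    gb_per_genome: raw data per genome in decimal gigabytes (40 GB per the release).
    link_gbps: assumed sustained link speed in gigabits per second (hypothetical).
    """
    total_bits = genomes * gb_per_genome * 1e9 * 8  # gigabytes -> bits
    seconds = total_bits / (link_gbps * 1e9)        # bits / (bits per second)
    return seconds / 86_400                         # seconds -> days


if __name__ == "__main__":
    for n in (1_000, 5_000):
        print(f"{n} genomes: {download_days(n):.1f} days at 1 Gbps")
```

Under these assumptions, 1,000 genomes (40 TB) take about 3.7 days and 5,000 genomes about 18.5 days, which is consistent with the range quoted in the release.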
Additionally, studies that integrate data collected at various institutions require each institution to download its own copy while maintaining patient-data security, a challenge that will only grow as studies become larger.
"AnVIL will be transformative for institutions of all sizes, especially smaller institutions that don't have the resources to build their own data centers. It is our hope that AnVIL levels the playing field, so that everyone has equal access to make discoveries," Schatz said in the release.
The release concludes that “Already, the AnVIL team has collected petabytes of data (1 petabyte equals one million GB) from several of the largest NHGRI projects, including hundreds of thousands of genomes from the Genotype-Tissue Expression, Centers for Mendelian Genetics, and Centers for Common Disease Genomics projects, with plans to host many more projects in the near future.”