Genomics Alliance Approves Five Data-Sharing Standards
At its seventh annual Plenary Meeting in Boston this week, the Global Alliance for Genomics and Health (GA4GH) announced approval of five new standards to enable responsible international genomic data sharing.
The five standards (Crypt4GH, Variation Representation, Phenopackets, Tool Registry Service API, and the Data Security Infrastructure Policy) were developed as part of the organization’s five-year strategic plan, announced in 2017. The GA4GH Regulatory and Ethics Work Stream (REWS) has also completed a comprehensive review and update of its existing policy frameworks and guidance documents.
The organization said these standards, which address issues in data security, cloud computing, phenotype and variant data exchange, and the ethical implications of personal data use, serve as a blueprint for a federated network of responsible, secure genomic and health data sharing.
“The newly approved standards and updates are a major milestone in our work under GA4GH Connect, and we anticipate several more standards will be approved in the coming months,” said GA4GH CEO Peter Goodhand, in a statement. “We are also launching an update to the GA4GH Connect roadmap that accelerates our goal of enabling a federated, interoperable network of genomic data tools and resources.”
Details of the approved standards follow:
• Crypt4GH is a new standard file container format that allows genomic data to remain secure throughout their lifetime, from initial sequencing to sharing with professionals at external organizations. Currently, researchers share genomic and other sensitive data files using industry-standard encryption, which keeps sensitive information secure during transfer but does not guarantee proper safeguarding thereafter. For easy access, a user is likely to store the file on a local hard drive in the decrypted state rather than repeatedly re-encrypting it, which could leave the data vulnerable. Crypt4GH overcomes this challenge by ensuring sensitive genomic data files remain encrypted throughout their lifetime. The approach uses envelope encryption, a protocol that is relatively new to research and healthcare but is increasingly common in the data security field because it enhances the security of data transfer and storage.
• Phenopackets is a file format that allows phenotypic information to be represented alongside genotypic and medical information. The standard aims to facilitate communication between the research and clinical genomics communities by creating an ecosystem of interoperable tools and resources that can use phenotypic data with fewer barriers. A phenopacket file contains a set of mandatory and optional fields to share information about a patient or participant’s phenotype, such as clinical diagnosis, age of onset, results from lab tests, and disease severity. It can also link to a separate file containing the patient’s genetic sequence, if available. Phenopackets are expected to standardize phenotypic data exchange within medical and scientific settings, allowing phenotypic data to flow between clinics, databases, clinical labs, journals, and patient registries in ways currently feasible only for more quantifiable data, such as sequence data.
• Variation Representation Specification is an extensible framework of computational models, schemas, and algorithms to precisely and consistently exchange genetic variation data across communities.
• Tool Registry Service API is a standard for exchanging tools and workflows to analyze, read, and manipulate genomic data, allowing genomics researchers to bring algorithms to datasets in disparate cloud environments.
• Data Security Infrastructure Policy is a set of security best practices for standards development and implementation within the context of GA4GH to facilitate the responsible sharing and processing of genomic data.
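For readers unfamiliar with the envelope encryption mentioned under Crypt4GH, the general idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the Crypt4GH format: the function names, the container layout, and the shared-secret recipient keys are all assumptions made for this example, and the toy SHA-256 keystream stands in for a real cipher and must not be used to protect actual data.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher (NOT secure): XOR data with SHA-256-derived blocks."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def wrap_key(recipient_key: bytes, data_key: bytes) -> bytes:
    """Encrypt the data key for one recipient (the 'envelope')."""
    nonce = secrets.token_bytes(16)
    return nonce + keystream_xor(recipient_key, nonce, data_key)


def unwrap_key(recipient_key: bytes, wrapped: bytes) -> bytes:
    """Recover the data key from one wrapped header entry."""
    nonce, body = wrapped[:16], wrapped[16:]
    return keystream_xor(recipient_key, nonce, body)


def seal(payload: bytes, recipient_keys: list) -> dict:
    """Encrypt the payload once with a fresh data key, then wrap that key per recipient."""
    data_key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    return {
        "headers": [wrap_key(k, data_key) for k in recipient_keys],
        "nonce": nonce,
        "body": keystream_xor(data_key, nonce, payload),
    }


def open_container(container: dict, recipient_key: bytes, slot: int) -> bytes:
    """Unwrap the data key from the given header slot and decrypt the body."""
    data_key = unwrap_key(recipient_key, container["headers"][slot])
    return keystream_xor(data_key, container["nonce"], container["body"])


def add_recipient(container: dict, existing_key: bytes, slot: int, new_key: bytes) -> None:
    """Grant access to a new recipient by re-wrapping the data key only."""
    data_key = unwrap_key(existing_key, container["headers"][slot])
    container["headers"].append(wrap_key(new_key, data_key))
```

The point of the design is visible in `add_recipient`: sharing the file with another party only adds a small wrapped-key entry, while the large encrypted body is never decrypted or rewritten. A real implementation would use public-key cryptography for the wrapping step and an authenticated cipher for the body.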
In addition, the Revised Regulatory and Ethics Policies include updates to the GA4GH Consent Policy and GA4GH Data Privacy and Security Policy, and a reaffirmation of the 2014 Framework for Responsible Sharing of Genomic and Health-Related Data, to ensure they meet the demands of the current era of genomic medicine.
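To make the Phenopackets description above more concrete, the following Python sketch shows what a record with mandatory and optional fields, plus a link to a separate sequence file, might look like. The class name and every field name here are hypothetical and chosen only to mirror the examples given in this article; they are not taken from the published Phenopackets schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class PhenotypeRecord:
    # Mandatory fields (hypothetical names, not the official schema).
    record_id: str
    subject_id: str
    # Optional fields mirroring the examples mentioned in the article.
    diagnosis: Optional[str] = None
    age_of_onset: Optional[str] = None
    severity: Optional[str] = None
    lab_results: dict = field(default_factory=dict)
    # Optional pointer to a separate file holding the genetic sequence.
    sequence_file_uri: Optional[str] = None

    def to_json(self) -> str:
        """Serialize the record, rejecting empty mandatory fields and omitting unset optional ones."""
        if not self.record_id or not self.subject_id:
            raise ValueError("record_id and subject_id are mandatory")
        populated = {k: v for k, v in asdict(self).items() if v not in (None, {})}
        return json.dumps(populated, sort_keys=True)
```

A clinic could then call `PhenotypeRecord("rec-1", "patient-1", diagnosis="example diagnosis").to_json()` and pass the resulting text to a registry or database that reads the same fields, which is the kind of exchange the standard is meant to enable.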