International COVID-19 Data Consortium Leverages EHR Data, Analytics

A consortium of research scientists has pooled its efforts to create a common data model and a shared analytics framework that aggregate information from disparate electronic health records (EHRs) across countries. With this framework, developed by the Consortium for Clinical Characterization of COVID-19 by EHR (4CE), clinical teams and researchers believe they have a powerful tool for quickly detecting trends and answering questions about the virus.

A paper on the effort was recently published in npj Digital Medicine. The researchers, from Penn Medicine and other institutions, wrote: “Even in an information technology-dominated era, fundamental measurements to guide public health decision-making remain unclear. Knowledge still lags on incidence, prevalence, case-fatality rates, and clinical predictors of disease severity and outcomes. While some of the knowledge gaps relate to the need for further laboratory testing, data that should be widely available in electronic health records (EHRs) have not yet been effectively shared across clinical sites, with public health agencies, or with policy makers.”

For example, they wrote, “Through case studies and series, we have learned that COVID-19 can have multi-organ involvement. A growing literature has identified key markers of cardiac, immune, coagulation, muscle, hepatic, and renal injury and dysfunction, including extensive evidence of myocarditis and cardiac injury associated with severe disease. Laboratory perturbations in lactate dehydrogenase (LDH), C-reactive protein (CRP), and procalcitonin have been described. However, data from larger cohorts, linked to outcomes, remain unavailable.”

Because EHRs are not themselves agile analytic platforms, the researchers noted, they have built on the free, open source i2b2 (Informatics for Integrating Biology and the Bedside) toolkit to manage, compute, and share data extracted from EHRs. With it, the group has set out to answer some of the clinical and epidemiological questions around COVID-19 through data harmonization, analytics, and visualization.

The consortium comprises 96 hospitals worldwide and has so far gathered data on more than 27,000 COVID-19 cases and 187,000 laboratory tests. Previously, differences among EHR systems would have prevented these data from “talking” to each other in the way analysis requires. But because so many sites mapped their data into a common data model and made it available for processing and analysis, consortium scientists were able to detect trends and patterns in the new virus that had previously been invisible, the researchers said.

“For example, laboratory data were standardized from Penn Medicine's electronic health record to Logical Observation Identifiers, Names, and Codes (LOINC) and shared units of measure before analyzing their change over time. These steps were critical to uncovering initial clinical insights,” said Danielle Lee Mowery, Ph.D., Penn Medicine’s chief research information officer and an assistant professor of Informatics. “Notable insights include abnormal trends in D-dimer protein, which is a measure of blood clotting, and C-reactive protein, a measure of inflammation, among COVID-19 patients.”
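The harmonization step described above, mapping each site's local lab codes to a shared vocabulary and converting results to common units before tracking them over time, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the 4CE or i2b2 pipeline: the site codes, the placeholder `LOINC_*` identifiers (not real LOINC codes), the record fields, and the sample values are all hypothetical. Only the unit conversions (1 mg/dL = 10 mg/L, 1 ng/mL = 0.001 ug/mL) are standard.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical site-specific lab codes mapped to a shared concept and target unit.
# The "LOINC_*" strings are placeholders, not real LOINC codes.
LOCAL_TO_SHARED = {
    "SITE_A_CRP": ("LOINC_CRP", "mg/L"),
    "SITE_B_CRP": ("LOINC_CRP", "mg/L"),
    "SITE_A_DDIMER": ("LOINC_DDIMER", "ug/mL"),
}

# Multiplicative factors converting (source unit, target unit).
UNIT_FACTORS = {
    ("mg/L", "mg/L"): 1.0,
    ("mg/dL", "mg/L"): 10.0,     # 1 mg/dL = 10 mg/L
    ("ug/mL", "ug/mL"): 1.0,
    ("ng/mL", "ug/mL"): 0.001,   # 1 ng/mL = 0.001 ug/mL
}


def standardize(record):
    """Map one raw lab record to the shared code and unit."""
    code, target_unit = LOCAL_TO_SHARED[record["local_code"]]
    factor = UNIT_FACTORS[(record["unit"], target_unit)]
    return {
        "code": code,
        "day": record["day"],  # hypothetical field: days since admission
        "value": record["value"] * factor,
        "unit": target_unit,
    }


def daily_means(records):
    """Average standardized values per (code, day) to follow change over time."""
    groups = defaultdict(list)
    for rec in map(standardize, records):
        groups[(rec["code"], rec["day"])].append(rec["value"])
    return {key: mean(values) for key, values in sorted(groups.items())}


# Hypothetical records from two sites that report CRP in different units.
raw = [
    {"local_code": "SITE_A_CRP", "value": 5.0, "unit": "mg/dL", "day": 1},
    {"local_code": "SITE_B_CRP", "value": 30.0, "unit": "mg/L", "day": 1},
    {"local_code": "SITE_B_CRP", "value": 80.0, "unit": "mg/L", "day": 3},
    {"local_code": "SITE_A_DDIMER", "value": 1500.0, "unit": "ng/mL", "day": 1},
]

for (code, day), avg in daily_means(raw).items():
    print(code, day, round(avg, 2))
```

Without the unit step, Site A's CRP of 5.0 mg/dL would be averaged directly with Site B's 30.0 mg/L and distort the trend; converting first puts both on the same scale, which is the point of harmonizing before analysis.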

As at the other participating sites, clinical data from Penn Medicine were provided to the effort and analyzed. For future 4CE studies, PennAI, a free, self-service machine learning tool developed at Penn's Institute for Biomedical Informatics (IBI), will be available to each member site.

Other early insights: liver function was initially normal but worsened over the course of hospitalization. White blood cell counts were also typically normal, becoming elevated only among patients with the most severe forms of COVID-19.

“The COVID-19 data warehouse established at Penn Medicine will enable our researchers to access standardized data and generate results which can be replicated at sites around the world,” said John H. Holmes, Ph.D., IBI’s associate director for Medical Informatics and a professor of Informatics in Epidemiology. “This opens the door to local insights about COVID-19 patients from the Philadelphia area while at the same time contributing to the global battle against this infectious disease.”

Although the consortium is new and addresses a new threat, it is the culmination of years of work in health analytics. “Our ability to rapidly respond to a global pandemic was made possible by years of institutional investments in health information technology and biomedical informatics expertise and infrastructure,” said Jason H. Moore, Ph.D., director of the Institute for Biomedical Informatics. “We are seeing the value of electronic health records and artificial intelligence in real-time.”