Building the Clinical Data Registry of the Future
What will the clinical data registry of the future look like? Presenters at a Sept. 22 Council of Medical Specialty Societies meeting gave a glimpse of innovations already being deployed.
Bill Wood, M.D., M.P.H., is a professor of medicine at the University of North Carolina at Chapel Hill and a senior medical advisor to the ASH (American Society of Hematology) Research Collaborative. He chairs the ASH Research Collaborative Data Hub Oversight Group.
“The healthcare data ecosystem is evolving rapidly, and registries as we've traditionally thought about them are becoming something much different as part of that evolution, and the CRN [Coordinated Registry Networks] framework can help provide one way to think about what the registry of the future may look like,” Wood said. The ASH Research Collaborative Data Hub is one such approach, he added.
Coordinated Registry Networks bring together real-world data from a variety of sources to support improved medical device evaluation. The CRNs build on national/regional registries, strategically harmonize data elements and link data to comparable data across the systems.
“We adopted this term Data Hub in large part because we were recognizing that we were moving towards this model of a registry of the future,” Wood said. “We saw this wonderful ecosystem of a number of largely device-based groups that have developed their own CRN underneath this overall framework. We are not primarily a device-based group; we work in different hematologic conditions.”
The Data Hub ingests a wide variety of data including electronic medical record data, clinical and laboratory data, genomic or molecular correlates, patient-reported outcomes, and aggregated population data. It uses state-of-the-art technology so that data collection can be automated wherever possible, which minimizes data capture burden.
“The purpose of our Data Hub is to capture data to generate real world evidence for hematology,” Wood said. “We focus on specific hematologic diseases. We've started our work in Sickle Cell Disease and Multiple Myeloma. Within each disease area, we gather together networks of sites that participate by sending patient-level data to help inform both research and collaborative clinical practice improvement within these disease areas. As we do that, we develop a community of stakeholders much like within the CRN framework.”
One key source of data is electronic health record data feeds, which supply structured and, in some cases, unstructured data, depending on the level of consent that participants have provided, Wood explained. “We also obtain information directly from filled-out electronic case report forms that we essentially verify or curate at the site level with the benefit of pre-populated data from electronic health records.”
The Data Hub also has the ability to map local curated registry data that sites might have. “We work with some of the largest academic medical centers in the country that are seeing patients with hematologic conditions, many of whom already have had some existing efforts that we can actually map into our overall Data Hub structure and then fill in the gaps that we need to complete our data model as needed.”
Wood stressed that they also engage with patients quite a bit. “We have community advisory boards and the ability to develop prospective nested sub-studies, which include the collection of patient-reported outcomes and patient-generated health data. Within this broader multi-stakeholder environment, we're able to purpose this data in several different ways. Under the overall headings of research and collaborative clinical practice improvement, we can develop certain structures like a learning community in which we can bring different sites together along with patients and families and other stakeholders across the policy, regulatory, and scientific landscape to work together toward shared common goals on different hematologic conditions. In doing that, we develop data that are fit for purpose for regulatory and other uses.”
He explained how they bring in EHR data: “I referenced the local curated registry data that we can map to our existing structure, which we do by file export,” Wood said. “We also have the ability to bring in EHR data through direct feeds. We have been utilizing either OMOP common data models or FHIR-based data transmission. Registries of the future, under the CRN framework and other frameworks, are looking to leverage the newest technology data transmission standards, as enabled by the 21st Century Cures Act, including the ability to use EHR FHIR APIs. We actually have a number of sites that are choosing that pathway and we anticipate that more and more sites will be using that pathway as we go forward.”
As data come in to the Data Hub, they develop pre-populated electronic case report forms. “We do that through the development of digital phenotypes,” he said. “Ultimately, those case report forms, where needed, can become the source of truth, and they are verified or amended at the site level as needed, and then essentially finalized for inclusion within the Data Hub.”
“In this broader context of registries of the future,” Wood said, “we intend to remain closely aligned with all the important stakeholders throughout the health data landscape, including our colleagues and friends at the FDA. With our data quality program, we're very intentionally closely aligned with the emerging data quality standards that FDA has described in several guidance documents, as well as ongoing programs to provide structure for real world data transmission from sponsors and technology groups.”
Wood explained the concept of digital phenotyping, or what they call e-phenotyping. “Our intent is to be able to leverage both structured and unstructured EHR data and we can use that information in either rule-based or model-based ways where the intent is to model an underlying health concept. We develop an operational definition based on our best approximation of that health concept. We're able to test it. Like any clinical test, we have test characteristics associated with that: sensitivity, specificity, positive predictive value, and so forth. We have thresholds of use to determine whether or not a provisional operational definition can pass that test or whether it needs additional site-level unification, adjudication, and verification to meet the threshold that we feel is needed for an accurate and high-quality data element. E-phenotyping is an iterative process. We know that our first attempt out of the box for various types of clinical concepts may or may not hit the mark. We go back and develop a new operational definition, we test again, and we keep going until we get to the limits of the ability that we're able to extract from structured and, in some cases, unstructured data.”
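The test-and-threshold loop Wood describes can be sketched in a few lines of Python. This is an illustrative sketch only, not the Data Hub's implementation: the record layout, the placeholder drug names, the rule `rule_based_phenotype`, and the 0.9 thresholds are all assumptions made for the example. It scores a provisional rule-based definition against site-verified labels and reports whether it would need further site-level verification.

```python
from dataclasses import dataclass


@dataclass
class TestCharacteristics:
    sensitivity: float
    specificity: float
    ppv: float


def evaluate_phenotype(predicted, gold):
    """Score phenotype calls against site-verified (gold standard) labels."""
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)

    def ratio(num, den):
        # Undefined when the denominator is zero (e.g. no positive cases).
        return num / den if den else float("nan")

    return TestCharacteristics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        ppv=ratio(tp, tp + fp),
    )


def meets_thresholds(tc, min_sensitivity=0.9, min_ppv=0.9):
    """Hypothetical thresholds of use; real values would be set per data element."""
    return tc.sensitivity >= min_sensitivity and tc.ppv >= min_ppv


# Hypothetical operational definition: a patient counts as "on therapy"
# if any structured medication order matches a placeholder drug list.
THERAPY_CODES = {"drug_a", "drug_b"}


def rule_based_phenotype(record):
    return any(med in THERAPY_CODES for med in record["med_orders"])


# Made-up records; "verified" stands in for the site-adjudicated answer.
records = [
    {"med_orders": ["drug_a"], "verified": True},
    {"med_orders": [], "verified": False},
    {"med_orders": ["drug_b"], "verified": True},
    {"med_orders": ["drug_c"], "verified": True},   # missed by the rule
    {"med_orders": ["drug_a"], "verified": False},  # false alarm
    {"med_orders": [], "verified": False},
]

predicted = [rule_based_phenotype(r) for r in records]
gold = [r["verified"] for r in records]
result = evaluate_phenotype(predicted, gold)
needs_site_review = not meets_thresholds(result)
print(result, "needs site-level verification:", needs_site_review)
```

In this toy data the rule misses one verified case and raises one false alarm, so it falls short of the assumed thresholds. In the iterative process described above, that outcome would send the definition back for revision and retesting.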
Moving forward, Wood said, they will have the ability with the data that they collect to apply new machine learning and artificial intelligence techniques to make their test characteristics even stronger.
As these data come in, they are able to then repurpose them in various ways. “A critical component of our Data Hub is the ability to return value directly back to stakeholders including our site clinicians and investigators and others who can actually look at data in real time through dashboards that we've constructed to help them monitor their patient populations, for a variety of different reasons: it could be for operational purposes or for collaborative clinical practice improvement,” Wood explained. “It could be for hypothesis generation to inform the development of research studies.”
“Within our Sickle Cell Disease program learning community, if we want to increase the reliable use of disease-modifying therapies for individuals affected by Sickle Cell Disease, then we need to see what disease-modifying therapy use looks like at the site level and across sites,” Wood said. “We have a variety of different figures that we've developed for all of the different metrics that we curate and include within our dashboards that are disease-specific and available to sites and in some cases, available to stakeholders through custom-built dashboards as well.”
They also roll these data up into summary metrics, which lets them view change over time in tabular form and feeds real-time data quality reports so that sites can understand the data characteristics of their patient populations.
“We have similar dashboards that we have developed for our Multiple Myeloma program,” Wood added. “Our Data Hub leverages different types of data from different sources to best represent underlying health concepts. Structured data alone are very challenging to use to model important outcomes relevant to Multiple Myeloma practice and research. This is an instance in which some of our electronic case report form data can be used as a gold standard while we're working to develop ever more complex underlying digital phenotypes.”
They also have developed programs under the umbrella of clinical practice improvement. “We've been fortunate to be supported by the Office of Minority Health at the Department of Health and Human Services to develop a nationwide Sickle Cell Disease learning community, which so far has been built out at the pilot level at over 10 participating sites. We actually have an in-person meeting for our learning community participating sites coming up just next week, and this is powered by the Data Hub. And that's really the key point here. This is a Data Hub that can directly influence our ability to improve practice at scale: in this case, the reliable use of disease-modifying therapies for Sickle Cell Disease and the reliable use of co-developed pain management plans.”