Q&A: Informaticist Philip Payne on Washington U.’s Precision Medicine Journey
Philip Payne, Ph.D., wears a lot of hats at the Washington University School of Medicine in St. Louis. In addition to leading the Institute for Informatics, he is associate dean of Health Information and Data Science and chief data scientist. He also has been involved in informatics efforts related to the COVID-19 pandemic. In a recent e-mail Q&A with Healthcare Innovation, Payne described some of Washington University’s work in the area of personalized medicine.
Healthcare Innovation: Can you describe the ARCH Personalized Medicine Initiative, a joint venture between the Washington University School of Medicine and Centene? What are some of its goals?
Payne: Washington University School of Medicine has a strategic focus, in both research and clinical practice, on advancing precision medicine. That means we need to better understand, at the biomolecular, clinical and population levels, the features of our patients that contribute to both wellness and disease, and how patients respond to therapy, so that we can use that increased understanding to make better decisions at the individual patient level that optimize the quality, safety and outcomes of care. We work with a variety of collaborators who help support this research, including traditional funding sources such as the National Institutes of Health. But in equal measure, we also have collaborations with organizations such as Centene and others that are investing in precision medicine research in order to improve the health and wellness of patient communities.
What are some of the elements of informatics infrastructure that underpin this personalized medicine approach?
Fundamentally, the challenge we have with personalized or precision medicine is that rather than treating patients as a function of how the average patient presents or how the average patient may respond to therapy, we want to understand the individual features of each unique patient that contribute to wellness, to disease and to response to therapy. That means we need a lot more data to understand the patients we have seen historically or who might be participating in clinical studies now, so that we can build the evidence base that informs that very tailored approach.
A lot of the work that we do in informatics is in the context of what we refer to as deep phenotyping. For example, how do we extract all of this critical information from the electronic health record, from a variety of biomolecular instruments such as those we use to genotype or sequence patients, and from patient-generated data or data that may help us understand the environments in which patients live or the social determinants of health and disease? We need to be able to identify all those data, connect them to one another and then understand them in that multi-scale context, which is very complex from a computational standpoint. That's really what we do in the Institute for Informatics: we work very hard to discover those data sources, to integrate them, to harmonize them and to understand them.
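To make the deep-phenotyping idea concrete, here is a minimal Python sketch of joining EHR, genomic and patient-generated extracts on a shared patient identifier. The file names and column names are hypothetical illustrations, not the institute's actual pipeline.

```python
# Illustrative sketch of multi-source "deep phenotyping" integration.
# File names, column names and the shared patient_id key are hypothetical.
import pandas as pd

# Hypothetical extracts from three of the data sources mentioned above.
ehr = pd.read_csv("ehr_structured.csv")            # labs, diagnoses, medications
genomics = pd.read_csv("variant_calls.csv")        # sequencing-derived features
patient_gen = pd.read_csv("wearable_summary.csv")  # patient-generated data

# Harmonize before joining (placeholder example: coerce lab values to numeric).
ehr["lab_value"] = pd.to_numeric(ehr["lab_value"], errors="coerce")

# Connect the sources on a common patient identifier to build one
# multi-scale record per patient for downstream modeling.
phenotype = (
    ehr.merge(genomics, on="patient_id", how="left")
       .merge(patient_gen, on="patient_id", how="left")
)
print(phenotype.head())
```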
Can you talk about some projects within the personalized medicine initiative?
We have a variety of projects underway in the Institute for Informatics that contribute to our precision medicine strategy, and they span a broad variety of diseases that most people are familiar with, such as cancer, cardiovascular disease, neurodegenerative disease, and other common diseases. We also have projects focused on rare diseases that occur less frequently but are no less important when we think about how we can have better, more precise approaches to diagnosis and treatment planning.
One of our focuses is Alzheimer's disease. We know with Alzheimer's disease that there are a variety of presentations, and while the biology may be somewhat similar across those presentations, some patients will see very rapid neurodegeneration and deterioration of their cognitive function, while for other patients it's a longer, more gradual process. And while we don't have curative strategies for any of those scenarios, there are measures we can take to improve quality of life and also to support caregivers and family members as they navigate this disease. But that means we need to understand what the likely outcome is for a given patient.
We've been looking at a broad variety of data sources from patients enrolled in clinical trials in Washington University’s memory care clinic. This includes data captured in the EHR, but also a variety of cognitive evaluation instruments and patient-generated data. And we've been using machine learning methods to identify patterns in those data that allow us to predict which patients are going to have a rapid decline and which are more likely to have a slower, more gradual decline. We've seen great success with those preliminary models, and now we're working with our clinical collaborators to validate them. And importantly, we're doing that with data that's captured in the clinic, so it doesn't require us to do anything different at the point of care. Rather, it's a different way of looking at all the data we collect at the point of care so we can improve our ability to make these prognostic assessments of a patient. We talk about this in the Institute for Informatics as an effort to understand patient trajectory. Not all precision medicine is about finding a new treatment; some aspects of precision medicine are simply about better understanding the trajectory a patient is on, so we can make smarter choices throughout that entire trajectory.
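As a rough illustration of the kind of trajectory model Payne describes, the sketch below trains and cross-validates a classifier that separates rapid from gradual decline. The features, labels and file name are hypothetical; the actual models and variables used at the memory care clinic are not detailed in this interview.

```python
# Hypothetical sketch of a trajectory classifier: rapid vs. gradual cognitive decline.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

data = pd.read_csv("memory_clinic_visits.csv")  # hypothetical per-patient extract

# Example features: baseline cognitive score, age, comorbidity burden, APOE status.
X = data[["baseline_mmse", "age", "comorbidity_count", "apoe_e4_carrier"]]
y = data["rapid_decline"]  # 1 = rapid decline, 0 = gradual decline (hypothetical label)

model = GradientBoostingClassifier(random_state=0)

# Cross-validated AUC as a first check, well before any clinical validation.
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Mean cross-validated AUC: {scores.mean():.2f}")
```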
What are some of the challenges in terms of finding critical insights from EHRs and other data sources? What are some approaches you have taken to using EHR data?
One of the issues with using EHR data in precision medicine or healthcare research is that much of the data that's really important for understanding both health and disease is not captured in discrete or structured fields in the EHR. By that I mean the very specific and concrete measurements that we can use when we apply advanced computational methods like machine learning.
Probably somewhere between 70 and 80 percent of the really high-value data is captured in narrative text in the form of notes in the EHR, or in documents, images and other communications that are scanned and attached to the record as PDFs or other non-computable formats. In all of those scenarios, we have to use advanced computational methods to extract information from that narrative text and from those scanned documents and images in order to render the discrete features that ultimately inform, for example, the predictive models in Alzheimer's disease or cardiovascular disease or diabetes or cancer. That's very challenging in terms of training the computational algorithms that allow us to extract that information, validating that the information we're extracting is in fact accurate, and then integrating it with the other discrete data that we do get out of the EHR. That challenge is exacerbated further when we start talking about patient-generated data or social determinants, which are also very important.
With the unstructured content found in notes in the EHR, one of the approaches we use is natural language processing, or NLP, which is an AI approach to effectively interpret that narrative text and extract features. And it's not entirely different when we talk about imaging data. A human looking at an image can point to a spot or a lesion in a chest CT, for example. What we have to do is train a computer to recognize that same pattern and create discrete fields: Is there a lesion? Where is it anatomically located? How large is it, and how certain are we that it's there? That’s a very simple example, but it helps illustrate the point. A lot of what we do is teach the computer to read or teach the computer to interpret pictures so we can get those structured features back out and put them into our predictive models.
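The following toy example shows what "teaching the computer to read" can look like at its simplest: pulling a discrete lesion finding out of a snippet of narrative radiology text with a rule-based pattern. Production clinical NLP relies on trained models with negation and context handling; this sketch only illustrates the idea of converting free text into structured fields.

```python
# Toy example of turning narrative text into discrete, computable fields.
import re

note = "CT chest: 1.4 cm lesion in the right upper lobe. No pleural effusion."

finding = {"lesion_present": False, "lesion_size_cm": None, "location": None}

# Rule-based pattern standing in for a trained clinical NLP model.
match = re.search(r"(\d+(?:\.\d+)?)\s*cm lesion in the ([\w\s]+?)(?:\.|,)", note)
if match:
    finding["lesion_present"] = True
    finding["lesion_size_cm"] = float(match.group(1))
    finding["location"] = match.group(2).strip()

print(finding)
# {'lesion_present': True, 'lesion_size_cm': 1.4, 'location': 'right upper lobe'}
```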
Are there elements of machine learning and AI behind the strategies to address cancer and other conditions?
There is a lot of interest in biomedicine around the use of AI, and in particular machine learning and deep learning, to identify patterns in data and predict outcomes for patients. Broadly, there's great promise there, in that these types of algorithms allow us to identify high-order patterns in data that more traditional statistical modeling and testing approaches cannot.
We know that in both health and disease, the patterns that exist around the interaction of genes, gene products, clinical features, people's behaviors and their environments are very complex. There's unlikely to be a single indicator that tells us whether a patient is going to experience a disease state; rather, it's the confluence of all these different indicators that helps us predict outcomes, and that's where the power of machine learning and other AI methods becomes all the more important.
The real challenge is that while everyone is very excited about the promise of these AI-based methods, we still have to subject them to the same rigor of evaluation as any other discovery in biomedicine. In other words, we need to temper the enthusiasm about machine learning and AI with rigorous empirical research to understand whether these algorithms really produce the improvements in quality, safety, outcomes and value of care that we anticipate. This is where we've seen some trouble early on. For example, there have been reports where people have said they have built an amazing algorithm that is going to diagnose everybody who might be at risk of lung cancer based on their chest CTs. Later, it's found that the algorithm doesn't travel well: it's not replicable across multiple sites and populations, or it gives the wrong answer, and that's because it was treated more as an engineering exercise than as a biomedical research exercise.
We have a number of projects right now where we've built predictive algorithms to look at patient trajectory, such as whether individuals are going to develop sepsis or other critical issues during an inpatient hospital stay. Then we do a series of prospective studies where we actually run these algorithms for real patients in the hospital, but we're not delivering those alerts to providers to make clinical decisions. Rather, we're trying to see whether the outcomes that providers ascertain during standard-of-care activities for these patients match what our algorithms are predicting. If we see enough concordance between our algorithms and those experts, then we move to the next phase, where we evaluate prospectively by giving those alerts to the providers and seeing if that changes outcomes. It's not a whole lot different from how we would proceed through the multiple stages of clinical trials for a new diagnostic or therapeutic approach. What we're doing is running clinical trials of AI, working through increasingly expansive stages of use and evaluation.
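A simplified version of that "silent mode" comparison might look like the following sketch, in which algorithm predictions are logged but not shown to providers and are then compared against outcomes ascertained during standard-of-care chart review. The log file and column names are hypothetical.

```python
# Sketch of a "silent mode" check: compare logged algorithm predictions with
# outcomes ascertained during standard-of-care chart review. Names are hypothetical.
import pandas as pd
from sklearn.metrics import cohen_kappa_score, confusion_matrix

log = pd.read_csv("silent_mode_predictions.csv")  # hypothetical prospective log

predicted = log["model_predicted_sepsis"]         # 0/1 from the algorithm
observed = log["clinician_ascertained_sepsis"]    # 0/1 from chart review

print(confusion_matrix(observed, predicted))
print("Cohen's kappa:", cohen_kappa_score(observed, predicted))

# Only if agreement is strong enough would the alert advance to the next phase,
# in which it is actually delivered to providers and outcomes are compared.
```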
Are there also some issues around data bias?
If you don't use the right data to train your algorithm, and understand how those data map to the features of the patient populations you intend the algorithm to benefit, you can encode biases and potentially predispose those populations to negative outcomes. So that means you have to start from the very beginning to measure and understand biases in the data you're using to train the algorithm, and then, similarly, biases in the way you structure the study. Unfortunately, in the computational domain, the general thinking is often just ‘get me more data,’ without necessarily thinking about how we reduce biases and increase the diversity of that data. That's a change in culture around these AI approaches that we have to promote, and it's certainly something we focus really carefully on here with the work that we do in the Institute for Informatics.
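One basic form of the bias measurement Payne describes is checking subgroup representation and per-subgroup model performance before deployment. The sketch below is purely illustrative; the data frame and column names are hypothetical.

```python
# Hypothetical subgroup bias check run on held-out validation predictions.
import pandas as pd
from sklearn.metrics import roc_auc_score

results = pd.read_csv("validation_predictions.csv")  # one row per patient

# Representation: does the cohort reflect the population the model will serve?
print(results["race_ethnicity"].value_counts(normalize=True))

# Performance parity: does the model work comparably well for each subgroup?
for group, subset in results.groupby("race_ethnicity"):
    if subset["outcome"].nunique() < 2:
        continue  # AUC is undefined for a single-class subgroup
    auc = roc_auc_score(subset["outcome"], subset["predicted_risk"])
    print(f"{group}: n={len(subset)}, AUC={auc:.2f}")
```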
Is a lot of the work based on breakthroughs in genomics? Or in clinical phenotypes?
Yes, genomics is a powerful way of understanding the basis of human health and disease, but I'm often inclined to say to my students that when I sequence a patient and understand what information is encoded in their DNA, what I've really learned is the blueprint for what is going to be built. So if you'll forgive the metaphor: we all know that when we build a building, even one an architect has drawn up a blueprint for, the actual building we get is a function of the availability of materials, the quality of the labor, maybe even the weather while it was being built, the ground conditions and so on. There are a lot of factors that influence how we get from that blueprint to the final building. Well, the same is true for human beings. When I know the sequence of an individual's DNA, or potentially if I've looked at their RNA, I know what is meant to happen biologically. But then all these other factors — clinical phenotype, behavior, environment — come into play.

So breakthroughs in genomics are absolutely essential to delivering precision medicine, but we also have to measure all of these other data sources and combine those data if we're really going to understand the complex, multifactorial space that contributes to both health and disease. In many ways we have been more successful in the genomics domain than in our ability to phenotype patients clinically or to understand how their environment or behaviors influence health and disease. We really have to play catch-up to understand what's going on beyond the genome if we're going to achieve the promise of precision medicine.
Has your organization created models and used predictive analytics to better empower clinical operations, research and public health initiatives in response to the COVID-19 pandemic? Does it require new types of partnerships within your organization or within the community and public health organizations?
We have developed predictive models and deployed them for use both locally and at the population level to help us better respond to the pandemic. These models have spanned a spectrum, from identifying critically ill patients who might benefit from palliative care consults, to better understanding the trajectory of our patients in the ICU and anticipating who might experience respiratory failure and therefore need early intervention in the form of more advanced respiratory therapies. Most recently, we've been looking at how we can predict likely outcomes when a patient is placed on ECMO (extracorporeal membrane oxygenation). We're now seeing younger, sicker patients with COVID, and one of the questions is whether we should put them on ECMO earlier, because ECMO is often a therapy of last resort, which means patients are already very ill when they're placed on it, which reduces the therapeutic benefit to them. The question is, could we identify those patients earlier and perhaps intervene earlier to maximize outcomes and reduce the likelihood of complications?
In addition to that, we've done similar work at the population level, trying to anticipate hot spots of COVID infection based on prior activity in the region, such as testing or other patient-reported data. Across the board, COVID-19 has been both a driver for us to think about how we can use prediction to better organize our response to the pandemic and a catalyst for moving some of these algorithms into the clinical or public health environment more quickly than we normally would have. This has both benefits and challenges — the benefit being that we're getting real-world experience; the challenge being that we're not always getting the opportunity to evaluate them with the level of rigor we might have if it were not a crisis situation. This is not to say that we're deploying unsafe algorithms; we're constantly monitoring them. There's actually a whole discipline of informatics that we refer to as algorithmovigilance, which is basically constant monitoring of algorithm performance to ensure that it is doing what it is anticipated to do and that the results are accurate. But without a doubt, prediction has been a major part of how we've responded to the pandemic.
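A bare-bones version of algorithmovigilance could be as simple as recomputing a deployed model's performance on each month of new outcomes and flagging any drop below an agreed threshold, as in the sketch below. The threshold, file and column names are hypothetical, not a description of Washington University's monitoring tools.

```python
# Bare-bones algorithmovigilance: recompute monthly performance for a deployed
# model and flag degradation. Threshold, file and column names are hypothetical.
import pandas as pd
from sklearn.metrics import roc_auc_score

AUC_FLOOR = 0.75  # hypothetical minimum acceptable performance

log = pd.read_csv("deployed_model_log.csv", parse_dates=["prediction_date"])
log["month"] = log["prediction_date"].dt.to_period("M")

for month, batch in log.groupby("month"):
    if batch["outcome"].nunique() < 2:
        continue  # cannot compute AUC for a single-class month
    auc = roc_auc_score(batch["outcome"], batch["predicted_risk"])
    status = "OK" if auc >= AUC_FLOOR else "ALERT: review model"
    print(f"{month}: AUC={auc:.2f} ({status})")
```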
You have been involved with the National COVID Cohort Collaborative (N3C), which aims to bring together EHR data, harmonize it and make it broadly accessible. What were some lessons your organization learned from the rapid-fire response to COVID?
Well, I think there are three critical lessons we've learned from our rapid-fire response, especially at the national level. The first is that even with a disease like COVID, where we see high numbers of cases in almost every community, no single institution has enough data to do the types of AI-driven studies I just mentioned. We really have to combine data across organizational boundaries if we're going to have robust, comprehensive data to train the types of algorithms that allow us to better respond to COVID-19 or any other emerging infectious disease in the future. So data sharing is central to this type of response.
The second lesson is that nationally we really didn't have the infrastructure to do this. Despite the massive investments that have been made in electronic health records and the massive advancements in computation available at our fingertips, we simply don't have that infrastructure in healthcare. So we've had to build it in real time over the last 18 months in order to respond to COVID-19.
The third lesson is that there are a number of ways we can use advanced computational methods not only to analyze these data, but also to ensure the privacy and confidentiality of the patients from whom the data have been generated. We have learned how to use a number of important technologies, such as synthetic data generation algorithms as well as more advanced data de-identification tools, to ensure that we can do high-quality analysis and protect privacy and confidentiality. I think what it's shown us is that we can do both things, and so we need to maintain that same bar moving forward when we think about broader efforts to improve population health using large amounts of data.
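For illustration, one common de-identification step, dropping direct identifiers and applying a consistent per-patient date shift so that intervals are preserved while real dates are obscured, might look like the sketch below. This is only a toy example; the tools and safeguards used in practice (and for synthetic data generation) go well beyond it, and the file and column names here are hypothetical.

```python
# Toy de-identification step: drop direct identifiers and apply one random,
# consistent date shift per patient. Column and file names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

records = pd.read_csv("covid_encounters.csv", parse_dates=["encounter_date"])

# Remove direct identifiers.
deidentified = records.drop(columns=["name", "mrn", "street_address"])

# One random shift per patient, reused across all of that patient's encounters
# so that intervals between events are preserved.
shifts = {
    pid: pd.Timedelta(days=int(rng.integers(-365, 365)))
    for pid in deidentified["patient_id"].unique()
}
deidentified["encounter_date"] = deidentified.apply(
    lambda row: row["encounter_date"] + shifts[row["patient_id"]], axis=1
)
print(deidentified.head())
```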
What are you most excited about working on in the year ahead?
I think the big opportunity in the year ahead is what I've often described to people as a renaissance in clinical decision support. For a long time, the history of informatics and data science in healthcare has been defined by the history of clinical decision support: using large amounts of data to better understand what the likely outcome for the patient in front of us today would be, so we can make smarter decisions for them. And we do that every day.
We've always thought about clinical decision support as a function of the data we collect in the clinic or in the hospital, and what we've learned during the pandemic is that there are a lot of other really critical data sources: biomolecular data, patient-generated data, environmental data, social determinants of health, and all the measures that go along with those. We've not traditionally used those data to inform clinical decision support, but we've learned during the pandemic that when we put those pieces together, we get clinical decision support that's vastly better than what we've had in the past. The question is, do we take those lessons learned — and I believe we will — and build more comprehensive clinical decision support that meets the needs not only of providers but of actual patients, who are being engaged as an integral part of the decision-making process? In a lot of ways, precision medicine doesn't always have to be about sequencing patients. Sometimes precision medicine is just about making sure patients can get to the right provider at the right time and place, and we don't need a genome to do that. We don't need other complex data sources. We just need to understand a patient's needs, map them to available healthcare resources and really connect the dots.