At HIMSS22, AI Leaders Look at the Challenges of Unstructured Data

On the first day of HIMSS22, being held at the Orange County Convention Center in Orlando, the Artificial Intelligence and Machine Learning Symposium featured several important panels and individual presentations.

The first panel of the day, “Unstructured and Conversational Data’s Role in Machine Learning and AI,” offered attendees a stark look at both the opportunities for artificial intelligence (AI) and machine learning (ML) and the many challenges, particularly around unstructured data. The landscape around unstructured data, and around its subset known as conversational data, remains complicated, with many pitfalls and broad issues still to be resolved. The discussion was introduced by Julius Bogdan, vice president, analytics, at HIMSS, the Chicago-based Healthcare Information & Management Systems Society, the conference’s sponsor, and moderated by Selena Davis, Ph.D., a digital health strategist and research scientist for HIMSS. Davis was joined by Brad Ryan, chief product officer at the Washington, D.C.-based NCQA (National Committee for Quality Assurance); Kerri Webster, R.N., M.S., vice president and chief analytics officer at Children’s Hospital Colorado, based in the Denver suburb of Aurora; Li Qin, Ph.D., principal data scientist at CVS Health; and Sally Embrey, vice president of public health and health technologies at the Boston-based DataRobot.

From left: panelists Embrey, Webster, Qin, Ryan, and Davis

After self-introductions, the discussion immediately turned to the question of wearable devices, as Davis asked the other panelists, “How will the unstructured information from wearables be used for machine learning and AI, to create a more comprehensive patient record?”

“I don’t know that there’s an easy answer” to that question, Children’s Colorado’s Webster replied. “Unstructured data is unstructured. There’s some tension regarding patient control. But I have my Fitbit on now,” she said, gesturing to the device on her wrist. “And I am a human and a whole creature. I think it’s important,” she said, referring to the accelerating efforts to somehow bring data from consumer-facing wearables into the healthcare data ecosystem, in order to analyze that data to improve patient outcomes. But, she immediately warned, “The devil’s in the details” when it comes to making the data usable. “And obviously, I advocate for children, who have no say in how their data’s used until they’re 18. So it’s a journey.”

“Wearable data is very important,” said CVS/Aetna’s Qin. “For our insured members, we mostly have their claims data, which is low-frequency; but having high-frequency, continuous, near-real-time wearable data, we can better understand the health status of the user, and we can try to match the users with healthy recommendations. And if we can connect it with the low-frequency data, like the claims data, as well as hospital and physician visit data, that helps a lot.”

“About 80 percent of health data is unstructured,” DataRobot’s Embrey noted. “It will be a hurdle when patients begin to insert their data, both wearable and other. We’ll need new tools and technology to uncover the trends and pull value from it.”

And, said the NCQA’s Ryan, “Wearables are one example of where we’re generating more data—mental and behavioral health data, telehealth, digital therapeutics—there’s a whole new framework for this, and I agree, it’s likely to be unstructured. We really goofed as an industry in building EHRs [electronic health records] in not creating the ability to capture unstructured data. We have an opportunity to leverage NLP [natural language processing] on the fly as we’re starting to collect this new data, to get to some new standards. So for me, one of the most interesting uses of ML and AI is in moving data from unstructured to structured.”

The role of emerging technology

Of course, new technologies are continuously emerging now, HIMSS’s Davis said. “Let’s talk about the technology piece.”

“There is new technology emerging all the time,” Embrey said. “The number of companies creating data warehouses and data lakes is growing all the time. Once you have all this data, it will become impossible for a single person to run analytics at scale without AI and machine learning. There are new things coming out all the time. Also, if we put up barriers where only traditional scientists can use the data, it will create barriers.”

“In terms of my personal data, [the data] coming from my bed tells me how well I’ve slept—it tells me what to do, asks me how much water I’ve had, etc. That technology is there; we need to build on that and expand it, to support the whole person,” Webster noted.

“There’s a saying out there that technology moves at the speed of trust,” Ryan said. “And there has to be an underlying trust in the data that AI and ML are training on. So we need to look at some of the federated data models, so we’re not asking people to share data where they don’t know where it’s going, so that they can trust the data.”

“I totally agree that trust and transparency are fundamental issues,” Qin said. “And, as a data scientist, I realize that data quality, speed, and integration from millions of data files—how to put them all together in a scalable and efficient way for downstream analytics and model-building, are important. How to build the tools to monitor and track the data quality and monitor and track our model performance, those are all important issues that can help us make the black box less black.”

“If you’re a data scientist, and you don’t have clinician partners who can help explain your models all along the way, there won’t be trust,” Embrey said. “I think we should develop models involving no black boxes.”

What’s more, the issues are far from abstract. “At NCQA, we validate data used for quality reporting, so there’s a high-stakes business case for making sure the data is both accurate and comparable across organizations,” Ryan said. “And what we do in terms of auditing the data trail, and where the world is going—it’s hard to know how to validate a model or algorithm if a model can change or evolve over time, and knowing that even well-intentioned models can develop bias over time. So we’re facing that issue right now.”

What role for retail data?

“Let’s talk about retail data,” Davis said. “Do we bring it in? Where do we begin with that piece of this?”

“I think we’re asking whether the patient is a person,” Ryan said, “and the answer is yes, patients are people. And we’re expanding the scope of healthcare at a regular clip, to look at the social determinants of health, social aspects, mental and behavioral health. So I believe the lines are already blurred in terms of what is retail data and what is not. How we somehow address HIPAA-type concerns, is something we have to address over the next few years.”

“Retail data is health data, and you can learn things about a person based on their retail purchases,” Webster noted. “But if you used your power to deny people care or raise their insurance rates, or sell them something—well, we have to be careful.”

“I think that retail data will provide for more-representative data sets,” Embrey offered. “And there’s a real potential for preventive care. And being able to [go] up to the pharmacy in the retail setting and ask for help, we could help people.”

What’s more, Qin said, “Retail data is another space where we can learn about people. A person can go to a retail store, and use the pharmacy and use the MinuteClinic—and from what they purchase, and from their prescription medicines, and how they use other kinds of services like insurance, and how they exercise as trackers track their health and fitness behaviors, we can learn a lot more about everybody, and can use all that data to improve quality of life, make consumers healthier, engage in early prevention and disease management. And for example, a side effect from a vaccine or from underlying diseases, and what they purchase in the retail store, ideally, we want to extract early signs and recommend they go to the MinuteClinic or for a checkup, so we can help them early on in the journey, to prevent more serious diseases from happening and save costs for members. And of course, patients need to be given the right to secure the privacy of their data. There’s still a long way to go, but we will eventually open up a much better space to improve quality of life.”

Access to the data

When it comes to access to data, “There are a number of opportunities, along with the cautions,” Davis said. “So how would access to this data, and ML and AI, be used for optimizing outcomes, sustainability of our system, via the Quintuple Aim? How can the data be leveraged?”

Webster said, “I think it will require leveraging the data with the patient and the provider. What about making the patient their own data scientist? That would help us with trust. If we can make data scientists of all of us, we can advance.”

“Should we be providing a dashboard to the patient?” Davis asked. “Yes, give the patient a dashboard,” Webster responded. “And give them a hint, hey, why don’t you try this? Or encourage them to get vaccinated,” for example, using patient-facing dashboards, she said.

Embrey offered another example: “We collaborate with the West Virginia Department of Health. We’re collaborating with them on county-level analyses. We can look to see where people most seek services, and can increase the convenience factor for them.”

“I came to healthcare with a background in engineering, math, and computer science,” Ryan said. “And my girlfriend is a surgeon, and even today, 95 percent of the studies are based on randomized control studies. But we’ve woefully underused capabilities in the learning health system to see what’s going on out there. I think it’s actually a cultural change in healthcare that’s the biggest blocker to this. This idea of first, do no harm in healthcare is actually a powerful statement. But there are opportunities to match our risk tolerance to the situation that we don’t always use. We’ve got this gold standard of the randomized trial and a risk aversion to looking at data for trends and resistance to changing clinical practice. I think that some AI-based data analysis needs to be embraced as a source in healthcare.”

In fact, said Qin, “We need to design our thinking to put the consumer up front. For example, I work mostly on [the] consumer-as-member side of machine learning. We want to help not only the physicians to get a dashboard, but also for the patients to get a dashboard. But we need to also do proactive and preventive things. For example, we should be able to recommend a high-performing PCP to a member, and help the member and provider establish a value-based care relationship. Doing the interaction early on will in the long run help to advance care management.”

An audience member asked, “How do we get providers to accept the insertion of patient-derived data into clinician practice?”

“That’s the million-dollar question,” Webster said. “I think the issue is getting the provider to trust the data.”

“More data is probably not the answer,” Ryan said. “The question is, how do you make the data relevant to the decision-making process with the provider, and hopefully both the provider and patient, at the right time? Implementation science will be important—surfacing relevant things at the right time is under-appreciated.”