CHAI Envisions Federated Network of 30 Assurance Labs
The Coalition for Health AI (CHAI) expects a federated network of approximately 30 assurance labs to be stood up this year, said Brian S. Anderson, M.D., who was recently named CHAI’s first CEO.
The nonprofit CHAI includes representatives from over 1,500 member organizations, including hospital systems, tech companies, government agencies and advocacy groups. It aims to establish best practices for the testing, deployment, and evaluation of AI systems. The work will engage many stakeholders, promoting discovery, experimentation, and the sharing of AI innovations in healthcare, spanning methods that leverage traditional machine learning as well as more recent developments in generative AI.
Speaking to the NIH Collaboratory Grand Rounds on March 8, Anderson started out by noting that artificial intelligence has a trustworthiness problem.
“The vast majority of Americans do not trust AI. It varies between 60 and 70 percent, and the numbers only go up from there when you add health as part of that,” he said. “We want to add transparency about how these models are performing and where they're actually deployed. All of the models that we'll be testing will be going through a federated network of assurance labs. We’ll be publishing report cards in a registry for everyone to see — public laypeople as well as scientists, software developers and the like. It will enable people to understand how models are performing in small sub-cohort populations, underserved populations.”
Different social determinants of health (SDOH) characteristics will be part of the metrics used in the evaluation of these models. CHAI is also working with Peter Embi, M.D., M.S., at Vanderbilt University Medical Center on a maturity model. “We're excited to see that work come to fruition. We'll be publishing some of that work within the next six months or so,” Anderson said.
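CHAI has not yet published its report-card format or metric set, but the kind of sub-cohort, SDOH-stratified reporting Anderson describes can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the column names, the choice of AUROC as the metric, and the `housing_status` grouping variable are hypothetical, not CHAI specifications.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def subgroup_report(df: pd.DataFrame, group_col: str) -> pd.DataFrame:
    """Compute sample size and AUROC for each subgroup in group_col."""
    rows = []
    for group, sub in df.groupby(group_col):
        if sub["label"].nunique() < 2:
            continue  # AUROC is undefined when a subgroup has only one outcome class
        rows.append({
            group_col: group,
            "n": len(sub),                                       # subgroup size
            "auroc": roc_auc_score(sub["label"], sub["score"]),  # discrimination within the subgroup
        })
    return pd.DataFrame(rows)

# df would hold one row per patient: a binary outcome ("label"), the model's
# risk score ("score"), and a hypothetical SDOH characteristic ("housing_status").
# report_card = subgroup_report(df, group_col="housing_status")
```

Small, underserved sub-cohorts are exactly where a single aggregate metric can hide poor performance, which is why a report card of this shape breaks the numbers out per group.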
CHAI is working to define what good, responsible AI is and how to measure it, to agree on an assurance methodology for predictive and generative AI, and then to develop a set of sector-specific use cases — payers, clinical decision support, administrative or back-end management, and life sciences, explained Anderson, who was previously chief digital health physician at MITRE.
He explained the concept of assurance labs. “When you think about tools, like electrical devices in your house as an example, they might have an Underwriters Lab sticker that says that it meets a certain quality standard. Or the National Highway Safety Institute or the Insurance Institute for car manufacturers — they test these independently and then they have a methodology for evaluation. They issue report cards that are oftentimes published in Consumer Reports. We envision a Consumer Reports-like effort with a federated network of assurance labs across the U.S.”
“The hope is that in the shared discovery process that gets us to that testing and evaluation framework, we will have a rubric that these labs can adopt to say, ‘Okay, any model that wants to come through for training purposes, or for testing and validation purposes, will be evaluated according to this framework.’”
Anderson said that when we think about building AI to serve all of us across the U.S. — from inner-city Chicago to rural Kansas to the Navajo Nation in the Southwest and rural Mississippi families in the Southeast — we need to have diverse sets of data. “The mission for CHAI in this space will be to support health systems, large and small, well-resourced, not well-resourced, to be able to have the tooling in place to do both external and local validation. We need a mix of both. It is my strong belief that this is the path to building AI to serve all of us.”
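The external-plus-local validation Anderson advocates has a simple operational core: before deployment, a health system re-checks a model's reported performance on its own patient population. The sketch below is a hypothetical illustration, assuming a scikit-learn-style classifier and AUROC as the comparison metric; the function name, parameters, and 0.05 tolerance are placeholders, not CHAI guidance.

```python
from sklearn.metrics import roc_auc_score

def local_validation(model, X_local, y_local, external_auroc, tolerance=0.05):
    """Re-check an externally validated model on local patient data."""
    local_scores = model.predict_proba(X_local)[:, 1]  # predicted risk of the positive class
    local_auroc = roc_auc_score(y_local, local_scores)
    print(f"External AUROC: {external_auroc:.3f}, local AUROC: {local_auroc:.3f}")
    if external_auroc - local_auroc > tolerance:
        print("Performance degrades on the local population; investigate before deploying.")
    return local_auroc
```

A gap between the external and local numbers is the signal that a model validated elsewhere may not transfer to a given health system's patients, which is the argument for requiring both kinds of validation.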
All of this is going to require experimentation, Anderson noted. “We don't have a set of agreed-upon metrics on generative AI. We need to experiment. We need to identify what works, what doesn't work, what’s scalable,” he said. “Doing that level of experimentation in a digital sandbox is going to be really important. Having that kind of sandbox for these labs to be able to do their tool assessment for performance is also going to be critical.”
Asked for his definition of success within the next year, Anderson said CHAI wants to have an initial version of a core set of technical standards and best practices published, consented to, and adopted by industry, along with an evaluation framework that is published, adopted, and used by a diverse set of assurance labs that are stood up and thriving, helping to support both rapid development of models and independent testing and validation of models. “That's what success looks like to me.”