Stanford Expert: Congress Should Require Health System AI Review Process

Feb. 9, 2024
Michelle M. Mello, J.D., Ph.D., also recommended that Congress fund a network of AI assurance labs

Testifying before a U.S. Senate committee on Feb. 8, a Stanford University health policy professor recommended that Congress require that healthcare organizations “have robust processes for determining whether planned uses of AI tools meet certain standards, including undergoing ethical review.”

Michelle M. Mello, J.D., Ph.D., also recommended that Congress fund a network of AI assurance labs “to develop consensus-based standards and ensure that lower-resourced healthcare organizations have access to necessary expertise and infrastructure to evaluate AI tools.”

Mello, a professor of health policy in the Department of Health Policy at the Stanford University School of Medicine and a professor of law at Stanford Law School, is also affiliate faculty at the Stanford Institute for Human-Centered Artificial Intelligence. She is part of a group of ethicists, data scientists, and physicians at Stanford University involved in governing how healthcare AI tools are used in patient care.

In her written testimony before the U.S. Senate Committee on Finance, Mello noted that while hospitals are starting to recognize the need to vet AI tools before use, most healthcare organizations don’t yet have robust review processes, and she wrote that there is much Congress could do to help.

She added that to be effective, governance can’t focus only on the algorithm; it must also encompass how the algorithm is integrated into the clinical workflow. “A key area of inquiry is the expectations placed on physicians and nurses to evaluate whether AI output is accurate for a given patient, given the information readily at hand and the time they will realistically have. For example, large-language models like ChatGPT are employed to compose summaries of clinic visits and doctors’ and nurses’ notes, and to draft replies to patients’ emails. Developers trust that doctors and nurses will carefully edit those drafts before they’re submitted—but will they? Research on human-computer interactions shows that humans are prone to automation bias: we tend to over-rely on computerized decision support tools and fail to catch errors and intervene where we should.”

Therefore, regulation and governance should address not only the algorithm, but also how the adopting organization will use and monitor it, she stressed.

Mello said she believes that the federal government should establish standards for organizational readiness and responsibility to use healthcare AI tools, as well as for the tools themselves. But with how rapidly the technology is changing, “regulation needs to be adaptable or else it will risk irrelevance—or worse, chilling innovation without producing any countervailing benefits. The wisest course now is for the federal government to foster a consensus-building process that brings experts together to create national consensus standards and processes for evaluating proposed uses of AI tools.”

Mello suggested that through their operation of, and certification processes for, Medicare, Medicaid, the Veterans Affairs Health System, and other health programs, Congress and federal agencies can require that participating hospitals and clinics have a process for vetting any AI tool that affects patient care before deployment and a plan for monitoring it afterwards.

As an analogue, she said, the Centers for Medicare & Medicaid Services uses The Joint Commission, an independent nonprofit organization, to inspect healthcare facilities in order to certify their compliance with the Medicare Conditions of Participation. “The Joint Commission recently developed a voluntary certification standard for the Responsible Use of Health Data which focuses on how patient data will be used to develop algorithms and pursue other projects. A similar certification could be developed for facilities’ use of AI tools.”

The initiative underway to create a network of “AI assurance labs,” along with consensus-building collaboratives like the 1,400-member Coalition for Health AI, can provide pivotal support for these facilities, Mello said. Such initiatives can develop consensus standards, provide technical resources, and perform certain evaluations of AI models, such as bias assessments, for organizations that don’t have the resources to do so themselves. Adequate funding will be crucial to their success, she added.

Mello described the review process at Stanford: “For each AI tool proposed for deployment in Stanford hospitals, data scientists evaluate the model for bias and clinical utility. Ethicists interview patients, clinical care providers, and AI tool developers to learn what matters to them and what they’re worried about. We find that with just a small investment of effort, we can spot potential risks, mismatched expectations, and questionable assumptions that we and the AI designers hadn’t thought about. In some cases, our recommendations may halt deployment; in others, they strengthen planning for deployment. We designed this process to be scalable and exportable to other organizations.”

Mello reminded the senators not to forget health insurers. Just as with healthcare organizations, real patient harm can result when insurers use algorithms to make coverage decisions. “For instance, members of Congress have expressed concern about Medicare Advantage plans’ use of an algorithm marketed by NaviHealth in prior-authorization decisions for post-hospital care for older adults. In theory, human reviewers were making the final calls while merely factoring in the algorithm output; in reality, they had little discretion to overrule the algorithm. This is another illustration of why humans’ responses to model output—their incentives and constraints—merit oversight,” she said. 
