Sparking Debate, Researchers Question Validity of Epic’s Sepsis Prediction Model
In a new paper published in JAMA Internal Medicine, researchers reveal that a sepsis prediction model developed by Epic Systems performs much worse than what the vendor tells its customers.
According to the researchers, the model correctly distinguishes between a patient who develops sepsis and one who does not just 63 percent of the time, while Epic’s information sheet touts that its sepsis warning system can make that distinction at least 76 percent of the time.
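For context, the 63 percent and 76 percent figures describe discrimination, typically reported as the area under the receiver operating characteristic curve (AUROC): the probability that a randomly chosen patient who develops sepsis receives a higher risk score than a randomly chosen patient who does not. The minimal sketch below, with scores and labels invented purely for illustration (not study data), shows how that pairwise probability is computed:

```python
import numpy as np

# Hypothetical risk scores; 1 = patient developed sepsis, 0 = did not.
# These values are invented for illustration and are not study data.
labels = np.array([1, 0, 0, 1, 0, 0, 1, 0])
scores = np.array([0.80, 0.60, 0.30, 0.50, 0.70, 0.20, 0.90, 0.40])

# AUROC = probability that a sepsis patient outranks a non-sepsis patient,
# counting ties as half a "win".
pos = scores[labels == 1]
neg = scores[labels == 0]
wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
auroc = wins / (len(pos) * len(neg))

print(f"AUROC = {auroc:.2f}")  # 0.87 here; 0.50 is a coin flip, 1.00 is perfect
```

On this scale, 0.50 is no better than chance and 1.00 is perfect ranking, so the 0.63 reported for the ESM sits considerably closer to a coin flip than the figure on Epic’s information sheet.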
For context, one in three patients who die in a hospital has sepsis, a severe inflammatory response to an infection marked by organ dysfunction, according to the Centers for Disease Control and Prevention (CDC). As such, clinical and informatics leaders have in recent years made it a top priority to predict which patients are at risk of developing the life-threatening condition.
Making matters even more challenging is the fact that sepsis is very hard to recognize in its early stages. As such, the researchers noted, many models have been developed to improve timely identification of sepsis, but their limited adoption has left an implementation gap in early warning systems for the condition. That gap has largely been filled by electronic health record (EHR) vendors, which have integrated early warning systems into the EHR, where they can be readily accessed by clinicians and linked to clinical interventions. Indeed, more than half of surveyed U.S. health systems report using electronic alerts, with nearly all using an alert system for sepsis, according to the researchers.
For this study, the researchers examined a cohort of nearly 28,000 patients undergoing more than 38,000 hospitalizations from 2018 to 2019 at Michigan Medicine, the academic health system of the University of Michigan.
The Epic Sepsis Model (ESM) is a proprietary sepsis prediction model, developed and validated by Epic Systems, and is based on data from more than 400,000 patient encounters across three health systems from 2013 to 2015. It is implemented at hundreds of U.S. hospitals, and the company’s EHR system is used by 56 percent of hospitals and health systems in the country. The ESM’s ability to identify patients with sepsis has not been adequately evaluated despite widespread use, the researchers noted.
According to the research, the ESM identified only 183 of 2,552 patients with sepsis (7 percent) who did not receive timely administration of antibiotics prior to or within three hours of sepsis onset, “highlighting the low sensitivity of the ESM in comparison with contemporary clinical practice.”
Meanwhile, the ESM also did not identify 1,709 patients with sepsis (67 percent), despite generating alerts for an ESM score of 6 or higher in 18 percent of all hospitalized patients, “thus creating a large burden of alert fatigue.” Sixty percent of these 1,709 patients still received timely antibiotics.
If the ESM were to generate an alert only once per patient when the score threshold first exceeded 6—a strategy to minimize alerts—then clinicians would still need to evaluate 15 patients to identify a single patient with eventual sepsis. If clinicians were willing to reevaluate patients each time the ESM score exceeded 6 to find patients developing sepsis in the next four hours, they would need to evaluate 109 patients to find a single patient with sepsis, the researchers noted.
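Taken together, these figures imply a sensitivity of roughly 33 percent (the model flagged only 843 of the 2,552 sepsis hospitalizations), and the “evaluate 15 patients” and “109 patients” figures are simply the reciprocal of the alerts’ positive predictive value. A minimal sketch of that back-of-the-envelope arithmetic, assuming only the counts reported above:

```python
# Back-of-the-envelope arithmetic using the counts quoted above; the alert
# volume in the final example is illustrative, not raw study data.
sepsis_cases = 2_552
missed_by_esm = 1_709

# Sensitivity: the share of sepsis hospitalizations the model flagged at all.
sensitivity = (sepsis_cases - missed_by_esm) / sepsis_cases
print(f"Sensitivity: {sensitivity:.0%}")  # ~33%

def number_needed_to_evaluate(true_positives: int, alerts: int) -> float:
    """Patients clinicians must assess per true sepsis case (1 / PPV)."""
    return alerts / true_positives

# If a threshold of 6 flags 15 patients for every one who goes on to develop
# sepsis (the "alert once per patient" strategy described above), then:
print(number_needed_to_evaluate(true_positives=1, alerts=15))  # 15.0
```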
An Epic spokesperson disputed the study’s conclusions in a Wired report, saying the company’s system has “helped clinicians save thousands of lives,” also noting that a separate study done at Prisma Health in South Carolina on a smaller sample of 11,500 patients found that Epic’s system was associated with a 4 percent reduction in mortality of sepsis patients. Epic also stated that the Michigan study set a low threshold for sepsis alerts, which would be expected to produce a higher number of false positives, according to the piece.
Lead researcher speaks out
One of the study’s leaders, Karandeep Singh, M.D., an assistant professor at the University of Michigan, has given several interviews this week following the public release of his team’s findings, which have caught mainstream media attention.
Singh explained to Michigan Medicine Laboratories that Epic’s model integrates data from all cases billed as sepsis, which is problematic because “people bill differently across services and hospitals, and it’s been well recognized that trying to figure out who has sepsis based on billing codes alone is probably not accurate.” What’s more, in the model’s development, the onset of sepsis was defined as the time the clinician intervened—for example, by ordering antibiotics or lab work. In other words, the model was built to predict sepsis cases that clinicians had already recognized, at the moment they recognized them. The problem with that is “we know that clinicians miss sepsis,” Singh said.
The aforementioned piece in Wired noted that Singh was asked to chair a committee at the university’s health system created to oversee uses of machine learning, and from there, became curious about health IT vendors that disclosed little about how their AI tools worked or performed. “His own system had a license to use Epic’s sepsis prediction model, which the company told customers was highly accurate. But there had been no independent validation of its performance,” according to the Wired article.
In a long Twitter thread this week, Singh wrote that folks should not be “dunking” on Epic, instead reiterating that Epic’s choice of billing codes to define sepsis is not the norm, which led to the discrepancy.
In one key tweet as part of the thread, he wrote:
Epic has done a lot of *good things* in this space by enabling model developers to run Python models in the EHR. But their position as a walled-garden app store in addition to a first-party model developer means that their modeling work should be subject to more scrutiny.
— Karandeep Singh (@kdpsinghlab) June 22, 2021
In the end, the study’s authors concluded: “This external validation cohort study suggests that the ESM has poor discrimination and calibration in predicting the onset of sepsis. The widespread adoption of the ESM despite its poor performance raises fundamental concerns about sepsis management on a national level.”
They added, “Owing to the ease of integration within the EHR and loose federal regulations, hundreds of U.S. hospitals have begun using these algorithms. Medical professional organizations constructing national guidelines should be cognizant of the broad use of these algorithms and make formal recommendations about their use.”