Drexel University Moves Forward on Leveraging NLP to Improve Clinical and Research Processes

Increasingly, the leaders of patient care organizations are using natural language processing (NLP) technologies to leverage unstructured data, in order to improve patient outcomes and reduce costs. Healthcare IT and clinician leaders are still relatively early in the long journey towards full and robust success in this area; but they are moving forward in healthcare organizations nationwide.

One area in which learnings are accelerating is medical research, both basic and applied. Numerous medical colleges are moving forward in this area, with strong results. Drexel University in Philadelphia is among that group. There, Walter Niemczura, director of application development, has been helping to lead an initiative that supports research and patient care efforts at the Drexel University College of Medicine, one of the nation’s oldest medical colleges (it was founded in 1848), and across the university. Niemczura and his colleagues have been partnering with the Cambridge, England-based Linguamatics to engage in text mining that can support improved research and patient care delivery.

Recently, Niemczura spoke with Healthcare Informatics Editor-in-Chief Mark Hagland, regarding his team’s current efforts and activities in that area. Below are excerpts from that interview.

Is your initiative moving forward primarily on the clinical side or the research side, at your organization?

We’re making advances that are being utilized across the organization. The College of Medicine used to be a wholly owned subsidiary of Drexel University. About four years ago, we merged with the university, and two years ago we lost our CIO to the College of Medicine. And now the IT group reports to the CIO of the whole university. I had started here 12 years ago, in the College of Medicine.

And some of the applications of this technology are clinical and some are non-clinical, correct?

Yes, that’s correct. Our data repository is used for clinical and non-clinical research. Clinical: College of Medicine, College of Nursing, School of Public Health. And we’re working with the School of Biomedical Engineering. And the College of Arts and Sciences, mostly with the Psychology Department. But we’re using Linguamatics only on the clinical side, with our ambulatory care practices.

Overall, what are you doing?

If you look at our EHR [electronic health record], there are discrete fields that might have diagnosis codes, procedure codes and the like. Let’s break away from that. Take our HIV Clinic: they might put down HIV as a diagnosis and mention hepatitis B in the notes without putting it down as a co-diagnosis; it’s up to the provider how they document. So here’s a good example: HIV and hepatitis C are frequently comorbid. Our organization asked a group of residents to go in and look at 5,700 patient charts for patients with HIV and hepatitis C. Anybody in IT could say, we have 677 patients with both. But doctors know there’s more to the story. It turns out another 443 had HIV in the code and hep C mentioned in the notes. Another 14 had hep C in the code, and HIV in the notes.

So using Linguamatics, it’s not 5,700 charts that you need to look at, but roughly 1,150. We started with the 677 patients who had both codes, and then found roughly 460 more who had the comorbidity mentioned partly in the notes. Before Linguamatics, residents had to look at all 5,700 charts in cases like this one.
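
The chart counts in this example come down to set arithmetic over two sources of evidence per patient: coded diagnoses and mentions found in the notes. Here is a minimal sketch in Python, assuming hypothetical sets of patient IDs; the function name, variable names, and toy IDs are illustrative only and are not Drexel's data or any Linguamatics API.

```python
def comorbidity_groups(hiv_coded, hepc_coded, hiv_in_notes, hepc_in_notes):
    """Split patients into the three groups described in the interview.

    Each argument is a set of patient IDs (hypothetical inputs).
    """
    # Both diagnoses are present in the discrete code fields.
    both_coded = hiv_coded & hepc_coded
    # HIV is coded; hep C is not coded but appears in the notes.
    hiv_code_hepc_note = (hiv_coded - hepc_coded) & hepc_in_notes
    # Hep C is coded; HIV is not coded but appears in the notes.
    hepc_code_hiv_note = (hepc_coded - hiv_coded) & hiv_in_notes
    return both_coded, hiv_code_hepc_note, hepc_code_hiv_note


# Toy IDs for illustration only.
hiv_coded = {1, 2, 3, 4}
hepc_coded = {3, 4, 5}
hiv_in_notes = {1, 5}
hepc_in_notes = {2, 3}

both, hiv_then_note, hepc_then_note = comorbidity_groups(
    hiv_coded, hepc_coded, hiv_in_notes, hepc_in_notes
)
# Union of the three groups: the charts worth a manual look.
charts_to_review = both | hiv_then_note | hepc_then_note

# Counts quoted in the interview, out of 5,700 charts.
quoted_total = 677 + 443 + 14
```

With the figures quoted in the interview, 677 + 443 + 14 comes to 1,134 charts, in line with the roughly 1,150 cited and about a fifth of the original 5,700.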

So this was a huge time-saver?

Yes, it absolutely was a huge time-saver. When you’re looking at hundreds of thousands or millions of patient records, the value might be not the ones you have to look at, but the ones you don’t have to look at. And we’re looking at operationalizing this into day-to-day operations. While we’re billing, we can pull files from that day and say, here’s a common co-morbidity—HIV and hep C, with hep C mentioned in those notes—and is there a missed opportunity to get the discrete fields correct?

Essentially, then, you’re making things far more accurate in a far more efficient way?

Yes, this involves looking at patient trials on the research side, while on the clinical side, we can have better quality of care, and more updated billing, based on more accurate data management.

When did this initiative begin?

Well, we’ve been working with Linguamatics for six or seven years. Initially, our work was around discrete fields. The other type of note we look at has to do with text. We had our rheumatology department, and they wanted to find out which patients had had particular tests done; they’re looking for terms in notes… When a radiologist does a report on your x-ray, it’s not like a test for diabetes, where a blood sugar number comes out; x-rays are read and interpreted. The radiologists gave us key words to search for: sclerosis, erosions, bone edema. There are about 30 words. They’re looking for patients who have had particular x-rays or MRIs done, so instead of looking at everyone who had these x-rays, we narrowed it to the roughly 400 whose reports had these terms. We reduced the number to review from all those who had undergone particular tests. The rheumatology department was looking to recruit patients who had x-rays done and had these kinds of findings.

So the rheumatology people needed to identify certain types of patients, and you needed to help them do that?

Yes, that’s correct. Now, you might say, we could do a word search in Microsoft Word; but the word “erosion” by itself might not help. You have to structure your query to be more accurate, and exclude certain appearances of words. And Linguamatics is very good at that. I use their ontology, and it helps us understand how words appear within the structure of the text. I used to be in telecommunications. When all the voice-over IP came along, there was confusion. You hear “buy this stock,” when the message was, “don’t buy this stock.”
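
The difference between a plain word search and a structured query can be illustrated with a small sketch. The code below is not Linguamatics' query language or ontology; it is a simplified, assumed stand-in that matches a few of the finding terms named in the interview and skips any mention preceded by a negation cue in the same sentence. The term list, the negation cues, and the function name are all hypothetical.

```python
import re

# A few of the roughly 30 terms mentioned in the interview (illustrative subset).
FINDING_TERMS = ["sclerosis", "erosion", "erosions", "bone edema"]
# Hypothetical negation cues; a real system would use a much richer rule set.
NEGATION_CUES = ["no", "not", "without", "negative for", "no evidence of"]

# Longest terms first so "erosions" is preferred over "erosion".
_TERM_RE = re.compile(
    r"\b("
    + "|".join(re.escape(t) for t in sorted(FINDING_TERMS, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)
_NEG_RE = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in NEGATION_CUES) + r")\b",
    re.IGNORECASE,
)


def affirmed_findings(report):
    """Return the finding terms that appear without a preceding negation cue."""
    found = set()
    # Crude sentence split on periods, semicolons, and newlines.
    for sentence in re.split(r"[.;\n]+", report):
        for match in _TERM_RE.finditer(sentence):
            preceding = sentence[: match.start()]
            if _NEG_RE.search(preceding):
                continue  # e.g. "No erosions are seen"
            found.add(match.group(1).lower())
    return found


report = "No erosions are seen. Mild subchondral sclerosis is present."
plain_hit = "erosion" in report.lower()   # a plain word search still matches
structured = affirmed_findings(report)    # only the affirmed finding remains
```

Here the plain search flags the report for "erosion" even though the radiologist ruled it out, while the structured version keeps only "sclerosis". Treating any earlier negation cue in the sentence as applying to the term is deliberately crude, which is part of why such queries tend to need several passes of refinement.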

So this makes identifying certain elements in text far more efficient, then, correct?

Yes—the big buzzword is unstructured data.

Have there been any particular challenges in doing this work?

One is that this involves an iterative process. In IT, we’re used to writing queries and getting them right the first time. This is a different mindset. You start out with one query and want to get results back. You find ways to mature your query, and at each pass you get better and better at it.

What have your biggest learnings been in all this, so far?

There’s so much promise—there’s a lot of data in the notes. And I use it now for all my preparatory research. And Drexel is part of a consortium here called Partnership In Educational Research—PIER.

What would you say to CIOs, CMIOs, CTOs, and other healthcare IT leaders, about this work?

My recommendation would be to dedicate resources to this effort. We use this not only for queries, but to interface with other systems. And we’re writing applications around this. You can get a data set out and start putting it into your work process. It shouldn’t be considered an ad hoc effort by some of your current people.