Interpretability and the promise of healthcare AI

January 23, 2020
AI interpretability is especially critical in healthcare.

Emergency departments (EDs) are an expensive and time-consuming option for getting care, especially when you don't have an emergency. What if we could suggest other treatment options to potential patients with non-urgent conditions before they choose the ED? They could probably get less expensive care that is better tailored to their needs, with a shorter wait.

This might be a good idea to improve patient care, but first we have to identify who is at high risk of using the ED for non-urgent conditions. Let's say we have a list of potential patients. We provide that list to a predictive model -- perhaps one created by Geneia data scientists -- and it tells us which potential patients in the list are at high risk. Do we believe it? Why should we trust a computer?

Models created with artificial intelligence (AI) have grown amazingly powerful in recent years, but they're still far from perfect. In deciding whether we should trust our ED risk model, it would be really helpful if the model could tell us why it made the predictions it did.

Model Interpretability

In other words, we need the model to be interpretable. Interpretability is an important issue throughout the world of AI, but it's especially critical in healthcare where our health and even our lives may be at stake.

Interpretability is also a challenging issue, because powerful models have become so complex that it's hard for us to understand and articulate their inner workings. Ideally, we could have a conversation with our model, just like we would with another person, where it would explain its reasoning and answer our follow-up questions. It would use easy-to-understand language or pictures. It would focus on the most important reasons for its judgment -- not merely give us 1,000 numbers to sort through. But if we asked it for details, it would provide those, too. It would be flexible and socially aware: a patient, a nurse, a primary care physician and a surgeon might expect different kinds of explanations from it. And it would be consistent: it would provide similar explanations for similar patients with similar predictions.

Obviously, our current technology is nowhere close to this ideal vision. But many smart data scientists are working hard to develop new methods for interpreting these complex models. A recent innovation is the adaptation of Shapley values to AI models.

Shapley Values

What do Shapley values do? Well, imagine you're playing a complicated game with some friends. You all have to cooperate to win, and when you do, you win some prize money. How do you split the prize among the players? Splitting it equally isn't necessarily fair, because some friends might have contributed more to the effort than others. Shapley values provide a mathematical way of fairly calculating each player's contribution.
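To make the prize-splitting idea concrete, here is a minimal sketch in Python. The three friends and the prize amounts are made up purely for illustration. The calculation uses the standard definition of a Shapley value: a player's average marginal contribution over every order in which the players could join the team.

```python
from itertools import permutations


def shapley_values(players, payoff):
    """Average each player's marginal contribution over all join orders."""
    totals = {player: 0.0 for player in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for player in order:
            with_player = coalition | {player}
            # How much more the team wins once this player joins.
            totals[player] += payoff(with_player) - payoff(coalition)
            coalition = with_player
    return {player: totals[player] / len(orders) for player in players}


# Hypothetical prize money (in dollars) won by each possible team.
PRIZES = {
    frozenset(): 0,
    frozenset({"Ana"}): 10,
    frozenset({"Ben"}): 20,
    frozenset({"Cal"}): 0,
    frozenset({"Ana", "Ben"}): 60,
    frozenset({"Ana", "Cal"}): 30,
    frozenset({"Ben", "Cal"}): 40,
    frozenset({"Ana", "Ben", "Cal"}): 90,
}

shares = shapley_values(["Ana", "Ben", "Cal"], lambda team: PRIZES[team])

for player, share in shares.items():
    print(f"{player}: {share:.2f}")
```

In this made-up game the three shares add up to the full 90-dollar prize, and Ben, who adds the most to whichever team he joins, gets the largest share. Trying every order works for three players but grows very quickly with more players, so practical tools for AI models rely on approximations.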

Now, instead of friends cooperating in a game, imagine features of a model working together to make a prediction. What is a feature? For our ED risk model, a feature might be a potential patient's age, how far away she lives from the hospital, how many chronic medical conditions she has or how many times she has already visited the ED. So, a Shapley value might tell us that our potential patient's high number of prior ED visits increases her predicted risk of going to the ED by a lot, or that her young age decreases it moderately.

Let's look at an example.

Suppose that the average probability of a future ED visit among all the potential patients on our list is 10 percent. Let's also suppose that we're part of a clinical team that is using the model's results to reduce unnecessary ED visits.

The model predicts that "Tiffany" has an unusually high risk; hers is 70 percent. We're concerned and wonder why Tiffany has such a high risk and whether the model's prediction is reasonable. Here's a graph that I created to show how Shapley values might explain why Tiffany has such a high risk. To be clear, Tiffany is not a real person and this isn't real data; I just made them up to demonstrate how Shapley values work.

[Figure: Probability of emergency department visits]

From the graph, we can see Tiffany has three prior ED visits, and according to our model, this increases her risk from the overall average of 10 percent to 60 percent. Her four chronic medical conditions push her risk up even further, to 95 percent. However, Tiffany is relatively young at only 29 years old, which pulls her risk down to 75 percent, and she lives five miles from a hospital, which pulls her risk down a little more to 70 percent. This all looks quite reasonable, so we figure the model is probably working well in Tiffany's case.
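The arithmetic behind that walk-through can be sketched in a few lines of Python. The numbers are the made-up ones from Tiffany's example, written as percentage points, and the variable and function names are just for illustration. The sketch shows the additive property of Shapley values: the contributions start from the average prediction and add up exactly to the individual prediction.

```python
BASELINE = 10  # average predicted risk across the list, in percent

# (feature, contribution in percentage points), read off the made-up example.
CONTRIBUTIONS = [
    ("3 prior ED visits", +50),
    ("4 chronic medical conditions", +35),
    ("age 29", -20),
    ("lives 5 miles from a hospital", -5),
]


def running_risk(baseline, contributions):
    """Return the predicted risk after each contribution is added in turn."""
    steps = []
    risk = baseline
    for feature, delta in contributions:
        risk += delta
        steps.append((feature, risk))
    return steps


steps = running_risk(BASELINE, CONTRIBUTIONS)
final_risk = steps[-1][1]

print(f"Average risk: {BASELINE} percent")
for feature, risk in steps:
    print(f"After {feature}: {risk} percent")
```

The order of the steps only affects how the graph is drawn. Each feature's contribution is the same whichever order we list them in, and the total always lands on Tiffany's predicted risk of 70 percent.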

Using our clinical judgment, we focus on how Tiffany's four chronic medical conditions are increasing her risk, according to the model. Perhaps one of these conditions, or maybe a combination of them, is responsible for Tiffany's prior ED visits and for her high risk of a future ED visit. Perhaps if we look closer at Tiffany's medical history, we can figure out exactly how one or several of these conditions are increasing Tiffany's risk.

This reasoning is an insightful use of our clinical judgment and demonstrates how an interpretable model can aid our decision-making. However, we have to be cautious about over-interpreting our model, because models always have limitations.

Correlation and Causation

We've probably all heard that "correlation does not imply causation." How is this old, wise adage relevant to our model? We learn about the world by observing it and by doing things to it. If we observe two things consistently happening together, we can conclude these two things are correlated. If we do something repeatedly and we always get the same result, then we can conclude that our action is a cause of that result.

All of our complex AI models are based only on observations*, so they can tell us only about correlations. So, just because a correlation -- like Tiffany's four chronic medical conditions correlating with risk for future ED visits -- shows up in our model, we can't necessarily conclude we've detected a cause. Sometimes correlations are also causes, and sometimes they aren't. We have to use our clinical judgment to decide whether that correlation is likely to be a cause or not.
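A small simulation can show how a correlation might appear without any cause behind it. This is a sketch with entirely made-up numbers, not health data: a hidden factor (a hypothetical stand-in for something the model never sees) drives both a "conditions" score and an "ED visits" score, while the conditions score has no effect of its own. The names and the setup are assumptions for illustration only.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Observing: the hidden factor drives both scores.
hidden = [rng.gauss(0, 1) for _ in range(n)]
conditions = [h + rng.gauss(0, 1) for h in hidden]
ed_visits = [h + rng.gauss(0, 1) for h in hidden]
observed = pearson(conditions, ed_visits)

# Doing: we set the conditions score ourselves, ignoring the hidden factor.
assigned = [rng.gauss(0, 1) for _ in range(n)]
ed_visits_after = [h + rng.gauss(0, 1) for h in hidden]
intervened = pearson(assigned, ed_visits_after)

print(f"Correlation when observing: {observed:.2f}")
print(f"Correlation when intervening: {intervened:.2f}")
```

When we only observe, the two scores are clearly correlated. When we set the conditions score ourselves, the correlation all but disappears, because in this made-up world the conditions score never caused the ED visits in the first place. A model trained only on the observed data could not tell these two worlds apart, which is why clinical judgment still matters.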

This example of model interpretability beautifully encapsulates the promise of welcoming AI into healthcare. Our emerging new technological age lets us combine computers' ability to sift through mountains of data with human wisdom about how to use that information. We data scientists here at Geneia are excited to help usher in this new age of improving patients' health and lives.

*Okay, yes, we can have models based on causation, but because health and medicine are so complex, we usually don't have good causal models to use in practice.

For those interested in more technical information about model interpretability and Shapley values, I heartily recommend Tim Miller's paper "Explanation in Artificial Intelligence: Insights from the Social Sciences" and Christoph Molnar's book "Interpretable Machine Learning: A Guide for Making Black Box Models Explainable."