A Dash of LIME
Nikitha Srikanth
Posted on September 14, 2020
If you were put inside a completely black box, what would it feel like? Well, for one thing, it would be completely dark. You would have no way of looking outside the box, and most importantly, no one would be able to see you from outside.
Now assume there is one such random box. Let's say this box can predict the future. So based on that, it gives you suggestions like whether you should buy that car you've always wanted, or whether learning Spanish will help you in the future. You have no idea what information the box uses to come up with the prediction. So should you trust it? If you believe in tarot cards and magic 8-balls, this probably isn't that different. But you would be better off knowing why the box came up with these suggestions in the first place, since you're trusting it with making decisions about your life.
Magic 8-ball: Always ask again to be sure
Let's look at another scenario where 100 people each use a random black box to navigate in the dark. 99 of these people safely reach their destination with the black box shouting out the directions they should move in. But one person falls off the edge of a cliff. Is this a trustworthy box? It did do its job perfectly 99% of the time. But ideally, you would like to know why the box made a mistake and fix it. But you can't see what is inside a black box, remember? Well, the next obvious question is: why would we use a shady black box then?
In scientific terms, a black box is anything that takes in some input and gives out a result, with no explanation whatsoever as to how it arrived at it.
This is also how most machine learning models work, especially neural networks, and we are using these shady black boxes everywhere. You give these models lots of data, from which they learn some underlying patterns and give you some output. With deep neural networks, there are way too many neurons and layers to fully understand what is happening inside. Maybe you get extremely high accuracy on a classification task. But you never know what the model has learnt in order to perform this classification. Good accuracy can come from learning the wrong patterns, or from overfitting.
A great example is a model that identified whether an image had a dog or a wolf. It would get it right most of the time, giving a false sense that it had learnt the right things. But with tools that help us understand what a black box model is doing, it was seen that the model was classifying based on the background and not the animal. The model noticed that most images of wolves had snow in the background. So what it was actually classifying was “snow” and “no snow”.
Left: Dogs, Right: Wolves (There is snow in the background for wolves and no snow for dogs, which is what the model caught on)
It is better to have a model whose mistakes come from genuine confusion. This could look like a model wrongly classifying a human in a dog costume as a dog (this is understandable, because some humans can pull off great impersonations).
Left: Person dressed as a dog, Right: Dog (Can be confusing sometimes)
The point here is that the model is classifying wrongly, but for the right reasons. It is better to have a model you can trust even if it gets things wrong sometimes, because you know what it is learning, and you can make changes accordingly to learn better.
It is important now more than ever for humans to be able to interpret these models. We need to move towards making these boxes transparent because
- AI is being used to make important decisions: diagnosing deadly diseases, picking who gets hired, who gets a loan, who gets parole, and where you should invest. We can't afford a model that learns the wrong things, and we must be able to trust it.
- Wrong decisions could perpetuate and reinforce stereotypes because the model may learn patterns that exist due to biases in society.
So blindly trusting a model is a bad idea. Now, what can we do if we can't look in the box?
In fact, you can do a bunch of things. I want to talk about one thing in particular (That was a lot of unnecessary examples to get to the point).
This is where we add a dash of LIME to our lives. In the paper “Why Should I Trust You? Explaining the Predictions of Any Classifier”, the authors propose a tool called LIME - Local Interpretable Model-Agnostic Explanations. This helps us explain the actions of a model to a certain extent by building a simple model on top of the original model.
LIME exploits the fact that it is easier to explain a simple surrogate model than to explain the actual complex model.
With a black box model at hand, to explain the model's prediction for a particular observation, we make some changes to this observation. In an image, this would look like greying out some parts.
Left: Original Image, Right: Image with different regions that can be greyed out. Sources: Marco Tulio Ribeiro
So a new set of samples similar to the observation is generated, with some parts removed in each. The black box model's predictions are then obtained for this generated set of samples. Using these predictions, the different portions of the image are weighted. So if an image of a tree frog is classified correctly when the portion of its head is present, but wrongly when most of the head is greyed out, it means that the head portion must be weighted more.
How LIME generates explanations for the model's predictions with greying out different parts of the image and seeing how prediction varies. Sources: Marco Tulio Ribeiro
A simple linear model is learned from these weighted samples, and it approximates the black box well in the vicinity of the observation under consideration. The parts of the observation that are weighted more are displayed, and with human intuition, we can see if this makes sense.
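To make that loop concrete, here is a minimal from-scratch sketch in Python. It assumes you already have a superpixel label map for the image (`segments`) and the black box's probability function (`black_box_predict`); both names are placeholders for illustration, and the real LIME library is more careful about how it measures proximity and selects features.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_image(image, segments, black_box_predict, target_class,
                  num_samples=1000, kernel_width=0.25):
    """Rough LIME-style explanation: which superpixels matter for target_class?"""
    n_segments = segments.max() + 1

    # 1. Perturb: randomly keep (1) or grey out (0) each superpixel.
    masks = np.random.randint(0, 2, size=(num_samples, n_segments))
    masks[0, :] = 1  # keep the unmodified image as the first sample

    # 2. Ask the black box for a prediction on every perturbed image.
    preds = []
    for mask in masks:
        perturbed = image.copy()
        removed = ~np.isin(segments, np.where(mask == 1)[0])
        perturbed[removed] = 0  # "grey out" the dropped superpixels
        preds.append(black_box_predict(perturbed[np.newaxis, ...])[0, target_class])
    preds = np.array(preds)

    # 3. Weight each sample by how close it is to the original image:
    #    the fewer superpixels removed, the more it counts.
    distances = 1.0 - masks.mean(axis=1)
    sample_weights = np.exp(-(distances ** 2) / kernel_width ** 2)

    # 4. Fit a simple, interpretable linear model on the on/off masks.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(masks, preds, sample_weight=sample_weights)

    # The largest coefficients point to the superpixels the prediction relies on.
    return surrogate.coef_
```

The superpixels with the biggest coefficients are the ones you would highlight, exactly like the tree frog's head above.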
Maybe you see with LIME that a classifier for emails from atheist groups and Christian groups is learning that the word “god” is an important factor for classification. But you also see that it is the word “god” in atheist emails that is driving the classification. This is because these emails talk about god in the context that god doesn't exist. But all the model is learning is the presence of the word “god” in the email, not its context. It is learning if (god is present): classify atheist, which we as humans can immediately identify as the wrong basis for classification. We can then make our training samples more diverse and make the necessary changes so our model picks up on the right things.
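The lime package makes this easy to try on text. Below is a rough sketch using its LimeTextExplainer on a tiny made-up corpus standing in for the atheism/Christianity newsgroup data (the official tutorial uses the 20 Newsgroups dataset); the toy pipeline and the example email are invented purely for illustration.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from lime.lime_text import LimeTextExplainer

# Tiny toy corpus standing in for the atheism/Christianity newsgroup posts.
texts = [
    "there is no evidence that god exists",
    "god does not exist and religion is man made",
    "we prayed to god at church on sunday",
    "the gospel teaches that god is love",
]
labels = [0, 0, 1, 1]  # 0 = atheism, 1 = christian

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["atheism", "christian"])
explanation = explainer.explain_instance(
    "i do not believe god exists at all",   # the email we want explained
    pipeline.predict_proba,                 # the "black box" probability function
    num_features=6,
)
print(explanation.as_list())  # (word, weight) pairs: is "god" doing all the work?
```

If “god” shows up with a large weight pushing towards one class, you have caught exactly the failure described above.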
To summarise, let’s break apart LIME - “Local Interpretable Model-Agnostic Explanations” - from the bottom.
Explanations refer to the fact that we can explain the actions of our model.
Model-Agnostic means that LIME looks at every model as a black box and is indifferent to what kind of model you use.
Interpretable means that the explanations provided are simple for anyone to understand.
Local refers to the fact that LIME explains a model’s prediction only in the vicinity, or locality, of a particular sample, with the help of a simple model fit in that region.
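Putting the pieces together, here is roughly what using the library on an image looks like. The brightness-based “classifier” and the random image are stand-ins so the snippet runs on its own; in practice you would pass your real model's prediction function.

```python
import numpy as np
from lime import lime_image

# Stand-in "black box": class 1 if the image is bright, class 0 otherwise.
def black_box_predict(images):
    brightness = images.mean(axis=(1, 2, 3))
    return np.stack([1 - brightness, brightness], axis=1)

image = np.random.rand(64, 64, 3)  # a random RGB image in place of a real photo

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image,
    black_box_predict,      # LIME greys out superpixels and calls this repeatedly
    top_labels=2,
    hide_color=0,
    num_samples=200,
)

# Keep only the superpixels that push the prediction towards the top class.
img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=True
)
```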
Seems perfect, but there is still a lot of work to be done. LIME is most useful when you're working with texts and images, where the patterns align with human intuition. When you're looking at things like protein sequences, the explanations are not as easy to understand. But this is still a great step towards model transparency and interpretability. LIME, for me, is one of the greatest resources out there, and I would recommend it to everyone.