Interpretable Machine Learning — A Short Survey

Maulik Parmar
Jan 6, 2021 · 6 min read

Hello everyone, this is my first story on Medium, so pardon me if I make some minor errors. Today I am going to talk about a recent research paper, “Interpretable Machine Learning — A Brief History, State-of-the-Art and Challenges”. Recent conferences have been dominated by papers on IML (Interpretable Machine Learning), but the field has roots going back roughly 200 years. The biggest challenge is the lack of a rigorous, community-accepted definition of interpretability.

Introduction

One of the most common questions in industry when deploying an ML model is whether the model can be trusted. During my short stint in industry, I was asked many questions about how a model makes its decisions or predictions. Common questions include how important a particular feature is, or why the model made a particular prediction. Interpretability is often a decisive factor when an ML model is deployed in areas like security or customer service. IML can also be used to debug or justify a model and its predictions, and to improve the model further.

History

IML research surged with the advancement of deep learning, but the roots of IML are much older. Linear regression models were used by Gauss, Legendre, and Quetelet as early as the beginning of the 19th century and have since grown into a vast array of regression analysis tools such as Generalized Additive Models. These models make distributional assumptions or restrict model complexity beforehand, thereby imposing intrinsic interpretability.

In contrast, ML algorithms follow a non-linear, non-parametric approach, where model complexity is controlled by hyperparameters or through cross-validation. As a result, ML models often have very good predictive performance but poor interpretability.

Even though interpretability in ML has been under-explored, there has been some prominent work in IML. The built-in “feature importance” measure of random forests was one of the important IML milestones. As the paper notes, “Recently, many model-agnostic explanation methods have been introduced, which work for different types of ML models. But also model-specific explanation methods have been developed, for example, to interpret deep neural networks or tree ensembles. Regression analysis and rule-based ML remain important and active research areas to this day and are blending together, e.g., model-based trees, RuleFit, etc.” Both regression models and rule-based ML serve as stand-alone ML algorithms and also as building blocks for IML approaches.

IML Models

IML methods can be distinguished by whether they analyze model components, model sensitivity, or surrogate models. Some IML approaches work by assigning meaning to individual model components, others by analyzing the model predictions for perturbations of the data. The surrogate approach, a mixture of the other two, approximates the ML model using (perturbed) data and then analyzes the components of the interpretable surrogate model.

Figure: overview of IML approaches, from Molnar et al., “Interpretable Machine Learning — A Brief History, State-of-the-Art and Challenges”.

(1) Analyzing Components of Interpretable Models:

These approaches focus on individual components of a model instead of the whole model. It is not necessarily required to understand the entire model, but to analyze specific components, the model needs to be decomposable into parts that we can interpret individually. Interpretable models are models whose learned parameters and learned structure can be assigned a certain interpretation, such as linear regression or decision trees. For example, the weights of a linear regression model can be interpreted as the effect of individual features on the prediction. Decision trees have a learned structure where each node splits on a particular feature, so a prediction can be traced by following the nodes from the root to a leaf. But this works only up to a certain point in high-dimensional cases: a linear regression model with a very large number of features is not that interpretable anymore, so some approaches try to reduce the number of features to be interpreted, e.g., LASSO.
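Here is a minimal sketch of reading off model components, using scikit-learn and its diabetes toy dataset (my own illustrative choices, not from the paper):

```python
# Minimal sketch: inspecting the learned components of interpretable models.
# The dataset and hyperparameters are illustrative choices.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Lasso
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)

# Linear regression: each weight is the change in the prediction per unit
# increase of that feature, holding the other features fixed.
linreg = LinearRegression().fit(X, y)
for name, coef in zip(X.columns, linreg.coef_):
    print(f"{name}: {coef:.2f}")

# LASSO shrinks many weights to exactly zero, leaving fewer features to interpret.
lasso = Lasso(alpha=1.0).fit(X, y)
print("features kept by LASSO:", list(X.columns[np.abs(lasso.coef_) > 0]))

# Decision tree: the learned structure can be printed, and a prediction can be
# traced by following the splits from the root to a leaf.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))
```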

(2) Analyzing Components of More Complex Models:

We can also analyze components of a more complex model. For example, we can visualize the filters learned by the layers of a CNN.

Some approaches try to make components more interpretable, for example by introducing monotonicity constraints or by modifying the loss function so that the concepts learned by a CNN are disentangled.
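Here is a rough sketch, assuming a recent torchvision, that visualizes the first-layer filters of a pretrained ResNet-18 as small RGB images (my own example, not from the paper):

```python
# Minimal sketch: plot the 64 first-layer convolution filters of ResNet-18.
# Assumes torchvision >= 0.13 and that the pretrained weights can be downloaded.
import matplotlib.pyplot as plt
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")
filters = model.conv1.weight.detach()                                   # shape (64, 3, 7, 7)
filters = (filters - filters.min()) / (filters.max() - filters.min())   # rescale to [0, 1]

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0))   # channels last for imshow
    ax.axis("off")
plt.show()
```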

(3) Explaining Individual Predictions:

Most approaches that study the sensitivity of an ML model are model-agnostic. They work by analyzing the model's predictions under small perturbations of the input data. We distinguish between local and global explanations. Local methods focus on individual model predictions. One of the popular local IML methods is Shapley values.
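Here is a minimal sketch using the shap library; the random forest and the diabetes dataset are my own placeholders:

```python
# Minimal sketch: Shapley values attribute a single prediction to the input features.
# Assumes the shap package is installed; model and data are illustrative.
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.Explainer(model, X)   # picks a suitable explainer for the model
shap_values = explainer(X.iloc[:1])    # explain the first prediction
print(dict(zip(X.columns, shap_values.values[0])))
```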

Some IML approaches rely on model-specific knowledge to analyze how a change in the input features affects the output. For example, saliency maps are used for CNNs: a saliency map is a heatmap showing how changing a pixel changes the prediction.
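A rough sketch of a vanilla gradient saliency map in PyTorch might look like this (the random input image is only a placeholder for a real image):

```python
# Minimal sketch: vanilla gradient saliency, i.e. the absolute gradient of the
# top class score with respect to each input pixel. Assumes torchvision >= 0.13.
import torch
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()
image = torch.rand(1, 3, 224, 224, requires_grad=True)    # placeholder input image

output = model(image)                      # (1, 1000) class scores
output[0, output.argmax()].backward()      # gradient of the top-class score
saliency = image.grad.abs().squeeze(0).max(dim=0).values  # (224, 224) heatmap
```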

(4) Explaining Global Model Behaviour:

Global model explanation methods explain how the model behaves on average for a given dataset. A useful distinction among global explanations is between “feature importance” and “feature effect”.

Feature importance ranks features by how important/relevant they are for the prediction, e.g., permutation feature importance for random forests.
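Here is a minimal sketch using scikit-learn's permutation_importance (dataset and model are illustrative choices):

```python
# Minimal sketch: permutation feature importance shuffles one feature at a time
# on held-out data and measures how much the model's score drops.
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, mean in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {mean:.3f}")
```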

The “feature effect” shows how a change in an input feature changes the predicted outcome. Popular feature effect plots are partial dependence plots, individual conditional expectation curves, accumulated local effect plots, and the functional ANOVA.
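A sketch of PDP and ICE curves with scikit-learn (assuming version 1.0 or later; the model and feature choice are illustrative):

```python
# Minimal sketch: partial dependence (average) and ICE (individual) curves show
# how the prediction changes as one feature is varied.
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(random_state=0).fit(X, y)

PartialDependenceDisplay.from_estimator(
    model, X, features=["bmi", "bp"],   # two feature names from the diabetes dataset
    kind="both",                        # "average" = PDP, "individual" = ICE, "both" overlays them
)
plt.show()
```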

(5) Surrogate Models:

Surrogate models are interpretable models that try to mimic the behavior of the underlying ML model. The surrogate approach requires only the inputs and outputs of the ML model to train the surrogate, so the ML model is treated as a black box. LIME is an example of a local surrogate method: it explains an individual prediction by learning an interpretable model on data in the proximity of the data point to be explained.
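Here is a toy sketch of the local-surrogate idea (not the actual lime library): sample points around the instance, weight them by proximity, and fit a simple linear model to the black-box predictions.

```python
# Minimal sketch of a LIME-style local surrogate; everything here is illustrative.
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge

X, y = load_diabetes(return_X_y=True, as_frame=True)
black_box = RandomForestRegressor(random_state=0).fit(X, y)

x0 = X.iloc[0].values                        # the instance to explain
rng = np.random.default_rng(0)
samples = x0 + rng.normal(scale=X.std().values * 0.3, size=(500, X.shape[1]))
preds = black_box.predict(samples)           # only black-box inputs/outputs are used

distances = np.linalg.norm((samples - x0) / X.std().values, axis=1)
weights = np.exp(-(distances ** 2))          # nearby samples get higher weight

surrogate = Ridge(alpha=1.0).fit(samples, preds, sample_weight=weights)
print(dict(zip(X.columns, surrogate.coef_)))  # local feature effects around x0
```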

Challenges

Some of the challenges in the field of IML are:

(1) The lack of a rigorous definition of interpretability.

(2) Feature dependence introduces problems with attribution of importance and extrapolation.

(3) Many IML methods provide explanations without quantifying the uncertainty of the explanation. The model itself, but also its explanations, are computed from data and hence are subject to uncertainty.

(4) As mentioned in the paper, “Ideally, a model should reflect the true causal structure of its underlying phenomena, to enable causal interpretations. Arguably, causal interpretation is usually the goal of modeling if ML is used in science. But most statistical learning procedures reflect mere correlation structures between features and analyze the surface of the data generation process instead of its true inherent structure. Further research is needed to understand when we are allowed to make causal interpretations of an ML model.”
