
Explainability in linear models

1. Explainability in linear models

Let's explore model-specific explainability techniques for linear models, which are commonly used in AI for both regression and classification tasks.

2. Linear models

The most well-known linear models include linear regression, which predicts continuous values, like estimating a mobile phone's price based on its storage capacity,

3. Linear models

and logistic regression, used for binary classification tasks like determining if a transaction is fraudulent based on its amount and the time it was processed.

4. Why are linear models explainable?

Linear models operate on a fundamental principle: they learn a linear combination of the input features to make predictions. For example, in 2D, linear and logistic regression use a line to relate input features through coefficients. Linear regression aims for this line to closely fit the data, while logistic regression seeks to separate two classes optimally. In both models, a unique coefficient is assigned to each feature to facilitate prediction.
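For intuition, here is a minimal sketch in plain NumPy of that linear combination, using made-up coefficient values for the phone example; the sigmoid at the end is what logistic regression applies to turn the same combination into a probability.

import numpy as np

# Made-up example: two features (storage in GB, number of camera lenses)
x = np.array([128.0, 3.0])   # feature values for one phone
w = np.array([5.0, 2.0])     # one learned coefficient per feature
b = 10.0                     # intercept

# Linear regression: the prediction is the linear combination itself
price = np.dot(w, x) + b

# Logistic regression: the same combination passes through a sigmoid
# to become a probability between 0 and 1
probability = 1 / (1 + np.exp(-(np.dot(w, x) + b)))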

5. Coefficients

These coefficients tell us the importance of each feature in the model; a higher absolute coefficient value indicates a more significant feature, whereas a lower absolute coefficient value suggests that the feature has less impact. For instance, in a model predicting the price of a mobile phone based on storage capacity and the number of camera lenses, if storage has a coefficient of 5 and the number of camera lenses has a coefficient of 2, the model suggests that storage plays a more significant role in determining the price of the phone. When comparing coefficients, we focus on their absolute values; for instance, coefficients of -5 and 5 have equal importance for the model, even though they affect the target variable in opposite directions. It’s crucial, however, to normalize features to have a similar scale before training to ensure coefficients can be compared fairly.
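As a tiny illustration with invented numbers, ranking features by the absolute value of their coefficients could look like this:

import numpy as np

features = ["storage", "camera_lenses"]
coefs = np.array([-5.0, 2.0])   # invented values; the sign only shows direction

# Sort features from most to least important by absolute coefficient value
for i in np.argsort(np.abs(coefs))[::-1]:
    print(features[i], coefs[i])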

6. Coefficients

Consider the same model predicting mobile phone prices. If we don't normalize the features, storage capacity (hundreds to thousands of gigabytes) and the number of camera lenses (one to four) sit on very different scales, making it impossible to compare their coefficients fairly.
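A quick synthetic sketch (invented phone data, not the course dataset) shows why this matters: on the raw features the lens coefficient looks much larger than the storage coefficient simply because of the units, while after min-max scaling both features live on [0, 1] and their coefficients can be compared fairly.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
storage = rng.uniform(100, 1000, size=200)   # gigabytes
lenses = rng.integers(1, 5, size=200)        # one to four lenses
price = 0.5 * storage + 50 * lenses + rng.normal(0, 10, size=200)
X = np.column_stack([storage, lenses])

# Coefficients on raw features reflect the units, not the importance
raw_coefs = LinearRegression().fit(X, price).coef_

# After scaling, both features are on the same range and comparable
X_scaled = MinMaxScaler().fit_transform(X)
scaled_coefs = LinearRegression().fit(X_scaled, price).coef_

print(raw_coefs, scaled_coefs)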

7. Admissions

Let's test this on the admissions dataset. For simplicity, we will use the same features for both tasks, although in practice the choice between regression and classification should depend on the specific problem and dataset characteristics. Assume the features are in X_train and that we have two targets: y_reg, containing the chance of acceptance, used for regression, and y_cls, indicating acceptance or rejection, used for classification.

8. Model training

We normalize the data using a MinMaxScaler by applying the fit_transform() method to ensure features have similar scales. Then, we train a linear regression model on the regression data and a logistic regression model on the classification data.
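A minimal sketch of those steps, using a small synthetic stand-in for the admissions data (the feature values and targets below are made up; only the workflow mirrors the slide):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: three features on different scales and two targets
rng = np.random.default_rng(42)
X_train = rng.uniform(size=(200, 3))
X_train[:, 0] *= 100                                # put one feature on a larger scale
y_reg = 0.004 * X_train[:, 0] + 0.3 * X_train[:, 1] + 0.5 * X_train[:, 2]   # chance of acceptance
y_cls = (y_reg > y_reg.mean()).astype(int)          # accepted (1) or rejected (0)

# Normalize so all features share a similar scale
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)

# One model per task
lin_reg = LinearRegression().fit(X_train_scaled, y_reg)
log_reg = LogisticRegression().fit(X_train_scaled, y_cls)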

9. Computing coefficients

For both models, we examine the learned coefficients using the .coef_ attribute. Each model returns one coefficient for each feature, but linear regression returns them in a one-dimensional array, while logistic regression returns them in a two-dimensional array.
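A self-contained toy example makes the shape difference concrete (the data values are arbitrary):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data with two features, just to compare the shape of .coef_
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y_reg = np.array([0.4, 0.6, 0.9, 0.1])
y_cls = np.array([0, 1, 1, 0])

lin_coefs = LinearRegression().fit(X, y_reg).coef_
log_coefs = LogisticRegression().fit(X, y_cls).coef_

print(lin_coefs.shape)   # (2,): one value per feature, 1D
print(log_coefs.shape)   # (1, 2): one row of per-feature values, 2D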

10. Visualizing coefficients

We can visualize the coefficients with matplotlib by passing the feature names from X_train.columns and the coefficient array to plt.bar. For logistic regression, we index the first row of the 2D array before plotting. The visualizations reveal that CGPA is the most influential factor in graduate admissions, which aligns with expectations, since the cumulative grade point average directly reflects a student's academic performance.
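A sketch of the plotting step, again on synthetic stand-in data (the feature names below are placeholders, not necessarily the dataset's real column names):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, LogisticRegression

# Placeholder admissions-style features
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.uniform(size=(100, 3)),
                       columns=["gre", "toefl", "cgpa"])
y_reg = 0.1 * X_train["gre"] + 0.2 * X_train["toefl"] + 0.6 * X_train["cgpa"]
y_cls = (y_reg > y_reg.mean()).astype(int)

lin_reg = LinearRegression().fit(X_train, y_reg)
log_reg = LogisticRegression().fit(X_train, y_cls)

# Linear regression: .coef_ is already one-dimensional
plt.bar(X_train.columns, lin_reg.coef_)
plt.title("Linear regression coefficients")
plt.show()

# Logistic regression: take the first row of the two-dimensional .coef_
plt.bar(X_train.columns, log_reg.coef_[0])
plt.title("Logistic regression coefficients")
plt.show()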

11. Let's practice!

Time to practice!