1. Evaluating multiple models
We've covered all parts of the supervised learning workflow. But how do we decide which model to use in the first place?

2. Different models for different problems
This is a complex question, and the answer depends on our situation. However, there are some principles that can guide us when making this decision. The size of our dataset plays a role: fewer features mean a simpler model and can reduce training time. Also, some models, such as Artificial Neural Networks, require a lot of data to perform well. We may need an interpretable model, so we can explain to stakeholders how predictions were made. An example is linear regression, where we can calculate and interpret the model coefficients. Alternatively, flexibility might be important to get the most accurate predictions. Generally, flexible models make fewer assumptions about the data; for example, a KNN model does not assume a linear relationship between the features and the target.

3. It's all in the metrics
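One practical way to weigh these trade-offs is to pick several candidate models and a single metric, then compare them using the same scikit-learn calls. Here is a minimal sketch of that idea; the synthetic dataset and the choice of accuracy as the metric are illustrative assumptions, not the course's data:

```python
# Minimal sketch: comparing candidate classifiers with one shared metric.
# The synthetic dataset and model list are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scores = {}
for name, model in [("KNN", KNeighborsClassifier()),
                    ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    model.fit(X_train, y_train)      # the same .fit/.predict API for every model
    scores[name] = accuracy_score(y_test, model.predict(X_test))
print(scores)
```

Because every scikit-learn estimator shares the same fit-and-predict interface, swapping in another model only changes one entry in the list.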
Notice that scikit-learn allows the same methods to be used for most models. This makes it easy to compare them! Regression models can be evaluated using the root mean squared error or the R-squared value. Likewise, classification models can all be analyzed using accuracy, a confusion matrix and its associated metrics, or the ROC AUC. Therefore, one approach is to select several models and a metric, then evaluate their performance without any form of hyperparameter tuning.

4. A note on scaling
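An out-of-the-box comparison like the one described above is only fair when features are on comparable scales, so the data is typically standardized first. A minimal sketch, using an illustrative feature matrix rather than real data:

```python
# Sketch: standardizing features, fitting the scaler on training data only
# so no test-set information leaks into the transformation.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.arange(40, dtype=float).reshape(20, 2)   # illustrative feature matrix
y = np.array([0, 1] * 10)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from training data
X_test_scaled = scaler.transform(X_test)        # reuse those training statistics

# Scaled training features have (approximately) zero mean and unit variance.
print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```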
Recall that the performance of some models, such as KNN, linear regression, and logistic regression, is affected by scaling our data. Therefore, it is generally best to scale our data before evaluating models out of the box.

5. Evaluating classification models
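Applying that scaling advice in a full train/test setup might look like the following sketch; `make_classification` stands in for the course's song-genre dataset, and the split sizes are assumptions:

```python
# Sketch: train/test split with scaling applied before model evaluation.
# make_classification stands in for the song-genre dataset used in the course.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=8, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=12)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the scaler on the training set
X_test_scaled = scaler.transform(X_test)        # transform the test set with it
print(X_train_scaled.shape, X_test_scaled.shape)
```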
We will evaluate three models for binary classification of song genre: KNN, logistic regression, and a new model called a decision tree classifier. We import our required modules, including DecisionTreeClassifier from sklearn-dot-tree. The workings of decision trees are outside the scope of this course, but the steps for building this model are the same as for other models in scikit-learn. As usual, we create our feature and target arrays, then split our data. We then scale our features using the scaler's dot-fit_transform method on the training set, and the dot-transform method on the test set.

6. Evaluating classification models
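Building on that setup, the cross-validation comparison might be sketched as follows; the data is again a synthetic stand-in, and the boxplot is saved to a file rather than shown interactively:

```python
# Sketch: cross-validating several classifiers on scaled training data
# and comparing their score distributions with a boxplot.
import matplotlib
matplotlib.use("Agg")                # render off-screen; an assumption for scripts
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=21)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=21)
X_train_scaled = StandardScaler().fit_transform(X_train)

models = {"KNN": KNeighborsClassifier(),
          "Logistic Regression": LogisticRegression(),
          "Decision Tree": DecisionTreeClassifier()}
results = []

for model in models.values():
    kf = KFold(n_splits=6, shuffle=True, random_state=42)
    # cross_val_score defaults to accuracy for classifiers
    cv_results = cross_val_score(model, X_train_scaled, y_train, cv=kf)
    results.append(cv_results)

fig, ax = plt.subplots()
ax.boxplot(results)
ax.set_xticklabels(models.keys())    # one box per model, labeled by its name
fig.savefig("model_comparison.png")
```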
We create a dictionary with our model names as strings for the keys, and instantiate models as the dictionary's values. We also create an empty list to store the results. Now we loop through the models in our models dictionary, using its dot-values method. Inside the loop, we instantiate a KFold object. Next we perform cross-validation, using the model being iterated, along with our scaled training features, and target training array. We set cv equal to our kfold variable. By default, the scoring here will be accuracy. We then append the cross-validation results to our results list. Lastly, outside of the loop, we create a boxplot of our results, and set the labels argument equal to a call of models-dot-keys to retrieve each model's name.

7. Visualizing results
The output shows us the range of cross-validation accuracy scores. We can also see each model's median cross-validation score, represented by the orange line in each box. We can see logistic regression has the best median score.

8. Test set performance
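To follow up the cross-validation comparison with held-out performance, each model can be fit and scored on the test set. A sketch, again on stand-in data:

```python
# Sketch: fitting each model on scaled training data and scoring it on the
# scaled test set. Synthetic stand-in data, as in the earlier sketches.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=7)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {"KNN": KNeighborsClassifier(),
          "Logistic Regression": LogisticRegression(),
          "Decision Tree": DecisionTreeClassifier()}

test_accuracies = {}
for name, model in models.items():   # .items() yields (name, model) pairs
    model.fit(X_train_scaled, y_train)
    test_accuracies[name] = model.score(X_test_scaled, y_test)
    print(f"{name} test set accuracy: {test_accuracies[name]:.3f}")
```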
To evaluate on the test set we loop through the names and values of the dictionary using the dot-items method. Inside the loop we fit the model, calculate accuracy, and print it. Logistic regression performs best for this problem if we are using accuracy as the metric.

9. Let's practice!
Now let's choose which models to optimize for our supervised learning problems.