Comparing logistic regression and SVM (and beyond)

1. Comparing logistic regression and SVM

In this video we'll compare our two linear classifiers, logistic regression and SVMs.

2. Pros and cons

Let's summarize the points we've covered throughout the course. Both logistic regression and SVM are linear classifiers. Both can be used with kernels, though this is more common with SVMs, because predictions are much faster when the number of support vectors is small. While both can be coerced into outputting probabilities, this is much more natural with logistic regression. Both can be extended to multi-class with a one-vs-rest scheme or by directly modifying the loss. In logistic regression, as in most methods, all data points affect the fit. SVMs have the special property that only a subset of the examples, the "support vectors", matter: the rest can be removed without affecting the fit, as the sketch below demonstrates. Finally, while the term "logistic regression" doesn't imply a particular type of regularization, the term "SVM" typically implies the specific combination of the hinge loss and L2 regularization.
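
To make that support-vector property concrete, here's a minimal sketch using scikit-learn's built-in breast cancer dataset (my choice of example data, not the course's): refitting an SVM on only its support vectors reproduces the original fit.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)

    # Fit a linear SVM on the full dataset
    svm = SVC(kernel="linear")
    svm.fit(X, y)

    # Refit on the support vectors alone
    svm_small = SVC(kernel="linear")
    svm_small.fit(X[svm.support_], y[svm.support_])

    # The two models make identical predictions
    print(np.array_equal(svm.predict(X), svm_small.predict(X)))  # True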

3. Use in scikit-learn

Let's compare the use of our two methods in scikit-learn. Logistic regression is imported via sklearn.linear_model.LogisticRegression. And what are its key hyperparameters? The first is C, which controls the amount of regularization: smaller C means more regularization, and vice versa. The next is the type of regularization, set via the penalty hyperparameter: scikit-learn supports L2 and L1. We also discussed methods for extending a binary classifier to multi-class; the hyperparameter controlling this will probably affect your results less than the previous two. There are a bunch more hyperparameters that scikit-learn exposes, but these are the ones I consider the most fundamental. Don't be afraid to read about the rest! As for SVMs, they can be instantiated from sklearn.svm using either LinearSVC for a linear SVM or SVC for a kernel SVM. (You can also fit linear SVMs via the SVC class by setting the kernel to linear, but you may find LinearSVC to be faster.)
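
Here's a short sketch of what those instantiations look like; the hyperparameter values are arbitrary placeholders, not recommendations.

    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC, LinearSVC

    # Smaller C means more regularization; penalty selects L2 or L1.
    # (L1 requires a compatible solver, such as liblinear.)
    lr = LogisticRegression(C=0.1, penalty="l1", solver="liblinear")

    # A linear SVM, two ways; LinearSVC is typically faster.
    svm_fast = LinearSVC(C=1.0)
    svm_alt = SVC(kernel="linear", C=1.0)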

4. Use in scikit-learn (cont.)

The key hyperparameters of the SVC class are C, just like with logistic regression; the type of kernel, where we only talked about linear and RBF but scikit-learn supports a couple of others; and gamma, which only applies to the RBF kernel and controls the smoothness of the decision boundary. Smaller values of gamma lead to smoother, or simpler, boundaries, and larger values of gamma lead to more complex ones. As with LogisticRegression, there are certainly more hyperparameters, and I encourage you to check them out.
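
Here's a rough sketch of gamma's effect, again on the built-in breast cancer dataset (my stand-in example): as gamma grows, the boundary gets more complex, which typically shows up as training accuracy creeping toward 100%, along with a growing risk of overfitting.

    from sklearn.datasets import load_breast_cancer
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    X = StandardScaler().fit_transform(X)  # RBF kernels are scale-sensitive

    # Larger gamma -> more complex boundary -> higher training accuracy
    for gamma in [0.001, 0.1, 10.0]:
        model = SVC(kernel="rbf", gamma=gamma).fit(X, y)
        print(f"gamma={gamma}: training accuracy = {model.score(X, y):.3f}")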

5. SGDClassifier

Finally, I want to direct your attention to scikit-learn's SGDClassifier. SGD stands for stochastic gradient descent. Although we didn't cover SGD in this course, it's worth knowing about SGDClassifier, since it can handle very large datasets much better than the other methods we've discussed. We've been talking about how logistic regression and SVM are just two types of linear classifiers, and SGDClassifier really brings this point home: to switch between logistic regression and a linear SVM, one only has to set the loss hyperparameter. It's just like we said: the model is the same, and only the loss changes. SGDClassifier works pretty much like the other scikit-learn methods we've seen. One "gotcha" with SGDClassifier is that the regularization hyperparameter is called alpha instead of C, and bigger alpha means more regularization; roughly speaking, alpha behaves like the inverse of C.
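
A minimal sketch of that loss switch, with the caveat that the name of the logistic loss depends on your scikit-learn version ("log_loss" in recent releases, "log" in older ones):

    from sklearn.linear_model import SGDClassifier

    # Logistic regression via SGD ("log" in older scikit-learn versions)
    logreg = SGDClassifier(loss="log_loss", alpha=0.01)

    # A linear SVM via SGD: same model, different loss
    linsvm = SGDClassifier(loss="hinge", alpha=0.01)

    # Note: alpha plays the role of 1/C here,
    # so bigger alpha means more regularization.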

6. Let's practice!

Now it's your turn.