Bias-variance tradeoff
1. Bias-variance tradeoff
To wrap up the course, we'll go over another one of interviewers' favorite topics: the bias-variance tradeoff.
2. Types of error
To start things off, let's talk a bit about error. Any time you fit a machine learning model, you're dealing with three types of error: bias, variance, and irreducible error. We won't go into too much detail about irreducible error, but you should know that this error stems from multiple sources. It could be a result of how you framed the problem, your algorithm choice, or plenty of other unknown variables.
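To make those components concrete, here is a minimal sketch, assuming scikit-learn and a synthetic sine-wave dataset (not part of the course material). It estimates each term at a single test point by refitting the same model on many fresh training sets: bias measures how far the average prediction sits from the truth, variance measures how much predictions wobble across training sets, and the noise variance is the irreducible floor.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
f = np.sin                      # true target function (our assumption)
noise_sd = 0.3                  # source of the irreducible error
x0 = np.array([[2.0]])          # test point where we measure the errors

preds = []
for _ in range(1000):           # draw a fresh training set each round
    X = rng.uniform(-3, 3, size=(50, 1))
    y = f(X).ravel() + rng.normal(scale=noise_sd, size=50)
    model = LinearRegression().fit(X, y)
    preds.append(model.predict(x0)[0])

preds = np.array(preds)
bias_sq = (preds.mean() - f(x0).item()) ** 2  # squared bias at x0
variance = preds.var()                        # spread across training sets
print(bias_sq, variance, noise_sd ** 2)       # third term: irreducible error
```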
3. Bias error
Bias refers to the simplifying assumptions a model makes so that the target function is easier to learn. In general, high bias makes algorithms faster to learn and easier to understand, but less flexible. Too much bias leads to the problem of underfitting. This means that our model is making too many assumptions and not fitting the training data well, as we see here with our fit going straight across all the data points. Examples of high-bias machine learning algorithms include Linear Regression, Linear Discriminant Analysis, and Logistic Regression.
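For a concrete picture of underfitting, here is a minimal sketch, again assuming scikit-learn and a synthetic curved dataset of my own choosing: a straight-line model simply can't capture the curve, so even its training error stays high.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)  # nonlinear target

model = LinearRegression().fit(X, y)  # high-bias model: a straight line
train_mse = mean_squared_error(y, model.predict(X))
print(train_mse)  # stays large even on the training data: underfitting
```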
4. Variance error
On the other hand, variance is the amount that the estimate of the target function would change if different training data were used. Some variance will always exist, but, ideally, results would not change too much from one training dataset to the next. Too much variance in your model leads to the problem of overfitting. This means that your model is too flexible and fits itself closely to your training data, making it not generalizable to unseen data. We see this with the fit here, which curves around every point to ensure strong training performance - but this won't hold up on unseen data. Examples of high-variance machine learning algorithms include Decision Trees, k-Nearest Neighbors, and Support Vector Machines.
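Overfitting is easy to demonstrate with one of the high-variance algorithms just mentioned. Here is a minimal sketch, assuming scikit-learn and the same kind of synthetic data as above: a decision tree with no depth limit memorizes the training set, then disappoints on held-out data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeRegressor().fit(X_train, y_train)  # no depth limit
print(tree.score(X_train, y_train))  # ~1.0: fits the training data perfectly
print(tree.score(X_test, y_test))    # much lower: poor generalization
```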
5. Bias-variance tradeoff
As one might guess, the goal of any machine learning algorithm is to minimize error, achieving low bias and low variance, which ultimately leads to good prediction performance. However, this is easier said than done, due to an inherent tradeoff between bias and variance: increasing the bias will decrease the variance, and increasing the variance will decrease the bias. We can see this phenomenon at work in this plot, with error on the y-axis and model complexity on the x-axis. The optimum model complexity falls somewhere in the middle. Keep this in mind when choosing an algorithm for your problem and data. Be sure you can explain this in simple, intuitive terms, since interviewers love to screen with this type of question.
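You can reproduce the shape of that plot numerically. Here is a minimal sketch, assuming scikit-learn and using polynomial degree as a stand-in for model complexity: training error keeps falling as the degree grows, while test error follows a U-shape with a sweet spot in the middle.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in [1, 3, 5, 10, 15]:  # the model complexity knob
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Low degrees underfit (both errors high), and very high degrees overfit (tiny training error, rising test error), so the best test error sits at a moderate degree.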
6. Summary
To summarize, we went over the types of error at play when performing machine learning for prediction, and learned about bias and variance and the tradeoff between these two attributes. Which do you think is more dangerous to your model, and why?
7. Let's prepare for the interview!
Let's practice! You're almost there, so enjoy the last few exercises!