Supervised machine learning
1. Supervised machine learning
Hi, I'm Ramnath Vaidyanathan, VP of Product Research at DataCamp. In this chapter, we'll dive into machine learning, the buzziest topic in data science! Let's start with supervised machine learning.
2. What is supervised machine learning?
As we learned previously, machine learning is a set of methods for making predictions based on existing data. Supervised machine learning is a subset of machine learning methods where the existing data has a specific structure: it has labels and features. Some problems that can be solved by supervised machine learning include recommendation systems, email subject optimization, and churn prediction. Let's explore one of these problems and define these new terms with a case study.
3. Case study: churn prediction
Suppose we have a subscription business and want to predict whether a given customer is likely to stay subscribed or churn.
4. Case study: churn prediction
First, we'll need some training data. This would be historical data from our customers.
5. Case study: churn prediction
Some of those customers will have maintained their subscription, while others will have churned. We eventually want to be able to predict the label for each customer: churned or subscribed.
6. Case study: churn prediction
We'll need features to make this prediction. Features are different pieces of information about each customer that might affect our label. For example, perhaps age, gender, the date of last purchase, or household income will predict cancellations. The magic of machine learning is that we can analyze many features all at once. We can use these labels and features to train a model to make predictions on new data.
7. Case study: churn prediction
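To make the terms concrete, here is a minimal sketch of how historical customer data might be split into features and labels. The customers, the field names (`days_since_last_purchase`, `household_income`), and every value are invented for illustration; they are not from the course.

```python
# Hypothetical historical customers; every value here is invented for illustration.
training_data = [
    {"age": 34, "days_since_last_purchase": 12, "household_income": 72000, "label": "subscribed"},
    {"age": 51, "days_since_last_purchase": 95, "household_income": 48000, "label": "churned"},
    {"age": 27, "days_since_last_purchase": 4, "household_income": 55000, "label": "subscribed"},
    {"age": 45, "days_since_last_purchase": 140, "household_income": 61000, "label": "churned"},
]

feature_names = ["age", "days_since_last_purchase", "household_income"]

# Features: one row of numbers per customer, one column per piece of information.
features = [[row[name] for name in feature_names] for row in training_data]

# Labels: the outcome we want to predict for each customer.
labels = [row["label"] for row in training_data]

print(features[0])  # [34, 12, 72000]
print(labels)       # ['subscribed', 'churned', 'subscribed', 'churned']
```

Each row of `features` lines up with the entry at the same position in `labels`, which is the structure a supervised model is trained on.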
Suppose we have a customer who may or may not churn soon. We can collect feature data on this customer, such as age or date of last purchase.
8. Case study: churn prediction
We can feed this data into our trained model
9. Case study: churn prediction
and then, our trained model will give us a prediction. If the customer is not in danger of churning, we can count on their revenue for another month! If they are in danger of churning, we can reach out to them with a special promotion or customer support to keep them subscribed.
10. Recap
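As a rough sketch of the case study's workflow, the following trains a deliberately tiny model and asks it about a new customer. The "model" is a one-feature threshold rule (sometimes called a decision stump), chosen only because it fits in a few lines; it is a stand-in, not the method used in the course. The function names, the data, and the follow-up actions are all hypothetical.

```python
def train_stump(days, churned):
    """Pick the days-since-last-purchase cutoff that best separates the training labels."""
    best_threshold, best_correct = None, -1
    for threshold in sorted(set(days)):
        # Rule being scored: predict churn when days since last purchase >= threshold.
        correct = sum((d >= threshold) == c for d, c in zip(days, churned))
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def predict_churn(threshold, days_since_last_purchase):
    """Apply the learned rule to one customer."""
    return days_since_last_purchase >= threshold


# Hypothetical training data: one feature and one label (True = churned) per customer.
train_days = [4, 12, 20, 33, 95, 110, 140]
train_churned = [False, False, False, False, True, True, True]

threshold = train_stump(train_days, train_churned)

# A new customer whose outcome we do not know yet.
new_customer_days = 120
if predict_churn(threshold, new_customer_days):
    action = "send retention offer"
else:
    action = "no action needed"

print(threshold, action)  # 95 send retention offer
```

A real churn model would use many features at once and a proper learning algorithm, but the shape is the same: fit on labeled history, then predict for a customer whose label is unknown.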
Let's recap. Machine learning makes a prediction based on data. In supervised machine learning, that data has two characteristics: features and labels. Labels are what we want to predict: in our example, whether or not the customer has churned. Features are data that might help predict the label, such as household income or date of last purchase. Once we have the features and labels, we can train a model and use it to make predictions on new data.
11. Model evaluation
Once we train a supervised machine learning model, how do we know if it's any good? When we collect historical data to train our model, it is always good practice to not feed all of it into our model. This withheld data is called a test set and can be used to evaluate how good the model is. In our example, we could ask the model to predict whether the customers in the test set would churn, and then measure how often the predictions were correct.
12. Model evaluation
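Holding out a test set can be sketched with the standard library alone. The helper names (`train_test_split`, `fit_threshold`, `accuracy`), the 25 percent split, and the customer data below are assumptions made for illustration, and the model is again a simple threshold rule rather than anything from the course.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out the last `test_fraction` as a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[:-n_test], shuffled[-n_test:]


def fit_threshold(rows):
    """Toy model: cutoff halfway between the average days of churned and retained customers."""
    churned_days = [days for days, churned in rows if churned]
    retained_days = [days for days, churned in rows if not churned]
    churned_mean = sum(churned_days) / len(churned_days)
    retained_mean = sum(retained_days) / len(retained_days)
    return (churned_mean + retained_mean) / 2


def accuracy(predictions, actuals):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)


# Hypothetical (days_since_last_purchase, churned) pairs.
customers = [(4, False), (12, False), (20, False), (33, False),
             (41, False), (95, True), (110, True), (140, True)]

train_rows, test_rows = train_test_split(customers)

# The model only ever sees the training rows.
threshold = fit_threshold(train_rows)

# Evaluate on the withheld test rows.
test_predictions = [days >= threshold for days, _ in test_rows]
test_actuals = [churned for _, churned in test_rows]
test_accuracy = accuracy(test_predictions, test_actuals)
print(len(train_rows), len(test_rows), test_accuracy)
```

This toy data is cleanly separable, so the score comes out perfect. Real customer data is noisier, which is why accuracy is measured on rows the model has not seen.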
It's important to note how often the model incorrectly predicted that a customer would churn and how often it incorrectly predicted that a customer wouldn't churn. Checking both outcomes is particularly important for rare events. Suppose our subscription is amazing and only 3 percent of customers ever cancel. Then, a model could be overall 97 percent accurate just by always predicting that a customer will remain subscribed. Only by examining the accuracy of each class do we realize that it had 0 percent accuracy at predicting churn when churn was the actual outcome.
13. Let's practice!
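The rare-event arithmetic from the model evaluation discussion can be checked directly. This sketch assumes 100 hypothetical customers, 3 of whom churned, and a "model" that always predicts "subscribed"; the helper `per_class_accuracy` is an illustrative name, not a library function.

```python
def per_class_accuracy(predictions, actuals):
    """For each actual label, the fraction of those customers that were predicted correctly."""
    result = {}
    for label in set(actuals):
        indices = [i for i, actual in enumerate(actuals) if actual == label]
        correct = sum(predictions[i] == label for i in indices)
        result[label] = correct / len(indices)
    return result


# 100 hypothetical customers: 3 churned, 97 stayed subscribed.
actuals = ["churned"] * 3 + ["subscribed"] * 97

# A "model" that ignores the features and always predicts the majority class.
predictions = ["subscribed"] * 100

overall = sum(p == a for p, a in zip(predictions, actuals)) / len(actuals)
by_class = per_class_accuracy(predictions, actuals)

print(overall)                 # 0.97
print(by_class["subscribed"])  # 1.0
print(by_class["churned"])     # 0.0
```

The overall figure looks strong, but the per-class breakdown shows that not a single churned customer was identified.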
You learned how supervised learning uses data with features and labels to make predictions and how to measure model accuracy. Let's practice!