
Making Predictions

1. Making Predictions

Welcome to Chapter 3! Now that you have explored and preprocessed your data, it's time to build your classifier and make predictions!

2. (Supervised) Machine Learning Primer

Recall that our goal is to predict whether or not a customer will churn based on various features. Since we have a clearly defined target variable - 'Churn' - we will use supervised machine learning techniques to make predictions. We have historical data that records, for each customer, whether or not they churned. Our models learn from this data so that they can predict whether or not new customers will churn. This historical data is known as the training data.
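To make the split between features and target concrete, here is a minimal sketch. It assumes the historical data sits in a pandas DataFrame named `telco` with a binary `Churn` column (1 = churned, 0 = stayed); the DataFrame name, the other column names, and all values are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical historical (training) data: one row per past customer.
telco = pd.DataFrame({
    "Account_Length": [128, 107, 137, 84, 75],
    "CustServ_Calls": [1, 1, 0, 2, 3],
    "Intl_Calls": [3, 3, 5, 7, 3],
    "Churn": [0, 0, 0, 1, 1],  # 1 = churned, 0 = stayed
})

# Features: every column except the target.
X = telco.drop("Churn", axis=1)

# Target: the label the model learns to predict.
y = telco["Churn"]

print(X.shape)  # (5, 3)
```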

3. (Supervised) Machine Learning in Python

There are many ways to perform supervised learning in Python. In this course, we will use scikit-learn, or sklearn, one of the most popular machine learning libraries out there. Let's get started!

4. Model selection

First, we need to decide which model we want to use. This is often one of the most difficult questions data scientists face, and the answer is often: it depends. In this course, you will experiment with a few different models to predict customer churn. We will not explore the details of these models, as DataCamp offers additional courses that will give you a deeper understanding of how they work under the hood.

5. Model selection

For classification problems, logistic regression is a good baseline model to begin with. It is simple and interpretable, but because its decision boundary is linear in the features, it is not flexible enough to capture more complex, nonlinear relationships in your dataset. Random Forests are a good next step - they often perform well but offer limited interpretability. Support Vector Machines are another option. They also generally perform well, but they can be slow to train on large datasets and are not very interpretable. You'll have a chance to try all of these models in the exercises. In this video, let's try out a Support Vector Machine classifier.
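One way to try the three candidates side by side is to fit each with its default settings and compare cross-validated accuracy. The sketch below uses a synthetic dataset from `make_classification` as a stand-in for the churn data, so whatever scores it prints say nothing about real churn performance; the 5-fold setup and `max_iter=1000` are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for preprocessed churn data (numeric features, binary target).
X, y = make_classification(n_samples=500, n_features=8, random_state=42)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Random forest": RandomForestClassifier(random_state=42),
    "Support vector machine": SVC(),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 5 train/validation splits.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {scores[name]:.3f}")
```

The interpretability difference shows up in practice: a fitted `LogisticRegression` exposes one coefficient per feature in `coef_`, whereas `SVC` with its default RBF kernel has no per-feature weights to inspect.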

6. Training your model

Which model you decide to use will affect which classes you import from sklearn. To build our Support Vector Machine classifier, we first need to import the SVC class from sklearn dot svm. This class implements the support vector machine algorithm for learning from the data and making predictions. We then need to instantiate the SVC classifier, as shown here. Next, we need to train our model on our churn data. This is also known as fitting our model to the data. In sklearn, we use the fit method to do this. Its first argument is the feature array, and its second argument is the target variable - that is, whether or not each customer churned. These arguments are typically NumPy arrays or pandas objects (a DataFrame for the features and a Series for the target), and the features must be numeric, such as the number of customer service calls made by a customer, rather than categories, such as 'State'. The data you'll be working with has already been preprocessed to satisfy these conditions. Have a look at what is returned when you fit the classifier: it is the classifier itself, now fitted to the data. Depending on your scikit-learn version, its printed form may also list the classifier's parameters, such as C and kernel. You'll have a chance to play around with these parameters when tuning your models later in this course.
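The training steps described above can be sketched as follows. The small feature table and its column names are hypothetical. Using `pd.get_dummies` to turn a categorical column such as `State` into numeric indicator columns is one common approach, assumed here for illustration; it is not necessarily how the course data was prepared.

```python
import pandas as pd
from sklearn.svm import SVC

# Hypothetical churn data with one categorical column.
telco = pd.DataFrame({
    "CustServ_Calls": [1, 4, 0, 5, 2, 3],
    "Day_Mins": [265.1, 161.6, 243.4, 299.4, 166.7, 223.4],
    "State": ["KS", "OH", "NJ", "OH", "KS", "NJ"],
    "Churn": [0, 1, 0, 1, 0, 0],
})

# SVC needs numeric features, so one-hot encode 'State' as 0/1 integer columns.
features = pd.get_dummies(telco.drop("Churn", axis=1), columns=["State"], dtype=int)
target = telco["Churn"]

# Instantiate the classifier with its default settings.
svc = SVC()

# fit(features, target) learns from the data and returns the estimator itself.
fitted = svc.fit(features, target)
print(fitted is svc)  # True
```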

7. Making a prediction

After fitting your model, you may want to see its prediction for a new customer. To do this, we use the predict method and pass in the features of our new customer, arranged as a 2-D array with one row per customer. Printing the result, we see 0, which indicates that the model predicts this customer will not churn. Good news for the business!
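A prediction for a single new customer might look like the sketch below. `predict` expects 2-D input with one row per customer and the same columns used during fitting, so the new customer is passed as a one-row DataFrame. The training data, column names, and the new customer's values are all hypothetical, so the printed label is not necessarily the 0 shown in the video.

```python
import pandas as pd
from sklearn.svm import SVC

# Hypothetical preprocessed training data (numeric features only).
X_train = pd.DataFrame({
    "CustServ_Calls": [0, 1, 1, 2, 4, 5, 6, 5],
    "Intl_Plan": [0, 0, 0, 0, 1, 1, 1, 0],
})
y_train = pd.Series([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = churned

svc = SVC().fit(X_train, y_train)

# One new customer as a single-row DataFrame with the same columns.
new_customer = pd.DataFrame({"CustServ_Calls": [1], "Intl_Plan": [0]})

# predict returns an array with one label per row: 0 (won't churn) or 1 (will churn).
prediction = svc.predict(new_customer)
print(prediction)
```

Passing a flat list such as `[1, 0]` would raise an error, because scikit-learn reads it as a 1-D array. Wrapping it as `[[1, 0]]` works, though recent scikit-learn versions may warn that the input lacks the feature names seen during fitting.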

8. Let's practice!

Alright, it's now time for you to train your own model. Have fun in the exercises!