
XGBoost: Fit/Predict

It's time to create your first XGBoost model! As Sergey showed you in the video, you can use the scikit-learn .fit() / .predict() paradigm that you are already familiar with to build your XGBoost models, as the xgboost library has a scikit-learn compatible API!
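
For reference, here is a minimal, self-contained sketch of that paradigm using xgboost's scikit-learn compatible API. The toy arrays and variable names (X_toy, y_toy, clf) are purely illustrative and are not the churn data used in this exercise:

# Minimal fit/predict sketch with xgboost's scikit-learn compatible API
import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
X_toy = rng.rand(100, 4)                   # 100 samples, 4 features (illustrative)
y_toy = (X_toy[:, 0] > 0.5).astype(int)    # illustrative binary target

clf = xgb.XGBClassifier(objective='binary:logistic', n_estimators=10, seed=123)
clf.fit(X_toy, y_toy)                      # same .fit() call you know from scikit-learn
print(clf.predict(X_toy[:5]))              # ...and the same .predict() call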

Here, you'll be working with churn data. This dataset contains imaginary data from a ride-sharing app: user behaviors over their first month of app usage in a set of imaginary cities, as well as whether they were still using the service 5 months after sign-up. It has been pre-loaded for you into a DataFrame called churn_data - explore it in the Shell!

Your goal is to use the first month's worth of data to predict whether the app's users will remain users of the service at the 5-month mark. This is a typical setup for a churn prediction problem. To do this, you'll split the data into training and test sets, fit a small XGBoost model on the training set, and evaluate its performance on the test set by computing its accuracy.

pandas and numpy have been imported as pd and np, and train_test_split has been imported from sklearn.model_selection. Additionally, the arrays for the features and the target have been created as X and y.
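
As a quick reminder of how that split works, here is a minimal sketch of train_test_split with an 80/20 split; the tiny arrays (X_demo, y_demo) are illustrative, not the exercise data:

from sklearn.model_selection import train_test_split
import numpy as np

X_demo = np.arange(20).reshape(10, 2)      # 10 samples, 2 features (illustrative)
y_demo = np.array([0, 1] * 5)              # illustrative binary labels

# test_size=0.2 sends 20% of the rows to the test set; random_state makes the split reproducible
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=123)
print(X_tr.shape, X_te.shape)              # (8, 2) (2, 2)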

This exercise is part of the course Extreme Gradient Boosting with XGBoost.


Exercise instructions

  • Import xgboost as xgb.
  • Create training and test sets such that 20% of the data is used for testing. Use a random_state of 123.
  • Instantiate an XGBClassifier as xg_cl using xgb.XGBClassifier(). Specify n_estimators to be 10 and an objective of 'binary:logistic'. Do not worry about what these mean just yet; you will learn about these parameters later in this course.
  • Fit xg_cl to the training set (X_train, y_train) using the .fit() method.
  • Predict the labels of the test set (X_test) using the .predict() method and hit 'Submit Answer' to print the accuracy.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import xgboost
____

# Create arrays for the features and the target: X, y
X, y = churn_data.iloc[:,:-1], churn_data.iloc[:,-1]

# Create the training and test sets
X_train, X_test, y_train, y_test = ____(____, ____, test_size=____, random_state=123)

# Instantiate the XGBClassifier: xg_cl
xg_cl = ____.____(____='____', ____=____, seed=123)

# Fit the classifier to the training set
____

# Predict the labels of the test set: preds
preds = ____

# Compute the accuracy: accuracy
accuracy = float(np.sum(preds==y_test))/y_test.shape[0]
print("accuracy: %f" % (accuracy))