XGBoost: Fit/Predict
It's time to create your first XGBoost model! As Sergey showed you in the video, you can use the scikit-learn .fit() / .predict() paradigm that you are already familiar with to build your XGBoost models, because the xgboost library has a scikit-learn compatible API.
Here, you'll be working with churn data. This dataset contains imaginary data from a ride-sharing app with user behaviors over their first month of app usage in a set of imaginary cities as well as whether they used the service 5 months after sign-up. It has been pre-loaded for you into a DataFrame called churn_data - explore it in the Shell!
Your goal is to use the first month's worth of data to predict whether the app's users will remain users of the service at the 5-month mark. This is a typical setup for a churn prediction problem. To do this, you'll split the data into training and test sets, fit a small xgboost model on the training set, and evaluate its performance on the test set by computing its accuracy.
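Accuracy here is simply the fraction of test rows whose predicted label matches the true label. The following is a minimal sketch of that calculation; the two label arrays are invented for illustration and are not taken from churn_data.

```python
import numpy as np

# Hypothetical labels, invented for illustration (1 = still a user at month 5)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Count the positions where prediction and truth agree,
# then divide by the number of rows
accuracy = float(np.sum(y_pred == y_true)) / y_true.shape[0]

# Taking the mean of the boolean comparison gives the same fraction
assert accuracy == np.mean(y_pred == y_true)

print("accuracy: %f" % accuracy)  # accuracy: 0.750000
```

Six of the eight invented predictions match, so the result is 6/8 = 0.75.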
pandas and numpy have been imported as pd and np, and train_test_split has been imported from sklearn.model_selection. Additionally, the arrays for the features and the target have been created as X and y.
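The sample code further down builds X and y by position: every column except the last becomes a feature, and the last column becomes the target. Here is a minimal sketch of that slicing on a toy DataFrame. The frame and its column names are hypothetical and stand in for churn_data, whose actual columns are not listed here.

```python
import pandas as pd

# Hypothetical miniature stand-in for churn_data; column names are invented
toy = pd.DataFrame({
    "trips_in_first_month": [4, 0, 12],
    "avg_rating": [4.5, 3.0, 5.0],
    "still_active_month_5": [1, 0, 1],
})

# All rows, every column except the last -> features
# All rows, only the last column -> target
X_toy, y_toy = toy.iloc[:, :-1], toy.iloc[:, -1]

print(X_toy.shape)     # (3, 2)
print(y_toy.tolist())  # [1, 0, 1]
```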
This exercise is part of the course
Extreme Gradient Boosting with XGBoost
Exercise instructions
- Import xgboost as xgb.
- Create training and test sets such that 20% of the data is used for testing. Use a random_state of 123.
- Instantiate an XGBClassifier as xg_cl using xgb.XGBClassifier(). Specify n_estimators to be 10 estimators and an objective of 'binary:logistic'. Do not worry about what this means just yet; you will learn about these parameters later in this course.
- Fit xg_cl to the training set (X_train, y_train) using the .fit() method.
- Predict the labels of the test set (X_test) using the .predict() method and hit 'Submit Answer' to print the accuracy.
Hands-on interactive exercise
Try this exercise by completing the sample code.
# Import xgboost
____
# Create arrays for the features and the target: X, y
X, y = churn_data.iloc[:,:-1], churn_data.iloc[:,-1]
# Create the training and test sets
X_train, X_test, y_train, y_test = ____(____, ____, test_size=____, random_state=123)
# Instantiate the XGBClassifier: xg_cl
xg_cl = ____.____(____='____', ____=____, seed=123)
# Fit the classifier to the training set
____
# Predict the labels of the test set: preds
preds = ____
# Compute the accuracy: accuracy
accuracy = float(np.sum(preds==y_test))/y_test.shape[0]
print("accuracy: %f" % (accuracy))