Sentiment analysis with GBM

Let's now use scikit-learn's GradientBoostingClassifier on the reviews dataset to predict the sentiment of a review given its text.

We will not pass the raw text as input for the model. The following pre-processing has been done for you:

  1. Remove reviews with missing values.
  2. Select data from the top 5 apps.
  3. Select a random subsample of 500 reviews.
  4. Remove "stop words" from the reviews.
  5. Transform the reviews into a matrix, in which each feature represents the frequency of a word in a review.
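As a rough illustration, the five steps above might look like the sketch below. The toy DataFrame and its column names (`app`, `text`, `sentiment`) are hypothetical stand-ins, not the course's actual data, and "top 5 apps" is assumed here to mean the apps with the most reviews. `CountVectorizer` with `stop_words="english"` is one common way to cover steps 4 and 5 together; the course may have used a different approach.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the reviews data; the column names are hypothetical.
reviews = pd.DataFrame({
    "app": ["A", "A", "B", "B", "C", "C"],
    "text": ["I love this app", "This is the worst app", None,
             "Great and useful", "Terrible, it crashes a lot", "Works great"],
    "sentiment": [1, 0, 1, 1, 0, 1],
})

# 1. Remove reviews with missing values.
reviews = reviews.dropna()

# 2. Keep only the most-reviewed apps (assumed meaning of "top 5").
top_apps = reviews["app"].value_counts().nlargest(5).index
reviews = reviews[reviews["app"].isin(top_apps)]

# 3. Take a random subsample (500 in the exercise, capped for the toy data).
reviews = reviews.sample(n=min(500, len(reviews)), random_state=500)

# 4 and 5. Drop English stop words and count word occurrences per review.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews["text"])
y = reviews["sentiment"].to_numpy()

print(X.shape)  # (number of reviews, number of distinct non-stop words)
```

Each row of `X` is one review and each column is one word, so the classifier sees word counts rather than raw strings.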

For a deeper understanding of text mining, check out the course Introduction to Natural Language Processing in Python.

This exercise is part of the course

Ensemble Methods in Python

Exercise instructions

  • Build a GradientBoostingClassifier with 100 estimators and a learning rate of 0.1.
  • Calculate the predictions on the test set.
  • Compute the accuracy to evaluate the model.
  • Calculate and print the confusion matrix.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Build and fit a Gradient Boosting classifier
clf_gbm = ____(____, ____, random_state=500)
clf_gbm.fit(X_train, y_train)

# Calculate the predictions on the test set
pred = ____

# Evaluate the performance based on the accuracy
acc = ____
print('Accuracy: {:.3f}'.format(acc))

# Get and show the Confusion Matrix
cm = ____
print(cm)
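One possible way to fill in the blanks is sketched below. Because the exercise's `X_train`, `X_test`, `y_train`, and `y_test` are not available outside the course environment, this version assumes a synthetic dataset from `make_classification` as a stand-in for the word-frequency matrix, so the printed numbers will not match the ones from the reviews data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the pre-processed reviews (an assumption, not course data).
X, y = make_classification(n_samples=500, n_features=20, random_state=500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=500
)

# Build and fit a Gradient Boosting classifier
clf_gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=500)
clf_gbm.fit(X_train, y_train)

# Calculate the predictions on the test set
pred = clf_gbm.predict(X_test)

# Evaluate the performance based on the accuracy
acc = accuracy_score(y_test, pred)
print('Accuracy: {:.3f}'.format(acc))

# Get and show the Confusion Matrix (rows: true labels, columns: predicted labels)
cm = confusion_matrix(y_test, pred)
print(cm)
```

The diagonal of the confusion matrix counts correct predictions, so dividing its sum by the total number of test samples gives the same accuracy value.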