
Sentiment analysis with GBM

Let's now use scikit-learn's GradientBoostingClassifier on the reviews dataset to predict the sentiment of a review given its text.

We will not feed the raw text to the model. Instead, the following pre-processing has already been done for you:

  1. Remove reviews with missing values.
  2. Select data from the top 5 apps.
  3. Select a random subsample of 500 reviews.
  4. Remove "stop words" from the reviews.
  5. Transform the reviews into a matrix, in which each feature represents the frequency of a word in a review.
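The pre-processing steps above can be sketched with pandas and scikit-learn's `CountVectorizer`. This is a minimal sketch under stated assumptions. The small DataFrame and its column names (`App`, `Review`, `Sentiment`) are hypothetical stand-ins for the reviews dataset. Word counts from `CountVectorizer` are one way to build the frequency matrix, and the course's actual pre-processing code may differ.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the reviews dataset;
# the column names 'App', 'Review' and 'Sentiment' are assumptions.
reviews = pd.DataFrame({
    "App": ["A", "A", "A", "B", "B", "C", "C", "D", "D", "E", "E", "F"],
    "Review": [
        "I love this app", "The app keeps crashing", None,
        "Great and useful", "Worst update ever",
        "Really helpful app", "Too many ads",
        "Fast and simple", "It drains my battery",
        "Works as expected", "Login is broken", "Fine I guess",
    ],
    "Sentiment": ["Positive", "Negative", "Neutral", "Positive", "Negative",
                  "Positive", "Negative", "Positive", "Negative",
                  "Positive", "Negative", "Neutral"],
})

# 1. Remove reviews with missing values
reviews = reviews.dropna()

# 2. Keep only the 5 apps with the most reviews
top_apps = reviews["App"].value_counts().index[:5]
reviews = reviews[reviews["App"].isin(top_apps)]

# 3. Random subsample of (up to) 500 reviews
reviews = reviews.sample(n=min(500, len(reviews)), random_state=500)

# 4 + 5. Drop English stop words and count each word's occurrences per review
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews["Review"])  # sparse matrix: reviews x words
y = reviews["Sentiment"].values
```

In this toy data, app `F` has the fewest reviews, so step 2 drops it. The result is 10 reviews, each represented as a row of word counts.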

Do you want a deeper understanding of text mining? Then check out the course Introduction to Natural Language Processing in Python!

This exercise is part of the course Ensemble Methods in Python.

Exercise instructions

  • Build a GradientBoostingClassifier with 100 estimators and a learning rate of 0.1.
  • Calculate the predictions on the test set.
  • Compute the accuracy to evaluate the model.
  • Calculate and print the confusion matrix.
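To see what the two hyperparameters in the instructions control, the sketch below trains the same configuration (100 estimators, learning rate 0.1) and uses `staged_predict` to track test accuracy as trees are added. Each new tree's contribution is scaled by the learning rate. The `make_classification` data is an assumption standing in for the word-count matrix, so the numbers will not match the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption) standing in for the review word-count matrix
X, y = make_classification(n_samples=500, n_features=20, random_state=500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=500)

# Same hyperparameters as the instructions: 100 trees, learning rate 0.1
clf_gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, random_state=500)
clf_gbm.fit(X_train, y_train)

# staged_predict yields the ensemble's predictions after 1, 2, ..., 100 trees
staged_acc = [accuracy_score(y_test, p) for p in clf_gbm.staged_predict(X_test)]
print('After 1 tree:    {:.3f}'.format(staged_acc[0]))
print('After 100 trees: {:.3f}'.format(staged_acc[-1]))
```

The last stage uses all 100 trees, so its accuracy equals that of `clf_gbm.predict`. A smaller learning rate usually needs more trees to reach a similar fit.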

Hands-on interactive exercise

Try this exercise and complete the sample code.

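from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# X_train, X_test, y_train, y_test are assumed to be preloaded by the exercise environment

# Build and fit a Gradient Boosting classifier (100 trees, learning rate 0.1)
clf_gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=500)
clf_gbm.fit(X_train, y_train)

# Calculate the predictions on the test set
pred = clf_gbm.predict(X_test)

# Evaluate the performance based on the accuracy
acc = accuracy_score(y_test, pred)
print('Accuracy: {:.3f}'.format(acc))

# Get and show the Confusion Matrix (rows: true labels, columns: predicted labels)
cm = confusion_matrix(y_test, pred)
print(cm)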
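The confusion matrix printed above follows scikit-learn's convention. Rows are true classes and columns are predicted classes, both in sorted label order, so correct predictions lie on the diagonal. A tiny example with hypothetical labels (not output from this exercise) makes this concrete.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Tiny hand-made labels (hypothetical), not results from the exercise
y_true = ["Negative", "Positive", "Positive", "Negative", "Positive"]
y_pred = ["Negative", "Positive", "Negative", "Negative", "Positive"]

# Rows: true class, columns: predicted class, in sorted order ['Negative', 'Positive']
cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[2 0]
#  [1 2]]

# Accuracy = diagonal (correct predictions) / all predictions = 4 / 5
acc = accuracy_score(y_true, y_pred)
print('Accuracy: {:.3f}'.format(acc))
# Accuracy: 0.800
```

The off-diagonal 1 is a positive review that was predicted as negative. If the labels include a third class, the matrix grows to one row and one column per class.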