
Sentiment analysis with GBM

Let's now use scikit-learn's GradientBoostingClassifier on the reviews dataset to predict the sentiment of a review given its text.

We will not pass the raw text to the model as input. The following pre-processing has been done for you:

  1. Remove reviews with missing values.
  2. Select data from the top 5 apps.
  3. Select a random subsample of 500 reviews.
  4. Remove "stop words" from the reviews.
  5. Transform the reviews into a matrix, in which each feature represents the frequency of a word in a review.
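The pre-processing steps above could look roughly like the sketch below. It is not the course's actual code. The toy DataFrame and its column names `app`, `review` and `sentiment` are hypothetical, and scikit-learn's `CountVectorizer` with its built-in English stop-word list is an assumed way of carrying out steps 4 and 5.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the reviews dataset;
# the column names 'app', 'review' and 'sentiment' are assumptions
reviews = pd.DataFrame({
    'app': ['A', 'A', 'B', 'B', 'C', 'D', 'E', 'F'],
    'review': ['great app, love it', None, 'crashes all the time',
               'very useful and fast', 'the ads are annoying',
               'works fine', 'terrible update', 'nice design'],
    'sentiment': ['Positive', 'Positive', 'Negative', 'Positive',
                  'Negative', 'Positive', 'Negative', 'Positive'],
})

# 1. Remove reviews with missing values
reviews = reviews.dropna()

# 2. Keep only the 5 apps with the most reviews
top_apps = reviews['app'].value_counts().index[:5]
reviews = reviews[reviews['app'].isin(top_apps)]

# 3. Random subsample (500 in the exercise; capped at the data size here)
reviews = reviews.sample(n=min(500, len(reviews)), random_state=500)

# 4 and 5. Drop English stop words and build a word-count matrix
# (one row per review, one column per remaining word)
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(reviews['review'])
y = reviews['sentiment']
```

`X` is a sparse matrix, and scikit-learn's `GradientBoostingClassifier` accepts sparse input directly.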

Do you want a deeper understanding of text mining? Then check out the course Introduction to Natural Language Processing in Python!

This exercise is part of the course Ensemble Methods in Python.

Exercise instructions

  • Build a GradientBoostingClassifier with 100 estimators and a learning rate of 0.1.
  • Calculate the predictions on the test set.
  • Compute the accuracy to evaluate the model.
  • Calculate and print the confusion matrix.
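The last two steps use scikit-learn's `accuracy_score` and `confusion_matrix`. The small worked example below uses made-up labels that are purely illustrative and not taken from the reviews dataset. Accuracy is the fraction of correct predictions. In the confusion matrix, rows are true labels and columns are predicted labels, both ordered by the `labels` argument.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels, purely illustrative (not from the reviews dataset)
y_true = ['Negative', 'Positive', 'Positive', 'Negative']
y_pred = ['Negative', 'Positive', 'Negative', 'Negative']

# 3 of the 4 predictions match the true labels
acc = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred, labels=['Negative', 'Positive'])
print(acc)  # 0.75
print(cm)
# [[2 0]
#  [1 1]]
```

Here the off-diagonal `1` is a positive review that was predicted as negative.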

Hands-on interactive exercise

Complete the sample code to finish this exercise.

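# Import the model and the evaluation metrics
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# X_train, X_test, y_train, y_test are assumed to be already defined

# Build and fit a Gradient Boosting classifier
clf_gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=500)
clf_gbm.fit(X_train, y_train)

# Calculate the predictions on the test set
pred = clf_gbm.predict(X_test)

# Evaluate the performance based on the accuracy
acc = accuracy_score(y_test, pred)
print('Accuracy: {:.3f}'.format(acc))

# Get and show the Confusion Matrix
cm = confusion_matrix(y_test, pred)
print(cm)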
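The train/test split only exists inside the exercise environment. The following self-contained sketch therefore substitutes synthetic data from `make_classification`. This is an assumption: it is not the reviews data, so the numbers it prints say nothing about review sentiment. The sketch also uses `staged_predict`, which yields the ensemble's predictions after each boosting stage. This offers one way to see what `n_estimators` controls.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the word-frequency matrix (assumption: the real
# X and y come from the pre-processed reviews, which are not reproduced here)
X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=500)

# Same settings as the exercise: 100 boosting stages, learning rate 0.1
clf_gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                     random_state=500)
clf_gbm.fit(X_train, y_train)

pred = clf_gbm.predict(X_test)
acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)
print('Accuracy: {:.3f}'.format(acc))
print(cm)

# staged_predict yields predictions after each stage, so this list
# shows how test accuracy evolves as estimators are added
staged_acc = [accuracy_score(y_test, p) for p in clf_gbm.staged_predict(X_test)]
print('Accuracy after 10, 50, 100 stages:',
      [round(staged_acc[i - 1], 3) for i in (10, 50, 100)])
```

The last staged accuracy matches `acc`, because after all 100 stages `staged_predict` gives the same predictions as `predict`. A smaller `learning_rate` shrinks each tree's contribution, so it typically needs more stages to reach a similar accuracy.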