Sentiment analysis with GBM

Let's now use scikit-learn's GradientBoostingClassifier on the reviews dataset to predict the sentiment of a review given its text.

We will not pass the raw text as input for the model. The following pre-processing has been done for you:

  1. Remove reviews with missing values.
  2. Select data from the top 5 apps.
  3. Select a random subsample of 500 reviews.
  4. Remove "stop words" from the reviews.
  5. Transform the reviews into a matrix, in which each feature represents the frequency of a word in a review.
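The exercise does not show the pre-processing code. The five steps could look roughly like the sketch below. It assumes a pandas DataFrame with hypothetical columns `App`, `Review` and `Sentiment`, and the toy rows are made up for illustration. scikit-learn's `CountVectorizer` with `stop_words="english"` covers steps 4 and 5 in one call.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Made-up toy data; the column names App, Review and Sentiment are assumptions
reviews = pd.DataFrame({
    "App": ["A", "A", "B", "B", "C", "D", "E", "F", "A"],
    "Review": ["Great app, love it", "Crashes all the time", None,
               "Works fine and fast", "Terrible update", "Love the new design",
               "Not bad", "Useless", "Love it, great"],
    "Sentiment": ["Positive", "Negative", "Positive", "Positive", "Negative",
                  "Positive", "Neutral", "Negative", "Positive"],
})

# 1. Remove reviews with missing values
reviews = reviews.dropna()

# 2. Keep only the 5 apps with the most reviews
top_apps = reviews["App"].value_counts().index[:5]
reviews = reviews[reviews["App"].isin(top_apps)]

# 3. Random subsample of (at most) 500 reviews
reviews = reviews.sample(n=min(500, len(reviews)), random_state=500)

# 4 + 5. Drop English stop words and count word occurrences per review
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(reviews["Review"])  # sparse matrix: rows = reviews, columns = words
y = reviews["Sentiment"]

print(X.shape)
```

On the real dataset, `X` and `y` would then be split into training and test sets before fitting the model.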

Want a deeper understanding of text mining? Check out the course Introduction to Natural Language Processing in Python!

This exercise is part of the course

Ensemble Methods in Python

Exercise instructions

  • Build a GradientBoostingClassifier with 100 estimators and a learning rate of 0.1.
  • Calculate the predictions on the test set.
  • Compute the accuracy to evaluate the model.
  • Calculate and print the confusion matrix.
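The last two instructions are easier to read with a small worked example. The labels below are made up and do not come from the reviews dataset. Accuracy is the fraction of predictions that match the true labels. scikit-learn's `confusion_matrix` puts the true classes on the rows and the predicted classes on the columns, both in sorted label order.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels: 1 = positive review, 0 = negative review
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)  # 3 of 5 predictions are correct
cm = confusion_matrix(y_true, y_pred)

print('Accuracy: {:.3f}'.format(acc))  # Accuracy: 0.600
print(cm)
# [[1 1]
#  [1 2]]
```

Row 0 says one negative review was classified correctly and one was misclassified as positive. Row 1 says one positive review was misclassified as negative and two were classified correctly.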

Hands-on interactive exercise

Try this exercise using the sample code below.

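# X_train, X_test, y_train, y_test (word-frequency features and sentiment labels)
# are assumed to be preloaded by the exercise environment
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Build and fit a Gradient Boosting classifier: 100 trees, learning rate 0.1
clf_gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=500)
clf_gbm.fit(X_train, y_train)

# Calculate the predictions on the test set
pred = clf_gbm.predict(X_test)

# Evaluate the performance based on the accuracy
acc = accuracy_score(y_test, pred)
print('Accuracy: {:.3f}'.format(acc))

# Get and show the Confusion Matrix (rows = true labels, columns = predicted labels)
cm = confusion_matrix(y_test, pred)
print(cm)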