Performance metrics of Twitter data

You will train a logistic regression model that predicts the sentiment of tweets and evaluate its performance on the test set using different metrics.

A matrix X has been created for you. It contains features built with a bag-of-words (BOW) representation of the text column.

The labels are stored in a vector called y. Vector y is 0 for negative tweets, 1 for neutral, and 2 for positive ones.
Note that although there are 3 classes, this is still a classification problem, and accuracy still measures the proportion of correctly predicted instances. The confusion matrix is now of size 3x3. Assuming scikit-learn's confusion_matrix is called as confusion_matrix(y_test, y_predicted), each row corresponds to a true class and each column to a predicted class, both in the order 0, 1, 2.
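
The layout of a 3-class confusion matrix is easiest to see on a tiny example. The sketch below assumes scikit-learn's `confusion_matrix` and `accuracy_score`; the label lists are made up for illustration and are not taken from the tweet data.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up labels: 0 = negative, 1 = neutral, 2 = positive
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

# Rows are true classes, columns are predicted classes (order 0, 1, 2)
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Accuracy is the share of correct predictions, i.e. the diagonal
# of the confusion matrix divided by the number of samples
acc = accuracy_score(y_true, y_pred)
print(acc, cm.trace() / len(y_true))
```

Here the first row says that of the two truly negative tweets, one was predicted negative and one neutral.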

All required packages have been imported for you.

This exercise is part of the course Sentiment Analysis in Python.

Exercise instructions

Perform the train/test split, and stratify by y.
Train a logistic regression classifier.
Predict the labels on the test set.
Print the accuracy score and confusion matrix obtained on the test set.

Hands-on interactive exercise

Complete this exercise by finishing the sample code.

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = ____(X, y, test_size=0.3, random_state=123, ____=y)

# Train a logistic regression
log_reg = ____.____(___, ____)

# Make predictions on the test set
y_predicted = log_reg.____(___)

# Print the performance metrics
print('Accuracy score test set: ', ____(y_test, y_predicted))
print('Confusion matrix test set: \n', ____(y_test, y_predicted)/len(y_test))
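
For reference, the same four steps can be sketched end to end on synthetic data. This is an illustration under stated assumptions, not necessarily the exercise's expected solution: the data comes from `make_classification` instead of the tweet matrix, the names `X_syn` and `y_syn` are hypothetical, and `max_iter=1000` is an added setting to help the solver converge.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Assumption: synthetic 3-class data standing in for X and y
X_syn, y_syn = make_classification(
    n_samples=300, n_features=20, n_informative=5,
    n_classes=3, random_state=123,
)

# Stratified split keeps the class proportions equal in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=123, stratify=y_syn
)

# Train a logistic regression (max_iter raised as a precaution)
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predict class labels for the test set
y_predicted = log_reg.predict(X_test)

# Accuracy, and the confusion matrix expressed as proportions
acc = accuracy_score(y_test, y_predicted)
cm_norm = confusion_matrix(y_test, y_predicted) / len(y_test)
print('Accuracy score test set: ', acc)
print('Confusion matrix test set: \n', cm_norm)
```

Because the matrix is divided by `len(y_test)`, its entries sum to 1 and its diagonal sums to the accuracy.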
