Performance metrics of Twitter data
You will train a logistic regression model that predicts the sentiment of tweets and evaluate its performance on the test set using different metrics.
A matrix X has been created for you. It contains bag-of-words (BOW) features built from the text column.
The labels are stored in a vector called y. Vector y is 0 for negative tweets, 1 for neutral, and 2 for positive ones.
Note that although we have 3 classes, it is still a classification problem. The accuracy still measures the proportion of correctly predicted instances. The confusion matrix will now be of size 3x3. With scikit-learn's confusion_matrix(y_test, y_predicted), each row corresponds to a true class and each column to a predicted class, both in the order 0, 1, 2, so the correctly classified cases sit on the diagonal.
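To make the 3x3 layout concrete, here is a minimal sketch on a handful of made-up labels (the values in y_true and y_pred are invented for illustration and are not taken from the tweet data). It assumes scikit-learn's accuracy_score and confusion_matrix, where rows are true classes and columns are predicted classes.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels: 0 = negative, 1 = neutral, 2 = positive
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

# Proportion of positions where the prediction matches the true label
acc = accuracy_score(y_true, y_pred)

# Rows: true class 0, 1, 2; columns: predicted class 0, 1, 2
cm = confusion_matrix(y_true, y_pred)

print(acc)
print(cm)
```

The diagonal of cm counts the correct predictions, so its sum divided by the number of samples equals the accuracy.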
All required packages have been imported for you.
This exercise is part of the course Sentiment Analysis in Python.
Exercise instructions
- Perform the train/test split, and stratify by y.
- Train a logistic regression classifier.
- Predict the labels for the test set.
- Print the accuracy score and confusion matrix obtained on the test set.
Hands-on interactive exercise
Complete this exercise by filling in the sample code.
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = ____(X, y, test_size=0.3, random_state=123, ____=y)
# Train a logistic regression
log_reg = ____.____(___, ____)
# Make predictions on the test set
y_predicted = log_reg.____(___)
# Print the performance metrics
print('Accuracy score test set: ', ____(y_test, y_predicted))
print('Confusion matrix test set: \n', ____(y_test, y_predicted)/len(y_test))
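One plausible way to fill in the blanks is sketched below. It is not the official solution. It assumes the pre-imported names are scikit-learn's train_test_split, LogisticRegression, accuracy_score and confusion_matrix. Because the exercise's X and y are not available here, the sketch builds a small hypothetical stand-in corpus with CountVectorizer; the toy texts and the variable proportions are illustrative only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; in the exercise, X and y are provided for you
negative = ["terrible flight delayed again", "worst service ever", "lost my luggage awful"]
neutral = ["flight leaves at noon", "what gate is boarding", "checking in online now"]
positive = ["great crew thank you", "loved the smooth flight", "amazing service today"]
texts = (negative + neutral + positive) * 5
y = ([0] * 3 + [1] * 3 + [2] * 3) * 5  # 0 = negative, 1 = neutral, 2 = positive

# Bag-of-words features, standing in for the ready-made matrix X
X = CountVectorizer().fit_transform(texts)

# Split the data into training and testing sets, keeping class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y
)

# Train a logistic regression
log_reg = LogisticRegression().fit(X_train, y_train)

# Make predictions on the test set
y_predicted = log_reg.predict(X_test)

# Dividing by len(y_test) turns counts into proportions of the test set
proportions = confusion_matrix(y_test, y_predicted) / len(y_test)

# Print the performance metrics
print('Accuracy score test set: ', accuracy_score(y_test, y_predicted))
print('Confusion matrix test set: \n', proportions)
```

Because the matrix is divided by len(y_test), its entries sum to 1 and the diagonal entries add up to the accuracy.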