
Performance metrics of Twitter data

You will train a logistic regression model that predicts the sentiment of tweets and evaluate its performance on the test set using two metrics: the accuracy score and the confusion matrix.

A matrix X has been created for you. It contains bag-of-words (BOW) features built from the text column.

The labels are stored in a vector called y: 0 for negative tweets, 1 for neutral, and 2 for positive ones.
Note that although there are 3 classes rather than 2, this is still a classification problem (a multiclass one), and the accuracy still measures the proportion of correctly predicted instances. The confusion matrix is now of size 3x3. In scikit-learn's confusion_matrix, each row corresponds to a true class and each column to a predicted class, both ordered 0, 1, 2, so the cell in row i and column j counts the tweets of true class i that were predicted as class j.
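A small check of this layout with hand-made labels (the `y_true` and `y_pred` values are illustrative only, not taken from the exercise):

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative labels: 0 = negative, 1 = neutral, 2 = positive
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Rows are true classes and columns are predicted classes, both ordered 0, 1, 2
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Accuracy is the diagonal (correct predictions) divided by the total count
print(accuracy_score(y_true, y_pred), cm.trace() / cm.sum())
```

Here cm is [[1, 1, 0], [0, 2, 0], [1, 0, 1]]: for example, the 1 in row 2, column 0 is a positive tweet predicted as negative. Both printed accuracy values equal 4/6.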

All required packages have been imported for you.

This exercise is part of the course

Sentiment Analysis in Python

Exercise instructions

  • Perform the train/test split, and stratify by y.
  • Train a logistic regression classifier.
  • Predict the labels of the test set.
  • Print the accuracy score and confusion matrix obtained on the test set.
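Stratifying by y keeps the class proportions of y in both the training and the test set, which matters when one sentiment is much rarer than the others. A minimal sketch with hypothetical imbalanced labels and placeholder features (neither is the course data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 60 negative, 30 neutral, 10 positive
y = np.array([0] * 60 + [1] * 30 + [2] * 10)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder single-column features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y
)

# With stratify=y, both splits keep the 60/30/10 class mix of y
print(np.bincount(y_train) / len(y_train))
print(np.bincount(y_test) / len(y_test))
```

With 100 samples and test_size=0.3, the test set gets 18, 9 and 3 instances of classes 0, 1 and 2, and the training set gets the remaining 42, 21 and 7.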

Sample code for this exercise:

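# Imports (already loaded in the exercise environment)
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# X (BOW features) and y (labels) are provided by the exercise
# Split the data into training and testing sets, keeping the class mix of y in both
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123, stratify=y)

# Train a logistic regression on the training set
log_reg = LogisticRegression().fit(X_train, y_train)

# Make predictions on the test set
y_predicted = log_reg.predict(X_test)

# Print the performance metrics; dividing the confusion matrix by the
# number of test instances turns raw counts into proportions
print('Accuracy score test set: ', accuracy_score(y_test, y_predicted))
print('Confusion matrix test set: \n', confusion_matrix(y_test, y_predicted)/len(y_test))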
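The last line divides the confusion matrix by len(y_test), so each cell is a share of all test tweets rather than a raw count. A sketch with made-up labels (not the course data) shows that this matches scikit-learn's normalize='all' option (available since scikit-learn 0.22), and that the diagonal of the normalized matrix then sums to the accuracy:

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Illustrative labels only: 0 = negative, 1 = neutral, 2 = positive
y_test = np.array([0, 0, 1, 1, 2, 2, 2, 0])
y_predicted = np.array([0, 2, 1, 1, 2, 2, 1, 0])

# Manual normalization: divide raw counts by the number of test instances
cm_manual = confusion_matrix(y_test, y_predicted) / len(y_test)

# Built-in normalization over all cells gives the same proportions
cm_builtin = confusion_matrix(y_test, y_predicted, normalize='all')

print(np.allclose(cm_manual, cm_builtin))
# The diagonal holds the correctly classified shares, so it sums to the accuracy
print(cm_manual.trace(), accuracy_score(y_test, y_predicted))
```

Other values of normalize ('true' or 'pred') divide by row or column totals instead, which answers a different question (per-class recall or precision) than the overall proportions used in the exercise.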