Exercise

Performance metrics of Twitter data

You will train a logistic regression model that predicts the sentiment of tweets and evaluate its performance on the test set using different metrics.

A matrix X has been created for you. It contains features built with a bag-of-words (BOW) representation of the text column.

The labels are stored in a vector called y. Vector y is 0 for negative tweets, 1 for neutral, and 2 for positive ones.
Note that although we have 3 classes instead of 2, it is still a classification problem. The accuracy still measures the proportion of correctly predicted instances. The confusion matrix will now be of size 3x3. Which axis holds the true classes and which the predicted ones depends on the library: in scikit-learn's confusion_matrix(y_true, y_pred), each row gives the number of cases that truly belong to class 0, 1, and 2, and each column gives the number of cases predicted as class 0, 1, and 2.
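To make the 3-class accuracy and confusion matrix concrete, here is a minimal sketch on made-up labels. It assumes scikit-learn is the library in use; the lists y_true and y_pred are hypothetical and are not the exercise data. In scikit-learn's output, entry [i, j] counts the cases whose true class is i and whose predicted class is j.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only: 0 = negative, 1 = neutral, 2 = positive
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

# Accuracy is the share of positions where prediction and truth agree
acc = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, both ordered 0, 1, 2
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

print("Accuracy:", acc)
print(cm)
```

The diagonal of the matrix holds the correctly classified cases, so the sum of the diagonal divided by the total number of cases equals the accuracy.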

All required packages have been imported for you.

Instructions

  • Perform the train/test split, and stratify by y.
  • Train a logistic regression classifier.
  • Predict the labels on the test set.
  • Print the accuracy score and confusion matrix obtained on the test set.
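
The steps above can be sketched as follows, assuming scikit-learn. In the exercise X and y already exist; the tiny tweets list, the test_size and random_state values, and the max_iter setting below are assumptions added only so the sketch runs on its own, not values given by the exercise.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data; in the exercise X and y are provided for you.
tweets = [
    "terrible delay again", "worst service ever", "awful rude staff", "lost my bag terrible",
    "flight leaves at noon", "what gate is it", "is the flight on time", "landing at noon today",
    "great crew thanks", "love this airline", "amazing smooth flight", "thanks great service",
]
y = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]  # 0 = negative, 1 = neutral, 2 = positive

# Bag-of-words features, standing in for the X matrix of the exercise
X = CountVectorizer().fit_transform(tweets)

# Stratifying by y keeps the class proportions equal in train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=123, stratify=y
)

# Train the classifier, then predict labels for the held-out tweets
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_predicted = log_reg.predict(X_test)

acc = accuracy_score(y_test, y_predicted)
cm = confusion_matrix(y_test, y_predicted, labels=[0, 1, 2])

print("Accuracy score test set:", acc)
print("Confusion matrix test set:\n", cm)
```

Dividing the matrix by len(y_test) turns the counts into proportions, which can be easier to compare across test sets of different sizes.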