Session Ready
Exercise

Multiple text columns

In this exercise, you will continue working with the airline Twitter data. A data set tweets has been imported for you.

In some situations, you might have more than one text column in a dataset and you might want to create a numeric representation for each of the text columns. Here, besides the text column, which contains the body of the tweet, there is a second text column, called negativereason. It contains the reason the customer left a negative review.

Your task is to build BOW representations for both columns and specify the required stop words.

Instructions
100 XP
  • Import the vectorizer package and the default list of English stop words.
  • Update the default list of English stop words and create the my_stop_words set.
  • Specify the stop words argument in the first vectorizer to the updated set, and in the second vectorizer - the default set of English stop words.