
Airline sentiment with stop words

You are given a dataset called tweets, which contains customers' reviews of airlines together with their sentiment. It has two columns: airline_sentiment, which can be positive, negative or neutral, and text, which holds the text of the tweet.

In this exercise, you will create a bag-of-words (BOW) representation that accounts for stop words. Remember that stop words carry little information, so you will often want to remove them. Doing so results in a smaller vocabulary and therefore fewer features. Keep in mind that the default list of stop words can be extended with words that are specific to our context.
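As a rough illustration of how removing stop words shrinks the vocabulary, here is a minimal sketch on two made-up tweets (toy_tweets and custom_stop_words are hypothetical names, and the sentences are not taken from the tweets dataset). It assumes scikit-learn is installed and passes the stop words as a list, which should work across scikit-learn versions.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical toy tweets, invented for this sketch
toy_tweets = [
    "The airline lost my bag and the staff was rude",
    "I love this airline and the crew was great",
]

# Baseline: keep every token the default tokenizer produces
vect_all = CountVectorizer().fit(toy_tweets)

# Default English stop words plus one context-specific word
custom_stop_words = list(ENGLISH_STOP_WORDS.union(["airline"]))
vect_filtered = CountVectorizer(stop_words=custom_stop_words).fit(toy_tweets)

vocab_all = set(vect_all.vocabulary_)
vocab_filtered = set(vect_filtered.vocabulary_)

# Compare vocabulary sizes and list the words that were dropped
print(len(vocab_all), len(vocab_filtered))
print(sorted(vocab_all - vocab_filtered))
```

The filtered vocabulary is a subset of the baseline one, so the resulting BOW matrix has fewer columns.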

This exercise is part of the course

Sentiment Analysis in Python

Exercise instructions

  • Import the default list of English stop words.
  • Update the default list of stop words with the given list ['airline', 'airlines', '@'] to create my_stop_words.
  • Specify the stop words argument in the vectorizer.
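Two details behind these steps can be checked with a short sketch, offered under assumptions rather than as the intended solution. The default stop word list in scikit-learn is a frozenset, so extending it produces a new set and leaves the default untouched. Also, with the default token pattern (words of two or more alphanumeric characters), a bare '@' should never appear as a token, so listing it is harmless but probably has no effect. The names extra_words and extended are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Hypothetical name for the list given in the instructions
extra_words = ["airline", "airlines", "@"]

# union builds a new frozenset; the default list is not modified
extended = ENGLISH_STOP_WORDS.union(extra_words)
print(len(extended) - len(ENGLISH_STOP_WORDS))
print("airline" in ENGLISH_STOP_WORDS, "airline" in extended)

# Inspect what the default analyzer does with an '@' mention
analyzer = CountVectorizer().build_analyzer()
tokens = analyzer("@united thanks for the flight")
print(tokens)
```

If mentions such as @united should be kept or removed as whole tokens, a custom token_pattern would be needed; that is outside the scope of this exercise.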

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import pandas and the stop words
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, ____

# Define the stop words
my_stop_words = ____.____(['airline', 'airlines', '@'])

# Build and fit the vectorizer (recent scikit-learn versions expect a list here, not a set)
vect = CountVectorizer(____=list(my_stop_words))
vect.fit(tweets.text)

# Create the BOW representation
X_review = vect.transform(tweets.text)

# Create the data frame (get_feature_names() was removed in scikit-learn 1.2)
X_df = pd.DataFrame(X_review.toarray(), columns=vect.get_feature_names_out())
print(X_df.head())