
Airline sentiment with stop words

You are given a dataset called tweets, which contains customers' reviews of airlines and the sentiment of each review. It has two columns: airline_sentiment, which can be positive, negative, or neutral, and text, which holds the text of the tweet.

In this exercise, you will create a bag-of-words (BOW) representation that accounts for stop words. Remember that stop words carry little information, so you might want to remove them. Doing so results in a smaller vocabulary and, in turn, fewer features. Keep in mind that you can enrich a default list of stop words with ones that are specific to your context.
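To make the effect on vocabulary size concrete, here is a minimal sketch in plain Python. It is not how scikit-learn implements stop word removal. The toy tweets, the tiny stop list, and the helper build_vocabulary are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical toy tweets, not taken from the real dataset
toy_tweets = [
    "the airline lost my bag",
    "the flight was on time and the crew was great",
]

# Tiny hand-picked stop list, for illustration only
default_stop_words = {"the", "my", "was", "on", "and"}
# Enrich the default list with a context-specific word
my_stop_words = default_stop_words | {"airline"}


def build_vocabulary(texts, stop_words=frozenset()):
    """Return the sorted unique lowercase tokens that are not stop words."""
    tokens = set()
    for text in texts:
        tokens.update(re.findall(r"\w+", text.lower()))
    return sorted(tokens - set(stop_words))


full_vocab = build_vocabulary(toy_tweets)
reduced_vocab = build_vocabulary(toy_tweets, my_stop_words)

print(len(full_vocab), len(reduced_vocab))  # 12 6
print(reduced_vocab)
```

Each vocabulary entry becomes one column in the BOW matrix, so the six removed words here mean six fewer features.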

This exercise is part of the course

Sentiment Analysis in Python


Instructions

  • Import the default list of English stop words.
  • Update the default list of stop words with the given list ['airline', 'airlines', '@'] to create my_stop_words.
  • Specify the stop words argument in the vectorizer.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Import the stop words
from sklearn.feature_extraction.text import CountVectorizer, ____

# Define the stop words
my_stop_words = ____.____(['airline', 'airlines', '@'])

# Build and fit the vectorizer
vect = CountVectorizer(____=my_stop_words)
vect.fit(tweets.text)

# Create the bow representation
X_review = vect.transform(tweets.text)
# Create the data frame
X_df = pd.DataFrame(X_review.toarray(), columns=vect.get_feature_names_out())
print(X_df.head())