Exercise

Removing stopwords

In the following exercises you're going to clean the Enron emails so that the data can be used in a topic model. Text cleaning can be challenging, so you'll learn some steps to do this well. The dataframe containing the emails, df, is available. As a first step, you need to define the list of stopwords and the set of punctuation characters that will be removed from the text data in the next exercise. Let's give it a try.

Instructions

  • Import stopwords from nltk.corpus.
  • Assign the 'english' stopwords to the variable stop.
  • Get the punctuation characters from the string module and assign them as a set to exclude.
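
The three steps above could look roughly like the sketch below. It is not the official solution. It assumes nltk is installed and its stopwords corpus has been downloaded (for example with nltk.download('stopwords')). The small fallback set is a hypothetical stand-in, added only so the snippet still runs when the corpus is unavailable.

```python
import string

try:
    # stopwords is a lazily loaded corpus reader in nltk.corpus
    from nltk.corpus import stopwords

    # words('english') returns a list; a set gives fast membership tests
    stop = set(stopwords.words('english'))
except (ImportError, LookupError):
    # Assumption: hypothetical fallback used only when nltk or its
    # stopwords corpus is missing; far smaller than the real list.
    stop = {"the", "a", "an", "and", "in", "of", "to", "is"}

# string.punctuation is a str of ASCII punctuation characters;
# set() turns it into a set of single characters
exclude = set(string.punctuation)

print(len(stop), "stopwords loaded")
print(sorted(exclude))
```

Storing both collections as sets keeps the later filtering step cheap, since each word or character lookup takes constant time on average.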