Word search with dataframes
In this exercise you're going to work with text data containing emails from Enron employees. The Enron scandal is a famous fraud case: Enron employees covered up the bad financial position of the company, thereby keeping the stock price artificially high. They sold their own stock options, and when the truth came out, Enron investors were left with nothing. The goal is to find all emails that mention specific words, such as "sell enron stock".
By using string operations on dataframes, you can easily sift through messy email data and create flags based on word-hits. The Enron email data has been put into a dataframe called df, so let's search for suspicious terms. Feel free to explore df in the Console before getting started.
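To make the idea of word-hit flags concrete, here is a minimal sketch on a toy dataframe. The rows, the name toy, and the search terms are invented for illustration and are not taken from the Enron data. It assumes the text lives in a column called clean_content, as in the sample code for this exercise.

```python
import pandas as pd

# Toy stand-in for df; these rows are invented for illustration only.
toy = pd.DataFrame({
    "clean_content": [
        "please sell enron stock before friday",
        "lunch meeting moved to noon",
        None,  # a missing email body
        "we should discuss the stock options",
    ]
})

# Hypothetical search terms, joined into a single regex alternation.
search_terms = ["sell", "stock"]
pattern = "|".join(search_terms)

# 1 where any term appears, 0 otherwise.
# na=False makes missing text count as "no hit" instead of NaN.
toy["flag"] = toy["clean_content"].str.contains(pattern, na=False).astype(int)

print(toy["flag"].tolist())  # [1, 0, 0, 1]
```

Storing the flag as 0/1 rather than True/False is only a convenience here: it makes the column easy to sum or to feed into a model later.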
This exercise is part of the course Fraud Detection in Python.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Find all cleaned emails that contain 'sell enron stock'
mask = df['clean_content'].____.____('____', na=False)
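The shape of the blanks suggests pandas' string accessor followed by a matching method. One plausible way to fill them, shown here as an assumption rather than the official solution, is .str.contains with the phrase as the pattern. The sketch below runs on an invented toy dataframe, not on the real df.

```python
import pandas as pd

# Toy stand-in for df; these rows are invented for illustration only.
toy = pd.DataFrame({
    "clean_content": [
        "need to sell enron stock today",
        "enron stock is doing fine",
        None,  # a missing email body
    ]
})

# Boolean mask: True where the cleaned text contains the exact phrase.
# na=False turns missing values into False so the mask can index rows.
mask = toy["clean_content"].str.contains("sell enron stock", na=False)

# Keep only the rows where the phrase was found.
hits = toy.loc[mask]

print(mask.tolist())  # [True, False, False]
print(len(hits))      # 1
```

Without na=False, a missing email body can produce a missing value in the mask, and indexing with such a mask may raise an error. Note that str.contains treats the pattern as a regular expression by default, which makes no difference for a plain phrase like this one.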