Using a list of terms
Often you don't want to search on just one term. You can usually build a full "fraud dictionary" of terms that could flag fraudulent clients and/or transactions, and fraud analysts often have an idea of what belongs in such a dictionary. In this exercise you're going to flag a multitude of terms, and in the next exercise you'll create a new flag variable out of them. The flag can be used either directly in a machine learning model as a feature, or as an additional filter on top of your machine learning model results. Let's first use a list of terms to filter our data on. The dataframe containing the cleaned emails is again available as df.
This exercise is part of the course
Fraud Detection in Python
Exercise instructions
- Create a list to search for including 'enron stock', 'sell stock', 'stock bonus', and 'sell enron stock'.
- Join the string terms with '|' to form a single search condition.
- Filter the data down to the emails that match any term in the list defined under searchfor.
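To see why joining the terms works, here is a minimal sketch using only Python's standard re module. The sample sentences are invented for illustration and are not part of the course data; the point is only that '|' acts as "or" in a regular expression, so the joined string matches a text containing any one of the terms.

```python
import re

# Terms taken from the exercise instructions
searchfor = ['enron stock', 'sell stock', 'stock bonus', 'sell enron stock']

# '|' means "or" in a regular expression, so the joined string
# matches any text that contains at least one of the terms
pattern = '|'.join(searchfor)
print(pattern)

# Hypothetical sample sentences, made up for this sketch
samples = ['please sell stock today', 'lunch at noon']
for text in samples:
    print(text, '->', bool(re.search(pattern, text)))
```

If a term could contain characters with a special meaning in regular expressions (such as '.' or '+'), wrapping each term in re.escape before joining is one way to keep the match literal.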
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Create a list of terms to search for
searchfor = ['____', '____', '____', '____']
# Filter cleaned emails on searchfor list and select from df
filtered_emails = df.____[____['_____'].____._____('|'.join(____), na=False)]
print(filtered_emails)
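As a rough illustration of the same filtering idea on a toy dataframe, the sketch below assumes the cleaned email text lives in a column called 'clean_content'. That column name and the sample rows are assumptions made for this example, not details given in the exercise. The na=False argument treats missing email text as a non-match instead of propagating a missing value into the mask.

```python
import pandas as pd

# Toy stand-in for df; the column name 'clean_content' and the rows
# below are assumptions made for this sketch
df = pd.DataFrame({
    'clean_content': [
        'we should sell enron stock now',
        'meeting moved to friday',
        None,  # a missing email body
        'your stock bonus was approved',
    ]
})

# Terms taken from the exercise instructions
searchfor = ['enron stock', 'sell stock', 'stock bonus', 'sell enron stock']

# Boolean mask: True where the text contains any of the terms;
# na=False makes rows with missing text count as non-matches
mask = df['clean_content'].str.contains('|'.join(searchfor), na=False)

# Keep only the matching rows
filtered_emails = df.loc[mask]
print(filtered_emails)
```

Building the mask as a separate step makes it easy to inspect how many emails match before filtering, for example with mask.sum().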