Creating a flag

In this exercise you will create a flag variable that equals 1 when an email matches any of the search terms of interest and 0 otherwise. This is the last step needed before you can use the text content as a feature in a machine learning model, or as a flag applied on top of model results. Continue working with the dataframe df containing the emails; the searchfor list is the one defined in the previous exercise.

This is a part of the course

“Fraud Detection in Python”

Exercise instructions

  • Use numpy's where() function to set the flag to 1 where the cleaned email contains any term on the searchfor list, and to 0 otherwise.
  • Join the terms on the searchfor list with the "or" indicator (|) so they form a single search pattern.
  • Count the values of the newly created flag variable.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Create flag variable where the emails match the searchfor terms
df['flag'] = ____.____((df['clean_content'].___.____('____'.____(____)) == True), 1, 0)

# Count the values of the flag variable
count = df['flag'].____()
print(count)
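
The steps in the instructions can be tried out on a small toy example before working on the real data. The sketch below is one way to do it and is not the exercise's official solution: the four emails and the three search terms are made up for illustration, and only the column name clean_content and the variable names df, searchfor, flag and count are taken from the exercise. It assumes that pandas' str.contains treats the joined string as a regular expression, which is its default behavior.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the course's email dataframe (made-up data, not the real emails)
df = pd.DataFrame({
    "clean_content": [
        "please sell stock before the announcement",
        "lunch meeting moved to noon",
        "enron stock bonus approved",
        None,  # a missing email body, to show how missing values are handled
    ]
})

# Hypothetical search terms; the course defines its own searchfor list
searchfor = ["enron stock", "sell stock", "stock bonus"]

# Join the terms with "|" so the pattern matches if ANY term occurs
pattern = "|".join(searchfor)

# str.contains returns a boolean mask; na=False treats missing text as "no match"
mask = df["clean_content"].str.contains(pattern, na=False)

# np.where maps True -> 1 and False -> 0
df["flag"] = np.where(mask, 1, 0)

# Count how many emails fall in each flag value
count = df["flag"].value_counts()
print(count)
```

Because the joined string is interpreted as a regular expression, search terms containing characters such as "." or "+" would need escaping (for example with re.escape) before joining. The na=False argument is a choice made for this sketch; without it, missing emails produce missing values in the mask instead of False.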

Learn how to detect fraud using Python.

In this final chapter, you will use text data, text mining, and topic modeling to detect fraudulent behavior.

  • Exercise 1: Using text data
  • Exercise 2: Word search with dataframes
  • Exercise 3: Using list of terms
  • Exercise 4: Creating a flag
  • Exercise 5: Text mining to detect fraud
  • Exercise 6: Removing stopwords
  • Exercise 7: Cleaning text data
  • Exercise 8: Topic modeling on fraud
  • Exercise 9: Create dictionary and corpus
  • Exercise 10: LDA model
  • Exercise 11: Flagging fraud based on topics
  • Exercise 12: Interpreting the topic model
  • Exercise 13: Finding fraudsters based on topic
  • Exercise 14: Recap
