Removing titles and taking names
While collecting survey respondent metadata in the airlines DataFrame, the full name of respondents was saved in the full_name column. However, upon closer inspection, you found that many of the names are prefixed by honorifics such as "Dr.", "Mr.", "Ms." and "Miss".
Your ultimate objective is to create two new columns named first_name and last_name, containing the first and last names of respondents respectively. Before doing so, however, you need to remove the honorifics.
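Once the honorifics are gone, the first/last split described above could look roughly like the sketch below. It uses a small hypothetical DataFrame rather than the real airlines data, and it assumes names are whitespace-separated with the first token as the first name and the final token as the last name (middle names are simply dropped).

```python
import pandas as pd

# Hypothetical sample rows, honorifics already removed; leading spaces
# mimic what is left behind after replacing "Dr." etc. with "".
airlines = pd.DataFrame({"full_name": [" Jane Doe", "John Smith", " Ana Maria Lopez"]})

# Trim surrounding whitespace, then split each name on runs of whitespace.
parts = airlines["full_name"].str.strip().str.split()

# Assumption: first token = first name, last token = last name.
airlines["first_name"] = parts.str[0]
airlines["last_name"] = parts.str[-1]

print(airlines[["first_name", "last_name"]])
```

Stripping before splitting matters here: the leftover leading space from an empty-string replacement would otherwise not affect split() itself, but it would remain in full_name and in any fixed-delimiter split such as split(" ").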
The airlines DataFrame is in your environment, alongside pandas as pd.
This exercise is part of the course “Cleaning Data in Python”.
Exercise instructions
- Remove "Dr.", "Mr.", "Miss" and "Ms." from full_name by replacing them with an empty string "", in that order.
- Run the assert statement using .str.contains() that tests whether full_name still contains any of the honorifics.
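One detail worth checking before running these steps: .str.contains() treats its pattern as a regular expression by default, and .str.replace() does too in pandas versions before 2.0, so the "." in "Dr." can match any character. The sketch below uses two hypothetical names to show the difference; passing regex explicitly avoids depending on the version-specific default.

```python
import pandas as pd

# Hypothetical names: one real honorific, one name that merely starts with "Dre".
names = pd.Series(["Drew Carter", "Dr. Ada Byrne"])

# As a regex, "." matches any character, so "Dr." also matches "Dre".
as_regex = names.str.contains("Dr.", regex=True)
# With regex=False the pattern, dot included, is matched literally.
as_literal = names.str.contains("Dr.", regex=False)
# Escaping the dot keeps regex mode but requires a literal ".".
escaped = names.str.contains(r"Dr\.", regex=True)

print(as_regex.tolist())    # [True, True]
print(as_literal.tolist())  # [False, True]
print(escaped.tolist())     # [False, True]

# The same pitfall applies to replacement: the regex version mangles "Drew".
replaced_regex = names.str.replace("Dr.", "", regex=True)
replaced_literal = names.str.replace("Dr.", "", regex=False)
print(replaced_regex.tolist())    # ['w Carter', ' Ada Byrne']
print(replaced_literal.tolist())  # ['Drew Carter', ' Ada Byrne']
```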
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
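# Replace "Dr." with empty string "" (regex=False matches the dot literally)
airlines['full_name'] = airlines['full_name'].str.replace("Dr.", "", regex=False)
# Replace "Mr." with empty string ""
airlines['full_name'] = airlines['full_name'].str.replace("Mr.", "", regex=False)
# Replace "Miss" with empty string ""
airlines['full_name'] = airlines['full_name'].str.replace("Miss", "", regex=False)
# Replace "Ms." with empty string ""
airlines['full_name'] = airlines['full_name'].str.replace("Ms.", "", regex=False)
# Assert that full_name has no honorifics (dots escaped so they only match a literal ".")
assert not airlines['full_name'].str.contains(r'Ms\.|Mr\.|Miss|Dr\.').any()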
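As an alternative to four separate replacements (not the approach the exercise asks for), a single anchored regular expression could strip a leading honorific together with the space after it. This sketch runs on hypothetical rows and assumes, as the description says, that honorifics only ever appear as prefixes; an honorific in the middle of a name would be left untouched.

```python
import pandas as pd

# Hypothetical sample rows; the real airlines DataFrame is not reproduced here.
airlines = pd.DataFrame({"full_name": [
    "Dr. Ada Byrne", "Mr. Drew Carter", "Miss Lena Park", "Ms. Ivy Chen", "Omar Haddad",
]})

# "^" anchors the match at the start; "\s*" also removes the following whitespace.
honorifics = r"^(?:Dr\.|Mr\.|Ms\.|Miss)\s*"
airlines["full_name"] = airlines["full_name"].str.replace(honorifics, "", regex=True)

# Same check as the exercise, with escaped dots so "." is literal.
assert not airlines["full_name"].str.contains(r"Ms\.|Mr\.|Miss|Dr\.").any()
print(airlines["full_name"].tolist())
```

Anchoring at the start means "Drew" is safe even though it begins with "Dr", and consuming the trailing whitespace leaves no leading space to strip before splitting names.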