1. Replace scalar values using .replace()
Welcome back! In this second chapter, we will focus on the .replace() function, and see how it can be used to replace a scalar value in a pandas DataFrame.
2. The popular name dataset
In this chapter, we will work with a dataset that includes the most popular names that were given to newborns between 2011 and 2016.
Our dataset includes, among other information, the most popular names in the US by year, gender and ethnicity.
For example, the name Chloe was ranked second in popularity among all female newborns of Asian and Pacific Islander ethnicity in 2011.
3. Replace values in pandas
In pandas, we can replace values in a very intuitive way. We can simply define which values we want to replace, and then what we want to replace them with. We can use any method to select our entries of interest.
In the following example, we will relabel every baby classified as male as boy. First, we select all the entries from the Gender feature that are equal to male, and then we simply replace them with the word boy. But is this the fastest way to perform the replacement?
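To make the selection-based approach concrete, here is a minimal sketch. The tiny DataFrame is a hypothetical stand-in for the names dataset: the column names, the rows, and the upper-case spelling of the values are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the names dataset (columns and values are assumptions)
names = pd.DataFrame({
    "Year": [2011, 2011, 2012, 2012],
    "Gender": ["FEMALE", "MALE", "MALE", "FEMALE"],
    "Name": ["Chloe", "Ethan", "Jayden", "Sophia"],
})

# Select the Gender entries equal to MALE with a boolean mask,
# then overwrite just those entries with BOY
names.loc[names["Gender"] == "MALE", "Gender"] = "BOY"

print(names["Gender"].tolist())  # ['FEMALE', 'BOY', 'BOY', 'FEMALE']
```

Passing the row mask and the column label to a single .loc call keeps the assignment on the original DataFrame rather than on a temporary copy.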
4. Replace values using .replace()
As for most operations we are covering, pandas has an optimized built-in function: .replace().
When we want to replace a scalar value with another scalar value, the syntax of this function is simple. We denote the value we want to replace, and then the value we want to replace it with.
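As a rough illustration of that syntax, the sketch below calls .replace() on a single column. The small DataFrame and the upper-case values are assumptions standing in for the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the Gender feature of the names dataset
names = pd.DataFrame({"Gender": ["FEMALE", "MALE", "MALE", "FEMALE"]})

# First argument: the value to replace; second argument: its replacement.
# .replace() returns a new Series, so we assign it back to the column.
names["Gender"] = names["Gender"].replace("MALE", "BOY")

print(names["Gender"].tolist())  # ['FEMALE', 'BOY', 'BOY', 'FEMALE']
```

Entries that do not match the first argument are left untouched.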
As before, we time the performance of the function when replacing all the entries classified as male with boy.
As we can see when comparing the speed of the two approaches, pandas' .replace() function performs ~1,700 percent faster in this example!
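One way such a comparison could be set up is sketched below. The synthetic data, the helper names (with_mask, with_replace, time_it), and the repeat count are all assumptions, not the setup used in the course. The measured ratio depends on the data size, the pandas version, and the hardware, so it may differ considerably from the figure quoted above.

```python
import time
import pandas as pd

# Synthetic stand-in data (assumption): 200,000 rows of alternating labels
base = pd.DataFrame({"Gender": ["FEMALE", "MALE"] * 100_000})

def with_mask(df):
    """Selection-based replacement using a boolean mask."""
    out = df.copy()
    out.loc[out["Gender"] == "MALE", "Gender"] = "BOY"
    return out

def with_replace(df):
    """Replacement using the built-in .replace() method."""
    out = df.copy()
    out["Gender"] = out["Gender"].replace("MALE", "BOY")
    return out

def time_it(func, df, repeats=5):
    """Return the best wall-clock time over a few runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(df)
        best = min(best, time.perf_counter() - start)
    return best

mask_time = time_it(with_mask, base)
replace_time = time_it(with_replace, base)
print(f"mask: {mask_time:.4f}s, replace: {replace_time:.4f}s")
```

Both helpers work on a copy of the input, so each timed run starts from the same unmodified data.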
5. Let's do it
We just saw the difference between the intuitive way and the pandas way of replacing scalar values. Now it's your turn to give it a try!