Get startedGet started for free

How long might you be stopped for a violation?

1. How long might you be stopped for a violation?

In this section, we'll start by learning how to translate strings into data that can be analyzed numerically, and then we'll learn a few easy ways to improve our plots.

2. Analyzing an object column

Let's return again to our DataFrame of Apple stock prices. A new column called change has been added to the DataFrame. It indicates whether the stock price went up or down compared to the previous trading day. Let's pretend we wanted to calculate how often the price went up. One way to do this would be to create a Boolean column that is True if the price went up, and False otherwise. Then we could easily calculate how often the price went up by taking the mean of the Boolean column. But how would we create this column? The change column has the object data type because it contains strings, and previously we've used the astype() method to convert strings to numbers or Booleans. However, astype() only works when pandas can infer how the conversion should be done, and that's not the case here. We'll need to find a different technique.

3. Mapping one set of values to another

When you need to map one set of values to another, you can use the Series map() method. You provide it with a dictionary that maps the values you currently have to the values that you want. In this case, we want to map "up" to True and "down" to False, so we'll create a dictionary called mapping that specifies this. Then, we'll use the map() method on the change column, pass it the mapping object, and store the result in a new column called is_up. When we print the DataFrame, you'll see that the is_up column contains True when the change column says up, and False when the change column says down. Now that we have a Boolean column, we can calculate how often the price went up by taking the mean() of that column. The answer is that it went up 50% of the time.

4. Calculating the search rate

Now we're going to return to our DataFrame of traffic stops, and shift to a completely separate topic. Let's say that we wanted to visualize how often searches were performed after each type of violation. We would group by violation, and then take the mean() of search_conducted. This calculates the search_rate for each of the six violation types, and returns a Series that is sorted in alphabetical order by violation. We'll save this as an object named search_rate.

5. Creating a bar plot

To visualize the search rate, we'll create a bar plot since we're comparing the search rate across categories. The violations are displayed on the x-axis, and the search rate is on the y-axis. This plot looks okay, but there are two simple changes we can make that will make this plot more effective.

6. Ordering the bars (1)

The first improvement we can make is to order the bars from left to right by size, which will make the plot easier to understand. All we need to do is to use the sort_values() method to sort the search_rate Series in ascending order.

7. Ordering the bars (2)

Then, when we call the plot method on the sorted data, the bars are now ordered. This makes it easy to see which violations have the highest and the lowest search rates.

8. Rotating the bars

The second improvement we can make is to change the kind argument from bar to barh, which will rotate the bars so that they're horizontal. This makes it much easier to read the labels for each bar.

9. Let's practice!

Let's go ahead and get started with the last few exercises in this chapter.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.