Exercise

# Natural hit rate

In this exercise, you'll again use credit card transaction data. The features and labels are similar to the data in the previous chapter, and the **data is heavily imbalanced**. We've given you features `X` and labels `y` to work with already, which are both NumPy arrays.

First, you need to explore how prevalent fraud is in the dataset, to understand what the **"natural accuracy"** would be if we were to predict everything as non-fraud. It is important to understand which level of "accuracy" you need to "beat" in order to get a **better prediction than by doing nothing**. In the following exercises, you'll create your first random forest classifier for fraud detection. That will serve as the **"baseline"** model that you're going to try to improve in the upcoming exercises.

Instructions

**100 XP**

- Count the total number of observations by taking the length of your labels `y`.
- Count the non-fraud cases in our data by using a list comprehension on `y`; remember `y` is a NumPy array, so `.value_counts()` cannot be used in this case.
- Calculate the natural accuracy by dividing the non-fraud cases by the total number of observations.
- Print the percentage.
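The steps above can be sketched as follows. Since the exercise's actual `X` and `y` arrays aren't available here, this sketch synthesizes a hypothetical imbalanced label array so the example is self-contained; in the exercise itself, you would use the provided `y` directly.

```python
import numpy as np

# Hypothetical stand-in for the exercise's labels: in the exercise, `y` is
# provided for you. Here we synthesize an imbalanced array with ~1% fraud.
rng = np.random.default_rng(0)
y = (rng.random(5000) < 0.01).astype(int)  # 1 = fraud, 0 = non-fraud

# Count the total number of observations
total_obs = len(y)

# Count the non-fraud cases with a list comprehension
# (y is a NumPy array, so .value_counts() is not available)
non_fraud = len([i for i in y if i == 0])

# Natural accuracy: the accuracy of predicting everything as non-fraud
natural_accuracy = non_fraud / total_obs

# Print the percentage
print(f"Natural accuracy: {natural_accuracy:.2%}")
```

With heavily imbalanced data like this, the natural accuracy is close to 100%, which is exactly why plain accuracy is a misleading metric for fraud detection.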