
Calculating quantiles and histograms

1. Calculating quantiles and histograms

In this lesson, we'll summarize restaurant prices using quantiles, bins, and a histogram.

2. Quantiles and histograms

We want to categorize each restaurant in our dataset as budget, mid, or premium.

3. Quantile-based categorization

One way to categorize the restaurants is to order them by price.

4. Quantile-based categorization

Then budget is the cheapest one-third,

5. Quantile-based categorization

Premium is the most expensive third, and mid is everything in between. We calculate quantiles to do this.

6. Reviews dataset

Now, in our reviews dataset, we'll use the 33rd and 67th percentiles to set three price tiers.

7. Compute quantiles

We start with the select method,

8. Compute quantiles

Create an expression on the price column.

9. Compute quantiles

And call quantile(0.33) to get the 33rd percentile.

10. Compute quantiles

And we call this budget_cutoff.

11. Compute quantiles

Then we add a quantile(0.67) expression to get the upper cutoff for the mid category. This gives us a one-row DataFrame with the cutoff prices.

12. Using quantiles to create categories

To create a new column of categories based on quantiles, we use a when-then-otherwise expression. If the price is below the 33rd percentile, we label it as budget.

13. Using quantiles to create categories

Of the remaining rows, a price below the 67th percentile is labeled mid.

14. Using quantiles to create categories

And otherwise it's premium. We call this column price_band.

15. Quantile versus Fixed breaks

By using quantiles, the data defines the categories. This is handy if prices change over time, because the cutoffs move with the data. It also keeps the group sizes roughly balanced. The downside is that the meaning of each category shifts too: if the number of cheap places in our dataset grows quickly, the cutoffs fall and some of those cheap places will be categorized as mid. An alternative approach is to set fixed thresholds. For example, we can set the thresholds at 10 and 20 pounds. Group sizes are less predictable, but the composition of each group is easier to understand.

16. Categories with fixed thresholds

Now let's add a new column based on fixed thresholds.

17. Categories with fixed thresholds

We start with the price column.

18. Categories with fixed thresholds

Then use the cut expression.

19. Categories with fixed thresholds

We set breakpoints at 10 and 20 pounds. Two breakpoints give us three categories.

20. Categories with fixed thresholds

So we need three labels.

21. Categories with fixed thresholds

We name the new column price_band, and we get the categories in the output. Quantiles and breakpoints summarize key values, but they don't show the full distribution of prices.

22. Build a histogram

To create a histogram, we first create a Series from the price column.

23. Build a histogram

And call the hist method to bin the data. By default, Polars divides the data into 10 bins. For each bin, the breakpoint column has the upper bound, the category has the range of values, and the count is the number of rows in each bin.

24. Build a histogram with defined bin count

We can instead specify that we want five bins.

25. Build a histogram with defined breakpoints

Or specify the bin breakpoints. With bins at 0, 10, 20, and 100, the first bin runs from 0 to 10, the second from 10 to 20, and the third from 20 to 100.

26. Plot the histogram

We can make a plot of our histogram using Plotly, a library for interactive plotting. We'll focus on creating a bar chart with Plotly, rather than covering all of its features. We first import Plotly Express as px and create the histogram DataFrame with bin edges at 0, 10, 20, and 100 pounds.

27. Plot the histogram

We make a bar chart with px.bar, passing the histogram DataFrame as the first argument, with the category column on the x-axis and count on the y-axis. Calling fig.show() displays the histogram.

28. Plot the histogram

This shows that most restaurants fall into the middle of our price range, followed by more expensive options.

29. Let's practice!

Now it's your turn to compute quantiles, build tiers, and create a histogram.
