1. Histograms
Now that you have an idea of unit economics, let's explore distributions. A dataset's distribution tells you which values are frequent and which are rare, and a common way to visualize and understand it is to plot the dataset's histogram.
2. Histograms - overview
A histogram visualizes the frequency of each value in a dataset. The bar chart on the right is the histogram of orders per user: 7 users have ordered once, 42 users have ordered twice, and so on.
To plot a histogram, you'll need a frequency table, which lists each value alongside the number of times it occurs. The frequency table on the left is the data source of the plot on the right.
Let's write a query to get the frequency table.
3. Histograms - query (I)
To get the frequency table, you'll first need each user's count of orders. Count each user's distinct order IDs and store these counts in the user_orders CTE.
Then, in the final query, group by the orders column and count the distinct user IDs for each order count. Remember to sort the results by the orders column in ascending order!
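A minimal sketch of this query, run against a throwaway in-memory SQLite database from Python so that it is self-contained. The table name `orders`, its `user_id` and `order_id` columns, and the rows are assumptions for illustration. A user can appear on several rows of the same order (for example, one row per meal), which is why the CTE counts distinct order IDs.

```python
import sqlite3

# Hypothetical toy data: one row per (user, order) line item.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id INTEGER, order_id INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10), (1, 10), (2, 11), (2, 12), (3, 13),
     (3, 14), (4, 15), (5, 16), (5, 17), (5, 18)],
)

query = """
WITH user_orders AS (
  -- One row per user with that user's number of distinct orders
  SELECT user_id, COUNT(DISTINCT order_id) AS orders
  FROM orders
  GROUP BY user_id
)
-- Frequency table: how many users placed each number of orders
SELECT orders, COUNT(DISTINCT user_id) AS users
FROM user_orders
GROUP BY orders
ORDER BY orders ASC;
"""
freq = conn.execute(query).fetchall()
print(freq)  # [(1, 2), (2, 2), (3, 1)]
```

Each row reads "this many orders, this many users", which is exactly the (x, height) pair of one histogram bar.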
4. Histograms - query (II)
You can also get the frequency table of the revenue Delivr has generated from each user using much the same method. In the CTE, calculate and store each user's revenue instead of their count of orders. This is the same CTE used in the second approach to calculating ARPU in the previous video.
In the final query, round the revenue column, group by it, and count the user IDs for each revenue value.
Notice that revenue is rounded to a negative number of decimal places. When the second argument to the ROUND function is negative, ROUND rounds to the nearest multiple of 10 raised to the absolute value of that argument. For example, passing -2 as the second argument rounds to the nearest hundred.
Revenues are usually decimal values, so it's very unlikely that two or more users generated exactly the same revenue. Without rounding, nearly every revenue value would have a frequency of one, and the histogram would be cluttered with many tiny bars. That's why revenue is rounded here.
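To make the rule concrete, here is a small sketch that mimics ROUND(value, places) as PostgreSQL applies it to NUMERIC values, where ties are rounded away from zero. The helper name `round_like_sql` is hypothetical. It uses Python's `decimal` module because the built-in `round()` rounds ties to even (so `round(1250, -2)` gives 1200, not 1300).

```python
from decimal import Decimal, ROUND_HALF_UP

def round_like_sql(value, places):
    """Hypothetical helper mimicking ROUND(value, places) on NUMERIC values.

    A negative `places` rounds to the nearest multiple of 10 ** abs(places);
    ties are rounded away from zero (ROUND_HALF_UP in the decimal module).
    """
    # "1E2" for places=-2 (a step of 100), "1E-1" for places=1 (a step of 0.1)
    step = Decimal(f"1E{-places}")
    return Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP)

for places in (-1, -2, -3):
    print(places, int(round_like_sql(1234.56, places)))
# -1 1230
# -2 1200
# -3 1000
print(int(round_like_sql(1250, -2)))  # 1300 (tie rounded away from zero)
```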
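A minimal, self-contained sketch of the revenue query, again run through SQLite from Python. The `meals` and `orders` tables, their columns (`meal_price`, `order_quantity`, and so on), and the rows are assumptions for illustration. One portability catch: PostgreSQL accepts `ROUND(revenue :: NUMERIC, -2)`, but SQLite's ROUND treats a negative second argument as 0, so this sketch rounds to the nearest hundred by dividing by 100, rounding, and multiplying back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meals (meal_id INTEGER PRIMARY KEY, meal_price REAL);
CREATE TABLE orders (order_id INTEGER, user_id INTEGER,
                     meal_id INTEGER, order_quantity INTEGER);
INSERT INTO meals VALUES (1, 4.5), (2, 12.25), (3, 30.0);
INSERT INTO orders VALUES
  (10, 1, 3, 4),   -- user 1: 4 x 30.00 = 120.00
  (11, 2, 2, 10),  -- user 2: 10 x 12.25 = 122.50
  (12, 3, 1, 3),   -- user 3: 3 x 4.50 ...
  (13, 3, 3, 8),   --   ... + 8 x 30.00 = 253.50 in total
  (14, 4, 1, 2);   -- user 4: 2 x 4.50 = 9.00
""")

query = """
WITH user_revenues AS (
  -- Total revenue per user: price x quantity, summed over all of their orders
  SELECT o.user_id, SUM(m.meal_price * o.order_quantity) AS revenue
  FROM meals AS m
  JOIN orders AS o ON m.meal_id = o.meal_id
  GROUP BY o.user_id
)
-- Round to the nearest hundred, then count users per rounded value
SELECT ROUND(revenue / 100.0) * 100 AS revenue_100,
       COUNT(user_id) AS users
FROM user_revenues
GROUP BY revenue_100
ORDER BY revenue_100 ASC;
"""
freq = conn.execute(query).fetchall()
print(freq)  # [(0.0, 1), (100.0, 2), (300.0, 1)]
```

Without the rounding step, the four users would produce four distinct revenue values with one user each; after rounding, users 1 and 2 share the 100 bucket.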
5. Plotting histograms
These two queries return the frequency tables of orders and revenue, respectively. The next step is to plot the histogram. You can do that in spreadsheet software such as Microsoft Excel or Google Sheets, or in code, using Python with the matplotlib library or R with the ggplot2 library.
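Because the frequency table already holds one (value, count) pair per bar, drawing the histogram is just a matter of scaling each count to a bar length. The sketch below is a stdlib-only, text-mode stand-in for a real chart; the function name `text_histogram` and the sample rows are hypothetical.

```python
def text_histogram(freq_table, width=40):
    """Render (value, count) rows as horizontal bars of '#'.

    Bars are scaled so that the largest count spans `width` characters.
    """
    if not freq_table:
        return []
    max_count = max(count for _, count in freq_table)
    lines = []
    for value, count in freq_table:
        bar = "#" * round(count / max_count * width)
        lines.append(f"{value:>6} | {bar} {count}")
    return lines

# Hypothetical frequency table of orders per user: (orders, users)
for line in text_histogram([(1, 3), (2, 6), (3, 4)], width=20):
    print(line)
```

With matplotlib, the same pre-aggregated rows would typically go to `plt.bar(values, counts)` rather than `plt.hist`, since `hist` expects the raw, unaggregated values and does its own binning.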
6. What do histograms tell you?
What does a dataset's distribution tell you? Take the histogram of revenue. If it is U-shaped, many users generate low revenue and many generate high revenue, but few generate a moderate amount. If it is closer to a normal (bell-shaped) distribution, the opposite is true: most users generate a moderate amount of revenue, and few generate very low or very high revenue.
7. Histograms
Histograms are valuable visualizations to understand how a dataset’s values are distributed. In the following exercises, you’ll write queries to get frequency tables, from which histograms are plotted.