Summarize group statistics
Sometimes you want to understand how a value varies across groups. For example, how does the maximum value per group vary across groups?
To find out, first summarize by group, and then compute summary statistics of the group results. One way to do this is to compute group values in a subquery, and then summarize the results of the subquery.
For this exercise, what is the standard deviation across tags in the maximum number of Stack Overflow questions per day? What about the mean, min, and max of the maximums as well?
This exercise is part of the course
Exploratory Data Analysis in SQL
Exercise instructions
- Start by writing a subquery to compute the
max()
ofquestion_count
per tag; alias the subquery result asmaxval
. - Then compute the standard deviation of
maxval
withstddev()
. - Compute the
min()
,max()
, andavg()
ofmaxval
too.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
-- Compute standard deviation of maximum values
SELECT ___(___),
-- min
___(___),
-- max
___(___),
-- avg
___(___)
-- Subquery to compute max of question_count by tag
FROM (SELECT ___(___) AS ___
FROM stackoverflow
-- Compute max by...
GROUP BY ___) AS max_results; -- alias for subquery