Aan de slagGa gratis aan de slag

Summarize group statistics

Sometimes you want to understand how a value varies across groups. For example, how does the maximum value per group vary across groups?

To find out, first summarize by group, and then compute summary statistics of the group results. One way to do this is to compute group values in a subquery, and then summarize the results of the subquery.

For this exercise, what is the standard deviation across tags in the maximum number of Stack Overflow questions per day? What about the mean, min, and max of the maximums as well?

Deze oefening maakt deel uit van de cursus

Exploratory Data Analysis in SQL

Cursus bekijken

Oefeninstructies

  • Start by writing a subquery to compute the max() of question_count per tag; alias the subquery result as maxval.
  • Then compute the standard deviation of maxval with stddev().
  • Compute the min(), max(), and avg() of maxval too.

Praktische interactieve oefening

Probeer deze oefening eens door deze voorbeeldcode in te vullen.

-- Compute standard deviation of maximum values
SELECT ___(___),
	   -- min
       ___(___),
       -- max
       ___(___),
       -- avg
       ___(___)
  -- Subquery to compute max of question_count by tag
  FROM (SELECT ___(___) AS ___
          FROM stackoverflow
         -- Compute max by...
         GROUP BY ___) AS max_results; -- alias for subquery
Code bewerken en uitvoeren