Summarizing data
1. Summarizing data
Well done so far! We've already used our first aggregate function, COUNT().2. Aggregate functions
Now, let's discover four more functions, sum, average, minimum, and maximum, to answer questions like, "What is the average number of votes per film?".3. Summarizing fields
Recall that aggregate functions summarize multiple rows into a single value, helping us understand data as a whole or by group. Let's begin with the total number of rating votes the films received using "Show me the total number of votes across films". The result is over 419 million!4. Calculating field averages
Average works in a similar way. Our results show that a film gets an average of around 84 thousand ratings. This result, however, would look messy in a report.5. Rounding results
We can add a rounding request to have cleaner output, rounded to a whole number. The ROUND() function lets us round to 1, 2, or more decimal places, or use negative values to round to the nearest hundred or thousand.6. Rounding results
Adding instructions such as "rounded to one decimal place" or "to the nearest hundred" helps the AI format results appropriately, but it's always best to verify the results.7. Grouped averages
We mentioned earlier that summary statistics are commonly used with grouped data. Let's explore the average budget per language, sorted by highest budget. Our data shows budgets in local currencies. Since we're summarizing groups, the average reflects all films within each language group. SQL does this by grouping first, then applying the aggregate. We have three language groups without budget information, and Korean films have an average budget of 2.7 million won.8. Minimum and maximum
Our next aggregate functions are MIN() and MAX(), which identify extreme values in our data. The prompt "Show each country's lowest and highest budgets, ordered alphabetically" returns those values per country. It looks like Argentina's budgets range from 800,000 to 2 million pesos. Using both functions in one query highlights data ranges, helping us understand the scope of the data and what counts as a typical value.9. Aggregate functions and data types
We've tested these aggregate functions on numerical fields like votes and budget. AVG() and SUM() only apply to numbers, since adding or averaging all film titles together doesn't make sense! MIN(), MAX(), and COUNT() work across various data types. We've already experienced this with COUNT(). Minimum and maximum will provide the lowest or highest values of most fields. Lowest to highest can mean from A to Z or from the earliest to the most recent date.10. Minimum and maximum
Here's an example of finding the minimum and maximum language. Our film catalog is pretty diverse!11. SQL best practices: aliases
Did you notice that, although we didn't specify an alias, the AI usually included one for us? It's best practice to use aliases with aggregate functions to keep results readable. For example, if a query returns the average number of users and critics, we wouldn't know which is which without aliases. AI won't always generate the same SQL, so it's important to verify and adjust if needed.12. SQL best practices: formatting
Most AI-generated SQL follows good formatting standards, using multiple lines and indentation. Notice how these make the code easier to read.13. Ready to analyze?
Ready to analyze some data?Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.