1. Counting, summing, and grouping data
Often, we want to be able to count the data we are working with or calculate the sum of a column in our data.
2. SQL functions
Things like count and sum are SQL aggregation functions, and they can be found in the func sqlalchemy module. It's far more efficient to let SQL perform these functions than to get all the results and loop over them in Python. We call them aggregation functions because they collapse multiple records into one.
3. Sum example
For example what if I wanted to get a sum of the census pop2008 column for all the records? After I had created an engine, metadata, and reflected the table, I would import the func module. Then in my select statement where I would normally just put the column, I put the column wrapped by func-dot-sum. Then I use the scalar fetch method to get back just a value and print it. It's important not to import the sum function directly, because it will conflict with Python's built-in sum function.
4. Group by
Often when using a function, we want to do so by some grouping of another column. For example, we may want to get a sum of the population aggregated by sex in the census database. To do so, we can use a group_by() clause to specify the data we want to aggregate by.
5. Group by
To get a sum of the population by gender in the census database, we would select the sex column and the func.sum of the pop2008 column. Then we would append a group by clause that targets the sex column. Then when we execute the query and get the results, we will have the data summed by the sex value of each record.
6. Group by
Much like order_by, the group_by clause can accept multiple columns and will group with in the groups from left to right. Every column in the select statement must in the group_by clause or wrapped in a function such as sum or count.
7. Group by multiple
We could go a step further and group by both sex and age to get the number of people for each sex by age in the 2008 population. We'd just add the age column to both the select statement and the group_by clause as shown here.
8. Handling ResultSets from functions
When we use a function such as sum or count, the column name that represents the function in our results is set to a placeholder. For example, if we had a count function in our select statement it would appear as the count_1 column in the result set. This can make it difficult to handle the result set; however, we can use the label() method on a function to give the output column a specific name.
9. Using label()
So if we wanted to calculate the population by sex, you can see that it returns a sum_1 column in the result set. if we wanted to call that pop2008sum in the result set, we would use the label method on our func-dot-sum clause. Then append the usual group by clause onto the statement, and after executing the query, I would be able to use that name when accessing the components of each result in the result set.
10. Let's practice!
Let's put this to use.