
Querying the database

1. Querying the database

Okay, now for the grand finale: we're going to answer the kinds of questions you will get asked as a data scientist. We'll be using each of the techniques shown in this video to complete the case study.

2. Part 3: answering data science questions with queries

Here is an example of how we calculated a weighted average in an exercise from Chapter 3. We began by importing the select statement. Next, we built a select statement that creates a weighted average. We do this by summing the product of the age and the population, dividing that by the sum of the population, and labeling the result average age. Next, we grouped by the sex column to determine the average age for each sex. Finally, we executed the query and fetched all the results.
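The steps above can be sketched roughly as follows. This is a minimal illustration, not the course's exact exercise code: it assumes SQLAlchemy 1.4 or later, and the in-memory SQLite engine, the cut-down census table, the made-up rows, and the label name average_age are all stand-ins chosen for the example.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, func, insert, select)

# Hypothetical stand-in for the course's census table (in-memory SQLite).
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("sex", String(1)),
    Column("age", Integer),
    Column("pop2008", Integer),
)

# Made-up rows, chosen so the weighted averages are whole numbers.
rows = [
    {"sex": "F", "age": 10, "pop2008": 100},
    {"sex": "F", "age": 30, "pop2008": 300},
    {"sex": "M", "age": 20, "pop2008": 200},
    {"sex": "M", "age": 40, "pop2008": 200},
]

# Weighted average: sum(age * population) / sum(population), per sex.
stmt = select(
    census.columns.sex,
    (func.sum(census.columns.pop2008 * census.columns.age)
     / func.sum(census.columns.pop2008)).label("average_age"),
).group_by(census.columns.sex)

# Create the table, load the rows, and run the query on one connection.
with engine.begin() as connection:
    metadata.create_all(connection)
    connection.execute(insert(census), rows)
    results = connection.execute(stmt).fetchall()

for sex, average_age in results:
    print(sex, average_age)
```

Depending on the SQLAlchemy version and the database, dividing one integer sum by another may truncate the result, so on real data it can be worth casting one side to Float.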

3. Part 3: answering data science questions with queries

We learned how to calculate a percentage by using the case and cast functions in Chapter 3. We begin by importing case, cast, and Float. Then we build a select statement that calculates the sum of the pop2008 column in cases where the state is New York. Then we divide that by the sum of the total population, which is cast to a Float so we get Decimal values. Finally, we multiply by 100 to get a percentage and label it ny_percent.
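A sketch of that percentage query, under the same assumptions as before: SQLAlchemy 1.4 or later (where case takes its conditions as positional tuples), an in-memory SQLite database, and a hypothetical census table with invented rows.

```python
from sqlalchemy import (Column, Float, Integer, MetaData, String, Table,
                        case, cast, create_engine, func, insert, select)

# Hypothetical stand-in for the course's census table (in-memory SQLite).
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("state", String(30)),
    Column("sex", String(1)),
    Column("pop2008", Integer),
)

# Made-up rows: New York holds 300 of the 800 people in this toy table.
rows = [
    {"state": "New York", "sex": "F", "pop2008": 100},
    {"state": "New York", "sex": "M", "pop2008": 200},
    {"state": "Texas", "sex": "F", "pop2008": 250},
    {"state": "Texas", "sex": "M", "pop2008": 250},
]

# Sum pop2008 only where the state is New York; other rows contribute 0.
ny_pop = func.sum(
    case((census.columns.state == "New York", census.columns.pop2008), else_=0)
)
# Cast the overall total to Float so the division is not integer division.
total_pop = cast(func.sum(census.columns.pop2008), Float)

stmt = select((ny_pop / total_pop * 100).label("ny_percent"))

with engine.begin() as connection:
    metadata.create_all(connection)
    connection.execute(insert(census), rows)
    ny_percent = connection.execute(stmt).scalar()

# The value may come back as a Decimal or a float; float() handles both.
print(float(ny_percent))
```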

4. Part 3: answering data science questions with queries

Also from Chapter 3, we learned how to calculate the difference between two columns grouped by another column. We start by building a select statement that selects the column we want to determine the change by, which in this case is age. Then we calculate the difference between the population in 2008 and in 2000, and we label that pop_change. Remember to wrap the difference calculation in parentheses so you can label it. Next, we order by pop_change, and finally we limit it to just 5 results.
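One way this could look in code is sketched below. It again assumes SQLAlchemy 1.4 or later with an in-memory SQLite database and an invented census table. Two details are assumptions of this sketch rather than something the transcript states: the difference is wrapped in func.sum so that rows sharing an age are combined, and the ordering is descending so the largest changes come first.

```python
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, desc, func, insert, select)

# Hypothetical stand-in for the course's census table (in-memory SQLite).
engine = create_engine("sqlite:///:memory:")
metadata = MetaData()
census = Table(
    "census", metadata,
    Column("sex", String(1)),
    Column("age", Integer),
    Column("pop2000", Integer),
    Column("pop2008", Integer),
)

# Made-up rows covering six ages, so the limit of 5 drops one of them.
rows = [
    {"sex": "F", "age": 0, "pop2000": 100, "pop2008": 110},
    {"sex": "M", "age": 0, "pop2000": 100, "pop2008": 105},
    {"sex": "F", "age": 1, "pop2000": 100, "pop2008": 130},
    {"sex": "M", "age": 1, "pop2000": 100, "pop2008": 120},
    {"sex": "F", "age": 2, "pop2000": 100, "pop2008": 95},
    {"sex": "M", "age": 2, "pop2000": 100, "pop2008": 100},
    {"sex": "F", "age": 3, "pop2000": 200, "pop2008": 260},
    {"sex": "F", "age": 4, "pop2000": 150, "pop2008": 150},
    {"sex": "F", "age": 5, "pop2000": 300, "pop2008": 280},
]

# The parentheses around the subtraction are what let us call .label() on it.
# Assumption: sum the difference so rows sharing an age are combined.
pop_change = func.sum(
    (census.columns.pop2008 - census.columns.pop2000)
).label("pop_change")

stmt = (
    select(census.columns.age, pop_change)
    .group_by(census.columns.age)
    .order_by(desc(pop_change))  # assumption: largest change first
    .limit(5)
)

with engine.begin() as connection:
    metadata.create_all(connection)
    connection.execute(insert(census), rows)
    results = connection.execute(stmt).fetchall()

for age, change in results:
    print(age, change)
```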

5. Let's practice!

Now it's your turn to use these techniques in the case study. Good luck!