Get startedGet started for free

Database functions for large datasets

1. Database functions for large datasets

Welcome back to the final chapter of this course! It's time to discover database functions and equip ourselves to work effectively with large datasets. Let's go!

2. Aggregating large data sets with conditions

Imagine you are presented with a large dataset, for example point of sale data, where every row represents a single item in customer's basket. Looking at the table, we could distinguish rows related to sales of various fruits or vegetables. This dataset can be summarised on multiple levels, such as per basket, per customer, per shop, and using all kinds of conditions, for example, only counting the baskets which contain at least 2 kg of carrots and so on. This is when the well known conditional aggregation functions such as SUMIF(s), AVERAGEIF(s), or COUNTIF(s) come in handy! You might remember them from other Excel courses on DataCamp.

3. Limitations of conditional aggregate functions

While SUMIFS(), COUNTIFS() are very handy and allow for multiple AND conditons, they can be cumbersome to write when our conditions are more complex. On the left, we count Florida orders which were returned. With two AND conditions, on Returns and on State columns, we use one single COUNTIFS() statement to achieve that. However, if we wanted to add additional OR condition, for example also counting orders for Joe's shop, we need to add another COUNTIF condition to this equation. This is hard to write and read.

4. Database functions for the win!

It's time to meet database functions that make such calculations easier! This family of functions, consisting of DSUM, DCOUNT, DAVERAGE and so on, has one large advantage: its syntax is the same for all of them and has just three arguments: Database - so the entire data table with headers, Field - so the field we want to aggregate, so sum, count, average etc. and finally, criteria, so our AND OR conditions. Pro tip: it works with mathematical operators and even wild cards!

5. Multiple AND criteria

How does it work? Suppose we want to SUM Sales for all orders in Florida which were returned. We need a database, so the blue table on the left and, criteria, so the red table on the right. Notice that the column names must be the same in both. Our DSUM function starts with the reference to the database, so cells A1 to F6, then we need to refer to the column we want to sum, so Sales and finally to the array with our criteria, so H1 to J2. Another Pro tip! You can replace the hard coded reference to "Sales" by inputing either a column number, here six, or the reference to the cell, so F1.

6. AND / OR criteria!

Now what if we wanted to add an OR criteria to it, by summing not only returned sales of Florida but also adding an OR condition on sales in Joe's shop? To do that, we add another row in our Criteria table. This automatically adds an OR condition. What we still need to adapt in our function is to extend its criteria array, from H1:J2 to H1:J3 and that's it! And something very important: blank rows in the criteria table equal no condition, so by including a blank row, you will sum up the entire table in your OR statement!

7. Eager to practice?

Now, let's practice it!