Get startedGet started for free

Lookups & matching

1. Lookups & matching

Now we're going to revisit and expand upon some of the data sorting and matching ideas from the Data Analysis with Spreadsheets course.

2. Left joins with VLOOKUP()

Throughout the DataCamp curriculum you'll repeatedly come across the idea of joining or merging two rectangular datasets together. In R this is often done with dplyr or data.table. In Python you often use Pandas. Spreadsheets have the ability to perform what's known as a left join using the VLOOKUP() function. This is a little complex, and the function takes four arguments. Each of the datasets has to have an I-dot-D-dot. column with some common values. The first argument to VLOOKUP() is the value that you want to match. Let's try matching Ant. That's in cell A7 in the table we're merging into, so that's the first argument. The second argument is the range of the dataset that we want to find values in, given as absolute addresses. In this case that dataset stretches from A2 to B4. The third argument is the number of the column that contains the values to be merged in. In this case, we want the number of species from column B, the second column. The fourth argument is whether or not the lookup column is sorted. Usually this will be FALSE.

3. Programmatically sorting data

You can sort data by pointing and clicking in the menu, but if you want your analyses to be reproducible, you need to perform the sorting using code. The SORT() function takes 3 arguments. First is the dataset, given as absolute addresses. Second is the number of the column to sort by. Finally, you pass TRUE to sort in ascending order and FALSE to sort in descending order. In the example you can see the insect data sorted in descending order of the number of species, in column two.

4. Matching values

MATCH() lets you find values in a sorted dataset. It takes three arguments. First is the value to find. In the example, you are looking for one hundred thousand. Second is the absolute address of the column of data, here from B2 to B5. Finally, you pass one if the data is sorted in ascending order or minus one if the data is sorted in descending order. In this case, one hundred thousand would occur after the 2nd value, so MATCH() returns two.

5. Summary

You saw three functions in this video. VLOOKUP() is the spreadsheet equivalent of a database left join. The SORT() function does exactly what you might expect: it sorts datasets from smallest to largest or largest to smallest. Finally, MATCH() finds positions in sorted data where a value would occur.

6. Let's practice!

Time to find your perfect match.