
Sorting and slicing data

1. Sorting and slicing data

Now we can load DataFrames, but they are often quite big, and we rarely want to use the whole table. To focus our analysis and answer questions, we need to be able to chop and slice them.

2. Selecting an element from the DataFrame

Let's say we have loaded the run data and need to extract a single element from the table. We want the time value from the sixth row of data. We can use df_run followed by square brackets. Inside the square brackets, we put the row number first and the column number second, separated by a comma. This is similar to indexing into arrays, except that the DataFrame has two dimensions, so there are two indices. As with arrays, we can slice out segments of the data. If we wanted the last two values in the time column, we could replace the row number with the range 5:6 (five-colon-six). This returns an array of the two values.
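As a minimal sketch of this indexing, assume a small made-up version of the run table. The column names (date, distance, time, route), their order, and all the values are hypothetical and not taken from the course data; the only details kept from the description are that the table has six rows and that we want the time column, which is assumed here to be column 3.

```julia
using DataFrames

# Hypothetical run data: six rows, with time assumed to be the third column
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

# Row 6, column 3: a single element
last_time = df_run[6, 3]

# Rows 5 to 6 of column 3: an array of two values
last_two_times = df_run[5:6, 3]
```

With this made-up data, last_time holds one number and last_two_times holds a two-element array.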

3. Selecting an element from the DataFrame

We can even use the end keyword as an alternative way to slice these last two values.
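A short sketch of the end keyword, again on a hypothetical run table (column names, order, and values are assumptions for illustration). Inside the square brackets, end stands for the last row, so end-1:end covers the last two rows however long the table is.

```julia
using DataFrames

# Hypothetical run data; time is assumed to be the third column
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

# Explicit row numbers for the last two rows of a six-row table
by_number = df_run[5:6, 3]

# The same slice written with end, which does not depend on the row count
by_end = df_run[end-1:end, 3]
```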

4. Selecting a column

In our analyses, it is common to slice out a whole column. Perhaps we want to extract all the run distances to find their total. We can slice the entire column by writing a colon in the first position and the column number after. This also returns an array of values.
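For example, totalling the run distances could look like the sketch below. The table is hypothetical (made-up column names and values), and distance is assumed to be the second column.

```julia
using DataFrames

# Hypothetical run data; distance is assumed to be the second column
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

# A colon in the row position selects every row of column 2
distances = df_run[:, 2]

# The result is an ordinary array, so array functions such as sum apply
total_distance = sum(distances)
```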

5. Selecting a column

Instead of using the column number, we can also index a column using its name as a string in the second position. We can also remove the square brackets and write df_run followed by a dot and the column name. All three of these methods return the same array of values.
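The three ways of selecting a column can be compared side by side in a small sketch. The table and its column names are hypothetical, chosen only so the example runs.

```julia
using DataFrames

# Hypothetical run data; distance is assumed to be the second column
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

by_number = df_run[:, 2]           # column number
by_name = df_run[:, "distance"]    # column name as a string
by_dot = df_run.distance           # dot syntax, no square brackets

# All three hold the same values
same_values = by_number == by_name == by_dot
```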

6. Selecting an element from the DataFrame

These multiple ways of slicing a column give us multiple ways to select a single value. We can use the row number and column number as before, we can use the row number and column name, or we can use a dot and the column name followed by the row number in square brackets. These all return the same value.
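Here is a sketch of the three ways to pick out one value, using a hypothetical run table in which time is assumed to be the third column (names and values are made up for illustration).

```julia
using DataFrames

# Hypothetical run data; time is assumed to be the third column
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

a = df_run[6, 3]         # row number and column number
b = df_run[6, "time"]    # row number and column name
c = df_run.time[6]       # dot syntax, then the row number in square brackets
```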

7. Slicing multiple columns

If we want to select multiple columns, we can use a range like this. Here we slice the first to third columns. This returns a DataFrame of these columns.
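A sketch of slicing a range of columns, on a hypothetical four-column run table so that columns one to three are a genuine subset (the column names and values are assumptions).

```julia
using DataFrames

# Hypothetical run data with four columns
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

# All rows, columns 1 to 3: the result is a DataFrame, not an array
df_subset = df_run[:, 1:3]

# The column names that survived the slice
subset_names = names(df_subset)
```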

8. Selecting rows

We can slice out rows as well as columns. We set the row index to a number and the column index to a colon. In this example, we select the fourth row of the DataFrame and all the columns. Julia won't return this as an array, as the values in the row have different data types. Instead, it returns a DataFrameRow.
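A sketch of selecting one row, using a hypothetical run table that mixes strings and numbers (the column names and values are made up). The fields of the returned row can be read by name.

```julia
using DataFrames

# Hypothetical run data with columns of different types
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

# Row 4, all columns: returns a DataFrameRow rather than an array
fourth_row = df_run[4, :]

# Individual values can be read from the row by column name
fourth_time = fourth_row.time
fourth_route = fourth_row.route
```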

9. Selecting multiple rows

To select multiple rows, we replace the row index with a range. Here we choose the second to the fourth row.
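A sketch of selecting a range of rows from a hypothetical run table (column names and values are assumptions). Unlike a single row, a range of rows comes back as a DataFrame.

```julia
using DataFrames

# Hypothetical run data
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

# Rows 2 to 4, all columns
df_rows = df_run[2:4, :]

# Number of rows in the result
n_selected = nrow(df_rows)
```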

10. Sorting DataFrames

As well as slicing, we often want to sort DataFrames according to the values of one of the columns. For this, we can use our old friend, the sort function. Previously we used this function to sort one-dimensional arrays. Thanks to multiple dispatch, we can use it on DataFrames too. We pass in the DataFrame and the name of the column we want to sort by. This returns a copy of the DataFrame sorted by this column in ascending order, from lowest to highest, and leaves the original unchanged. We can reverse the sorting order by setting the rev keyword argument to true. This sorts the DataFrame in descending order.
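A sketch of sorting by the time column, on a hypothetical run table (column names and values are made up for illustration).

```julia
using DataFrames

# Hypothetical run data
df_run = DataFrame(
    date = ["2023-01-02", "2023-01-04", "2023-01-07", "2023-01-09", "2023-01-12", "2023-01-15"],
    distance = [5.0, 10.0, 3.5, 8.0, 5.0, 12.5],
    time = [30, 62, 18, 50, 29, 80],
    route = ["park", "river", "park", "hills", "park", "river"],
)

# Sorted copy, ascending by time (shortest run first)
df_ascending = sort(df_run, "time")

# Sorted copy, descending by time (longest run first)
df_descending = sort(df_run, "time", rev=true)
```

Because sort returns a copy, df_run itself keeps its original row order after both calls.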

11. Cheat sheet

We've covered a lot on indexing, slicing, and sorting DataFrames. Here is a cheat sheet you can refer back to in the exercises.

12. Let's practice!

Let's go on to the exercise now.