
Selecting columns

1. Selecting columns

Previously, we worked with full DataFrames. However, we sometimes want to work with or look at only a few columns, since a more compact DataFrame is easier to understand. That's where selecting columns comes in.

2. US minimum wages

Let's have a look at the US minimum wages dataset. The full version contains ten columns, but we are interested in only some of them. So let's select only the relevant columns for us to work with!

3. How we slice

We already know from Introduction to Julia how to select columns using slicing.
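As a quick refresher, here is a minimal slicing sketch. It assumes a hypothetical toy stand-in for the wages data: the column names are our own guesses and the values are placeholders, not real minimum wages.

```julia
using DataFrames

# Hypothetical toy stand-in for the wages data: column names are assumed
# and the values are placeholders, not real minimum wages.
wages = DataFrame(
    year = [2019, 2020],
    state = ["State A", "State B"],
    federal_min_wage = [1.0, 1.0],
    state_min_wage = [2.0, 3.0],
    state_min_wage_2020_dollars = [2.5, 3.0],
    effective_min_wage_2020_dollars = [2.5, 3.0],
)

# Slicing keeps every row (:) and the listed columns, in the order given.
by_position = wages[:, [2, 1, 4]]
by_name = wages[:, [:state, :year, :state_min_wage]]

println(names(by_position))  # column names of the sliced copy
```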

4. Selecting columns

Let's now look at the select function from the DataFrames package. It lets us select columns from a DataFrame in several ways. First, we can use column numbers. In our example, we want the state, year, and state minimum wage, so we call select, passing our wages DataFrame and the numbers two, one, and four, to get the second, first, and fourth columns in that order.
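A minimal sketch of selecting by column number, assuming a hypothetical toy wages DataFrame in which columns two, one, and four happen to be state, year, and state minimum wage (names and placeholder values are assumptions, not the course data):

```julia
using DataFrames

# Hypothetical toy wages data; names are assumed, values are placeholders.
wages = DataFrame(
    year = [2019, 2020],
    state = ["State A", "State B"],
    federal_min_wage = [1.0, 1.0],
    state_min_wage = [2.0, 3.0],
    state_min_wage_2020_dollars = [2.5, 3.0],
    effective_min_wage_2020_dollars = [2.5, 3.0],
)

# Columns 2, 1 and 4, returned in exactly that order as a new DataFrame.
sel = select(wages, 2, 1, 4)
println(names(sel))
```

Because select returns a new DataFrame, wages itself still has all six columns afterwards.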

5. Selecting columns

Or we can use the column names. We can use strings, symbols, or a mix of both. We can also mix column names and numbers.
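A short sketch of selecting by name with strings, symbols, and a mix of names and numbers, again on a hypothetical toy wages DataFrame (column names assumed, values placeholders):

```julia
using DataFrames

# Hypothetical toy wages data; names are assumed, values are placeholders.
wages = DataFrame(
    year = [2019, 2020],
    state = ["State A", "State B"],
    federal_min_wage = [1.0, 1.0],
    state_min_wage = [2.0, 3.0],
    state_min_wage_2020_dollars = [2.5, 3.0],
    effective_min_wage_2020_dollars = [2.5, 3.0],
)

by_strings = select(wages, "state", "year", "state_min_wage")  # strings
by_symbols = select(wages, :state, :year, :state_min_wage)     # symbols
mixed = select(wages, "state", :year, 4)  # string, symbol and column number
```

All three calls produce the same three columns in the same order.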

6. Selecting using patterns

Sometimes, we want to select all columns with names that follow a specific pattern. In our US wages example, we might want to select all columns starting with the letter e, the word state, or similar. We could select them by name or by position. However, if our dataset is huge, listing them all by hand quickly becomes tedious. Julia offers several useful functions to make our life easier.

7. Selecting using patterns

If we want to select all columns starting with a certain letter or phrase, we can combine Cols with the startswith function. This combination selects all columns whose names start with the argument passed to startswith. For example, to select all columns starting with the word state, we call select and pass the DataFrame name and Cols wrapping startswith with the string "state".
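A minimal sketch of the Cols and startswith combination, assuming a hypothetical toy wages DataFrame (column names assumed, values placeholders). The one-argument form startswith("state") returns a predicate that Cols applies to each column name:

```julia
using DataFrames

# Hypothetical toy wages data; names are assumed, values are placeholders.
wages = DataFrame(
    year = [2019, 2020],
    state = ["State A", "State B"],
    federal_min_wage = [1.0, 1.0],
    state_min_wage = [2.0, 3.0],
    state_min_wage_2020_dollars = [2.5, 3.0],
    effective_min_wage_2020_dollars = [2.5, 3.0],
)

# Keep every column whose name starts with "state", in their original order.
state_cols = select(wages, Cols(startswith("state")))
println(names(state_cols))
```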

8. Selecting using patterns

The endswith function is useful for selecting all columns whose names end with the same string. Here, we select all columns holding values in 2020 US dollars by calling select and passing wages and Cols wrapping endswith with the string "2020_dollars".
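The same pattern with endswith, sketched on a hypothetical toy wages DataFrame whose column names are assumed (values are placeholders):

```julia
using DataFrames

# Hypothetical toy wages data; names are assumed, values are placeholders.
wages = DataFrame(
    year = [2019, 2020],
    state = ["State A", "State B"],
    federal_min_wage = [1.0, 1.0],
    state_min_wage = [2.0, 3.0],
    state_min_wage_2020_dollars = [2.5, 3.0],
    effective_min_wage_2020_dollars = [2.5, 3.0],
)

# Keep every column whose name ends with "2020_dollars".
dollars_2020 = select(wages, Cols(endswith("2020_dollars")))
println(names(dollars_2020))
```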

9. Selecting using patterns

If we are looking for a string anywhere in the column names, we can pass Cols a call to contains, here filtering for the string "min".
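A minimal sketch using the one-argument form contains("min") as the predicate inside Cols, on a hypothetical toy wages DataFrame (column names assumed, values placeholders). The curried contains(needle) form needs Julia 1.5 or later:

```julia
using DataFrames

# Hypothetical toy wages data; names are assumed, values are placeholders.
wages = DataFrame(
    year = [2019, 2020],
    state = ["State A", "State B"],
    federal_min_wage = [1.0, 1.0],
    state_min_wage = [2.0, 3.0],
    state_min_wage_2020_dollars = [2.5, 3.0],
    effective_min_wage_2020_dollars = [2.5, 3.0],
)

# Keep every column whose name contains "min" anywhere.
min_cols = select(wages, Cols(contains("min")))
println(names(min_cols))
```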

10. Regex

Another useful way of dealing with strings is to use regular expressions, or regex for short. A regex is a sequence of characters that defines a pattern, which helps us match, search for, and manage text. Regex is useful when we are searching for more complicated patterns. We won't go into detail on how to use regex, as it is out of the scope of this course. If you want to learn more, have a look at DataCamp's regex cheat sheet, regex101.com, or other websites.

11. Using regex

In Julia, regular expressions are prefixed with r to distinguish them from normal strings. In this example, we select all columns that have min in their name by calling select and passing wages and r"min".
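A minimal sketch of regex-based selection on a hypothetical toy wages DataFrame (column names assumed, values placeholders). The second, anchored pattern is our own extra illustration, not from the course: $ ties the match to the end of the name.

```julia
using DataFrames

# Hypothetical toy wages data; names are assumed, values are placeholders.
wages = DataFrame(
    year = [2019, 2020],
    state = ["State A", "State B"],
    federal_min_wage = [1.0, 1.0],
    state_min_wage = [2.0, 3.0],
    state_min_wage_2020_dollars = [2.5, 3.0],
    effective_min_wage_2020_dollars = [2.5, 3.0],
)

# r"min" matches any column name containing "min".
with_min = select(wages, r"min")

# Anchored pattern: only names that end in "2020_dollars".
ends_2020 = select(wages, r"2020_dollars$")
```

For a plain substring like "min", the regex gives the same columns as Cols(contains("min")); regex pays off once the pattern needs anchors, alternatives, or character classes.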

12. select!() vs. select()

Lastly, the select function comes in two versions: select and select! (read "select-bang"). Following Julia convention, the trailing exclamation mark signals that select! modifies its argument in place, so it changes the original DataFrame. On the other hand, select returns a new DataFrame with the selected columns and leaves the original DataFrame intact. It's up to us to decide which one suits the situation.
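A minimal sketch contrasting the two versions on a hypothetical toy wages DataFrame (column names assumed, values placeholders):

```julia
using DataFrames

# Hypothetical toy wages data; names are assumed, values are placeholders.
wages = DataFrame(
    year = [2019, 2020],
    state = ["State A", "State B"],
    federal_min_wage = [1.0, 1.0],
    state_min_wage = [2.0, 3.0],
    state_min_wage_2020_dollars = [2.5, 3.0],
    effective_min_wage_2020_dollars = [2.5, 3.0],
)

# select returns a new DataFrame; wages keeps all six columns.
compact = select(wages, :state, :year)
println(ncol(wages))  # 6

# select! drops the other columns from wages itself.
select!(wages, :state, :year)
println(ncol(wages))  # 2
```

A common rule of thumb: use select while exploring, and reach for select! only when the dropped columns are definitely no longer needed.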

13. Let's practice!

Are you ready to select your columns? Let's practice in the exercises!
