Manipulating columns

1. Manipulating columns

Now it's time to do some number crunching!

2. Applying functions

An important part of data manipulation is the application of functions to columns and rows of our DataFrame. We'll learn the first part, how to apply functions working on whole columns, in this video. So why do we want to apply a function to columns as opposed to line by line? We might be interested in features determined by the whole column, such as the mean, median, or minimum. We need to be mindful that in Julia, the functions that reduce a whole collection are spelled out in full, such as maximum and minimum; the shorter max and min compare individual arguments instead. Even more powerful is a combination of grouping our data and then applying these functions, something we will discuss in the next chapter.
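As a small illustration of the naming point, here is a sketch on a plain vector. The values are made up for the example and are not taken from the course dataset.

```julia
using Statistics  # mean and median live in the Statistics standard library

# Hypothetical body mass values in grams
body_mass = [3750, 3800, 3250, 5000]

largest  = maximum(body_mass)   # reduces the whole collection: 5000
smallest = minimum(body_mass)   # 3250
average  = mean(body_mass)      # 3950.0
middle   = median(body_mass)    # 3775.0

# max and min compare separate arguments instead of reducing a collection
larger_of_two = max(3750, 3800) # 3800
```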

3. Options

So how can we apply functions to columns? DataFrames.jl offers us three different approaches: the select, transform, and combine functions. The select and transform functions also have variants with the bang, select! and transform!, in case we want to mutate our DataFrame in place.
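The difference between a plain function and its bang variant can be sketched as follows. The tiny penguins DataFrame and its column names are assumptions made for the example.

```julia
using DataFrames

# Hypothetical two-row DataFrame
penguins = DataFrame(species = ["Adelie", "Gentoo"], body_mass_g = [3750, 5000])

# Without the bang: a new DataFrame is returned and penguins is untouched
species_only = select(penguins, :species)
columns_before = names(penguins)   # still both columns

# With the bang: penguins itself is modified in place
select!(penguins, :species)
columns_after = names(penguins)    # only the species column is left
```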

4. select()

The first is the select function, which we know from previous videos. What we still need to mention is that we can manipulate our columns while we are selecting them. An example here is using the select function to change the order of columns while simultaneously renaming one of them. We put the equals-greater-than sign (=>) between the old and the new name of the column.
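A minimal sketch of reordering and renaming in one select call might look like this. The DataFrame, its column names, and the new name mass are hypothetical stand-ins for the data shown in the video.

```julia
using DataFrames

# Hypothetical stand-in for the penguins data
penguins = DataFrame(
    species = ["Adelie", "Chinstrap", "Gentoo"],
    body_mass_g = [3750, 3800, 5000],
    flipper_length_mm = [181, 195, 210],
)

# Columns appear in the order they are listed;
# :old => :new renames a column on the way through
reordered = select(penguins, :flipper_length_mm, :body_mass_g => :mass, :species)
```

The original penguins DataFrame keeps its column order and names, because select returns a new DataFrame.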

5. select()

We can also use other functions such as mean, minimum, or our own functions. The result, one number, is then broadcast over all of the rows of the column. Overall, the select function might decrease the number of columns while keeping the number of rows the same.
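The broadcasting behaviour can be sketched like this, again on an assumed three-row DataFrame. The helper mass_range is a hypothetical function written only for this example.

```julia
using DataFrames
using Statistics

# Hypothetical stand-in for the penguins data
penguins = DataFrame(
    species = ["Adelie", "Chinstrap", "Gentoo"],
    body_mass_g = [3750, 3800, 5000],
)

# Our own function: the spread between the heaviest and the lightest penguin
mass_range(x) = maximum(x) - minimum(x)

result = select(
    penguins,
    :species,
    :body_mass_g => mean,        # new column is named body_mass_g_mean
    :body_mass_g => mass_range,  # new column is named body_mass_g_mass_range
)
# Each single number is repeated in every row, so result still has 3 rows
```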

6. transform()

The second option is the transform function. It is similar to the select function; however, it keeps all the original columns as well as the new ones. It also leaves the number of rows intact. Here we are using transform to calculate the maximum of the body mass column.
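A sketch of that transform call, assuming the body mass column is called body_mass_g and using made-up values:

```julia
using DataFrames

# Hypothetical stand-in for the penguins data
penguins = DataFrame(
    species = ["Adelie", "Chinstrap", "Gentoo"],
    body_mass_g = [3750, 3800, 5000],
    flipper_length_mm = [181, 195, 210],
)

# All original columns are kept; body_mass_g_maximum is appended at the end
with_max = transform(penguins, :body_mass_g => maximum)
```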

7. combine()

Lastly, we have the combine function. Whereas the select and transform functions kept the overall layout of the DataFrame the same, the combine function works differently. While it only keeps the newly constructed columns like the select function, it does not broadcast the results over all the rows. Instead, the result of combine is just a DataFrame with one row and the new columns. This can be useful if we want to look at the values on their own, for example to compare them.
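The one-row result can be sketched as follows, with the same kind of assumed data as before:

```julia
using DataFrames
using Statistics

# Hypothetical stand-in for the penguins data
penguins = DataFrame(
    species = ["Adelie", "Chinstrap", "Gentoo"],
    body_mass_g = [3750, 3800, 5000],
)

# Only the new columns are kept and nothing is repeated across rows
summary = combine(penguins, :body_mass_g => mean, :body_mass_g => maximum)
# summary has 1 row and the columns body_mass_g_mean and body_mass_g_maximum
```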

8. How to handle multiples

Sometimes, we want to apply multiple functions to a column, or we want to apply the same function to multiple columns. To do that, we pass a vector of functions or a vector of columns instead of just one function or one column. We also need to add a dot before the equals-greater-than sign (.=>). The dot broadcasts the pairing, so we get one column-function pair for each element of the vector. This approach can be used with any of the select, transform, and combine functions. Here, we are using the combine function to calculate the mean, the minimum, and the maximum of the body mass column. We also calculate the mean of the body mass and the flipper length columns.
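The two cases might be sketched as two separate combine calls. The data and column names are assumptions, and the calls are kept apart here because both would otherwise produce a column named body_mass_g_mean. The last call is an extra contrast that is not from the video: without the dot, the listed columns are passed together to a single function.

```julia
using DataFrames
using Statistics

# Hypothetical stand-in for the penguins data
penguins = DataFrame(
    body_mass_g = [3750, 3800, 5000],
    flipper_length_mm = [181, 195, 210],
)

# Several functions on one column: one output column per function
mass_stats = combine(penguins, :body_mass_g .=> [mean, minimum, maximum])

# One function on several columns: one output column per input column
column_means = combine(penguins, [:body_mass_g, :flipper_length_mm] .=> mean)

# Contrast (no dot): both columns are handed to one function as two arguments
mass_per_mm = combine(
    penguins,
    [:body_mass_g, :flipper_length_mm] => ((m, f) -> mean(m ./ f)) => :mean_mass_per_mm,
)
```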

9. Cheat sheet

These three functions can be a bit confusing at first. So here is a little cheat sheet to help you choose the best function to suit every application.

10. Let's practice!

Are you ready to combine your skills? Let's transform some exercises!
