Diving into DataFrames
1. Diving into DataFrames
Welcome to this course on manipulating data in Julia! I'm Kat, and I'll be your guide through this course.
2. Course outline
Data manipulation is the basis of more advanced things like data analysis and data science. And as Julia is both quick and readable, it is the perfect tool! So let's dive in! In this course, we'll learn how to select or drop columns and how to create new ones. We'll find out how and why to group data and how to calculate summary statistics and pivot tables. We'll discuss loading and saving CSV files, joining DataFrames, and handling missing data. We'll visualize our results to gain more insights while keeping our code readable and organized.
3. Datasets
Throughout the course, we'll explore data about penguin species, analyze trends in US minimum wage, and choose the best chocolate bar for our new cafe. We'll also explore delays at US airports.
4. Strings and symbols
So far, we have used strings to refer to individual columns in a DataFrame. However, there is another way: using Symbols. Symbols are basically labels in Julia. An example of a symbol would be :col_one. To create a symbol, we can call Symbol with the name of the column as a string, as in Symbol("col one"). Or, if the name of the column doesn't contain any spaces, we can write a colon followed directly by the name of the column, with no space between them. We can use both strings and symbols to refer to a column. There is not much of a speed difference, but symbols can be easier to write.
5. What is missing
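As a quick recap of the strings and symbols slide, here is a minimal sketch of both ways to refer to a column. The toy DataFrame and its column names are made up for illustration and are not taken from the course data.

```julia
using DataFrames

# Hypothetical toy data; the column names are illustrative only.
df = DataFrame("col_one" => [1, 2, 3], "col two" => ["a", "b", "c"])

# The same column, referenced once by string and once by symbol.
by_string = df[:, "col_one"]
by_symbol = df[:, :col_one]

# A name with a space cannot use the colon shorthand,
# so we build the symbol from a string instead.
spaced = df[:, Symbol("col two")]

println(by_string == by_symbol)
println(spaced)
```

Both forms return the same data, so the choice mostly comes down to which one is more convenient to type.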
Julia provides two easy ways to check if a column contains missing values. First, we can print a few rows of a DataFrame using the first function and look at the data types of the individual columns. If there is a question mark after the type, the column's element type allows missing values, so the column may contain them. If we want to know more about missing values, we can use the describe function. It provides us with the number of missing values for each column as nmissing.
6. Describe it better
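To make the two checks for missing values concrete, here is a small sketch. The penguins DataFrame below is a hypothetical stand-in with invented values, not the dataset used in the course.

```julia
using DataFrames

# Hypothetical stand-in for the penguins data (values are invented).
penguins = DataFrame(
    species = ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    body_mass_g = [3750.0, missing, 3800.0, 3250.0],
)

# Check 1: print the first rows and read the column types in the header.
println(first(penguins, 3))

# Check 2: describe reports the count of missing values as nmissing.
stats = describe(penguins)
println(stats[:, [:variable, :nmissing]])

# The element type of the column confirms that missing values are allowed.
println(eltype(penguins.body_mass_g))
```

The question mark in the printed header only tells us that missing values are allowed in the column, while nmissing tells us how many are actually there.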
In Introduction to Julia, we learned about the describe function, which summarizes interesting features of our DataFrame. Now it's time to build upon that knowledge! We can specify which statistics we want to include by passing their names as symbols after the DataFrame. Here, we only want the number of missing values and the data types in the columns for the penguins DataFrame, so we call describe and pass penguins, :nmissing, and :eltype. As we want to know the type of the elements in the columns, we need to use eltype instead of typeof.
7. Describe it how we like it
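The call described on the "Describe it better" slide can be sketched as follows. Again, the penguins DataFrame here is a hypothetical miniature version with invented values.

```julia
using DataFrames

# Hypothetical stand-in for the penguins data (values are invented).
penguins = DataFrame(
    species = ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    body_mass_g = [3750.0, missing, 3800.0, 3250.0],
)

# Ask describe for only the missing-value count and the element type.
stats = describe(penguins, :nmissing, :eltype)
println(stats)

# eltype looks inside the column; typeof describes the column container.
println(eltype(penguins.body_mass_g))
println(typeof(penguins.body_mass_g))
```

The last two lines show why eltype is the one we want here: typeof would only tell us that the column is a vector.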
We can also use functions that the describe function doesn't include by default. Maybe we want to include the sum of each numerical column and name it total. To do that, we call describe, passing the name of the DataFrame, then the sum function followed by the equals-greater-than sign (=>) and :total.
8. DataFrames syntax
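Here is a minimal sketch of the custom total statistic just described. The DataFrame is a hypothetical numeric-only miniature of the penguins data, chosen so that sum is defined for every column.

```julia
using DataFrames

# Hypothetical numeric-only stand-in for the penguins data (values are invented).
penguins = DataFrame(
    body_mass_g = [3750.0, missing, 3800.0, 3250.0],
    flipper_length_mm = [181, 186, 195, 193],
)

# Pass a function => name pair to add a custom statistic called total.
# For columns that allow missing values, describe skips the missing entries.
stats = describe(penguins, sum => :total)
println(stats)
```

Any function that accepts a whole column could be used in place of sum here.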
The last bit is a special syntax of the DataFrames package. It goes like this: we write the name of the column we want to work with, followed by an equals sign and a greater-than sign with no space between them (=>). We then write the transformation for that column. Julia will automatically name the resulting column, but we can change that by writing another => and the new name as a string or a symbol.
9. DataFrames syntax
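The column => transformation => new name pattern can be sketched with combine, which is one of the DataFrames functions that accepts this syntax. Using combine and the toy data below is an assumption made for illustration; the transcript does not say which function the slide uses.

```julia
using DataFrames
using Statistics

# Hypothetical toy data (values are invented).
df = DataFrame(body_mass_g = [3750.0, 3800.0, 3250.0])

# Without a new name, the result column is named automatically.
auto = combine(df, :body_mass_g => mean)
println(names(auto))

# With a third element, we choose the name ourselves (symbol or string).
named = combine(df, :body_mass_g => mean => :avg_mass)
println(named)
```

The automatic name joins the column name and the function name, which is handy for quick exploration but often too long for a final result.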
If we want to include more than one element at any stage, such as several columns or several new names, we amend the code accordingly.
10. Let's practice!
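One possible way to include more than one element at a stage is to use vectors together with the broadcasted .=> operator. This is a sketch under that assumption, with invented data, and the slide may show a different variant.

```julia
using DataFrames
using Statistics

# Hypothetical toy data (values are invented).
df = DataFrame(
    body_mass_g = [3750.0, 3800.0, 3250.0],
    flipper_length_mm = [180.0, 190.0, 200.0],
)

# Several columns, one function, several new names.
averages = combine(
    df,
    [:body_mass_g, :flipper_length_mm] .=> mean .=> [:avg_mass, :avg_flipper],
)
println(averages)

# One column, several functions, automatic names.
extremes = combine(df, :body_mass_g .=> [minimum, maximum])
println(extremes)
```

The dot in .=> pairs the elements up one by one, so the vectors on each side need matching lengths.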
That was a lot of information. Let's practice in the exercises!