
DataFrames

1. DataFrames

Often you will find yourself working with data that could naturally be represented in a table.

2. Tabular data

The run data we have been working with is an example of tabular data. Each row of the table represents a different observation: a single run. Each column represents a different variable, such as the day of the week or the run distance. So how do we process tabular data in Julia? We need a rectangular data structure, perhaps something like a two-dimensional array. However, the columns hold different data types.

3. Tabular data

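We have strings, integers, floats, and boolean values. If we stored them all in a single array, its element type would have to be Any, which can hold a value of any type. Arrays with element type Any are slow, because the compiler cannot know the type of each element in advance. Julia has a better solution.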
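To see the problem, here is a minimal sketch in base Julia (no packages needed) that puts a few run values of mixed types into one two-dimensional array. The values are hypothetical.

```julia
# Hypothetical run data: day (String), distance in km (Float64), rained (Bool)
runs = ["Mon" 5.2 false;
        "Tue" 3.1 true]

# Julia promotes the elements of an array literal to a common type;
# String, Float64 and Bool have none, so the matrix falls back to Any
println(typeof(runs))       # Matrix{Any}

# A single-type column keeps a concrete element type
distances = [5.2, 3.1]
println(eltype(distances))  # Float64
```

A DataFrame avoids the Any fallback by storing each column as its own typed vector.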

4. DataFrames

The DataFrames package lets us store and manipulate tabular data in Julia. To create a DataFrame, we use the DataFrame function (strictly speaking, a type constructor) from this package. We usually load the package with the using keyword so that we can call DataFrame directly, without the package name in front.
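A minimal sketch of the difference the using keyword makes, assuming the DataFrames package is installed: with import alone the name must be qualified, while after using, the exported name DataFrame can be called directly. The day values are hypothetical.

```julia
import DataFrames
# With import alone, the name must be qualified with the package name
df1 = DataFrames.DataFrame(day = ["Mon", "Tue"])

using DataFrames
# After using, the exported DataFrame name can be called directly
df2 = DataFrame(day = ["Mon", "Tue"])

# Both calls refer to the same constructor and build equal tables
println(df1 == df2)  # true
```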

5. DataFrames

We create columns by passing keyword arguments. Here we create a column named day and assign it an array holding one value per row. We can repeat this for as many columns as we need. Notably, all the arrays must have the same length. In this case, each column has six values for the six rows.

6. DataFrames

We also need to make sure we put commas between the arguments.
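A minimal sketch of building a run table with keyword arguments, assuming DataFrames is installed. The column names and the six rows of values are hypothetical placeholders, not the course's actual run data.

```julia
using DataFrames

# Each keyword argument becomes a column; every array holds six values (six rows)
runs = DataFrame(
    day = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
    distance_km = [5.2, 3.1, 8.0, 4.5, 6.3, 10.0],
    duration_min = [28, 17, 45, 24, 35, 58],
    rained = [false, true, false, false, true, false],
)  # commas separate the arguments

println(size(runs))  # (6, 4)

# Columns of different lengths are rejected with an error
try
    DataFrame(day = ["Mon", "Tue"], distance_km = [5.2])
catch err
    println("Could not build the DataFrame: ", typeof(err))
end
```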

7. DataFrames

When we print the DataFrame, it should look something like this. The DataFrame stores each column with its own data type.
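We can check that each column keeps its own concrete type by iterating over the columns with eachcol. This is a small sketch with hypothetical two-row data, assuming DataFrames is installed.

```julia
using DataFrames

runs = DataFrame(day = ["Mon", "Tue"], distance_km = [5.2, 3.1], rained = [false, true])

# eachcol iterates over the column vectors; eltype reports each column's element type
for (name, col) in zip(names(runs), eachcol(runs))
    println(name, " => ", eltype(col))
end
# day => String
# distance_km => Float64
# rained => Bool
```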

8. CSV files

We can also load tabular data straight from a file. A standard format for tabular data is the CSV file, which stands for comma-separated values. We can open these files and look at their contents directly, since they are just text. The CSV file of the run data looks like this. The column names are in the top row, and the rows of data follow below. Commas separate the values within each row.

9. Loading CSV files

We can load data from a CSV file using the CSV package. We import the package and use its File function, passing the path to the file we want to load as a string. Even though we import CSV with the using keyword, we cannot call File on its own; we must write CSV-dot-File. This is because the CSV package does not export the name File, so it must always be qualified with the package name. Once we have loaded the file, we can pass it into the DataFrame function to convert it into a DataFrame.
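A minimal sketch of this workflow, assuming the CSV and DataFrames packages are installed. To stay self-contained, it first writes a small hypothetical CSV file to a temporary path; in practice you would pass the path to your own file instead.

```julia
using CSV, DataFrames

# Hypothetical CSV text: column names in the top row, then one line per run
csv_text = """
day,distance_km,rained
Mon,5.2,false
Tue,3.1,true
Wed,8.0,false
"""

path = tempname() * ".csv"   # temporary file path used only for this example
write(path, csv_text)

# File is called with the CSV prefix, then converted into a DataFrame
runs = DataFrame(CSV.File(path))
println(runs)

# CSV.read with a DataFrame sink does the same in one step
runs2 = CSV.read(path, DataFrame)
```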

10. Printing DataFrames

Once we load data from a file, we need ways to explore it. We can use the first function to return the first n rows of a DataFrame. This lets us inspect a few sample rows from the dataset. The first argument of the function is the DataFrame, and the second argument is the number of rows.
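A short sketch of first, assuming DataFrames is installed and using hypothetical run data. The companion function last works the same way from the end of the table.

```julia
using DataFrames

# Hypothetical run data with six rows
runs = DataFrame(day = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"],
                 distance_km = [5.2, 3.1, 8.0, 4.5, 6.3, 10.0])

# first(df, n) returns a new DataFrame holding the first n rows
top3 = first(runs, 3)
println(top3)

# last(df, n) returns the final n rows
println(last(runs, 2))
```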

11. Basic properties of DataFrames

We can get the column names by using the names function. We pass in the DataFrame, and the function returns the column names as an array of strings. We can get the dimensions of the DataFrame using the size function. We pass in the DataFrame, and the function returns the number of rows and the number of columns as a tuple. These functions are handy when a DataFrame has too many rows and columns to print in full.
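A small sketch of names and size on a hypothetical four-row table, assuming DataFrames is installed. The nrow and ncol functions from DataFrames give the two counts separately.

```julia
using DataFrames

runs = DataFrame(day = ["Mon", "Tue", "Wed", "Thu"],
                 distance_km = [5.2, 3.1, 8.0, 4.5],
                 rained = [false, true, false, false])

println(names(runs))  # ["day", "distance_km", "rained"]
println(size(runs))   # (4, 3)

# size returns a (rows, columns) tuple, which can be destructured
nrows, ncols = size(runs)
println("rows = $nrows, columns = $ncols")

# nrow and ncol return each count on its own
println(nrow(runs), " ", ncol(runs))
```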

12. Let's practice!

You will see some examples of large datasets in the exercises.