Efficient workflow

1. Efficient workflow

We are almost there! In this video, we'll discuss techniques and strategies to make our workflow easier, more readable, and, at least partially, reusable. We'll also discuss how to effectively combine everything we've learned in this course.

2. Tips for names

The first tip, which we should already be following, is to give variables short but meaningful names. This applies to both DataFrames and columns. It is much easier to remember what the wages dataset contains than what df contains. While we are talking about names, it is also useful to follow a consistent naming convention. If one column is named state_wage_2020 and another effective.2020.dollars, the column names are going to be harder to remember. The same applies to capital letters in column names: pick one style and stick to it.
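To make the naming tip concrete, here is a minimal sketch in plain Python. The helper to_snake_case and the "Federal Wage" column are hypothetical, made up for illustration; the other two column names are the ones mentioned above. It assumes we want every column name in lowercase with underscores as separators.

```python
import re

def to_snake_case(name):
    """Lowercase a column name and join its parts with underscores."""
    # Split on any run of characters that are not letters or digits.
    parts = re.split(r"[^0-9A-Za-z]+", name.strip())
    return "_".join(part.lower() for part in parts if part)

# Column names following three different conventions.
columns = ["state_wage_2020", "effective.2020.dollars", "Federal Wage"]

# After cleaning, every name follows the same pattern.
clean_columns = [to_snake_case(column) for column in columns]
print(clean_columns)
```

Once all the names follow one pattern, we only need to remember what a column holds, not how it happens to be spelled.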

3. Too many variables

Although it is useful to save changes to new variables, we should not overdo it. Too many variables clutter the memory, push us toward ever longer names, and create chaos in the workspace. We can either overwrite some of them by using mutating functions, or use the chain macro to reduce the number of DataFrame versions.
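The difference is easier to see side by side. The sketch below uses plain Python lists as a stand-in for DataFrames, so the in-place list operations and the nested expression are only analogies for mutating functions and the chain macro. The rows and the load_wages helper are made up for illustration.

```python
def load_wages():
    """Return made-up sample rows standing in for a wages DataFrame."""
    return [
        {"state": "Ohio", "wage": 8.70},
        {"state": "Utah", "wage": None},
        {"state": "Iowa", "wage": 7.25},
    ]

# Cluttered: every step leaves another named copy in the workspace.
wages = load_wages()
wages_no_missing = [row for row in wages if row["wage"] is not None]
wages_no_missing_sorted = sorted(wages_no_missing, key=lambda row: row["wage"])

# Option 1: overwrite in place, so only one name exists.
wages_in_place = load_wages()
wages_in_place[:] = [row for row in wages_in_place if row["wage"] is not None]
wages_in_place.sort(key=lambda row: row["wage"])  # list.sort mutates the list

# Option 2: chain the steps into one expression with no intermediates.
wages_chained = sorted(
    (row for row in load_wages() if row["wage"] is not None),
    key=lambda row: row["wage"],
)
```

All three versions end with the same rows; the last two simply leave fewer names behind.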

4. Variables instead of hard coding

On the other hand, it is better to use variables instead of hard coding values, for example when replacing missing data. That way, if we want to change the value, we only have to do it in one place.
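As a small illustration, sketched in plain Python with made-up numbers: the fill value lives in a single variable, here called MISSING_WAGE (a hypothetical name), and every replacement refers to it.

```python
# Hypothetical fill value, defined in exactly one place.
MISSING_WAGE = 0.0

# Made-up columns with missing entries represented as None.
state_wages = [7.25, None, 8.70]
federal_wages = [None, 7.25]

# Both replacements use the variable, not a literal 0.0.
state_filled = [MISSING_WAGE if w is None else w for w in state_wages]
federal_filled = [MISSING_WAGE if w is None else w for w in federal_wages]
```

If we later decide to fill with a different value, changing MISSING_WAGE updates both columns at once.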

5. Make a function of it

Similarly, if we keep doing something over and over, we should write a function for it. That way, we are less likely to make a mistake. And once we have written a function, it is usually quicker to use.
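For instance, if we keep replacing missing values by hand, we could wrap that step in a function. This is a minimal sketch in plain Python; the function name fill_missing, its strategy parameter, and the sample delays are assumptions made for illustration.

```python
from statistics import mean

def fill_missing(values, strategy=mean):
    """Replace None entries with a statistic of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot fill a column with no observed values")
    fill = strategy(observed)  # e.g. mean or min of the non-missing values
    return [fill if v is None else v for v in values]

# Made-up delays in minutes, with one missing entry.
delays = [10, None, 20]

filled_with_mean = fill_missing(delays)               # uses the default, mean
filled_with_min = fill_missing(delays, strategy=min)  # same logic, other statistic
```

The replacement logic is written once, so a fix or a change in approach only has to happen inside the function.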

6. Comment and document

It's good practice to include comments in our code so that we, or others, can better understand what is happening. It is also very useful to note down why we are doing things. Will we remember why we replaced some missing values with the mean and some with the minimum? Maybe, but most likely not. Writing down what we are doing and why we are doing it helps both us and anyone who needs to understand our work.

7. Get to know the data

We should not worry about taking the time to get to know the data. Understanding the data is the first step in extracting information from it! Make plots, print the results, dig in.

8. Ask for help!

Make Google, Stack Overflow, DataCamp cheat sheets, and other resources our friends! There is no need to know everything off the top of our heads.

9. Have fun!

There are many more tips out there, but we'll finish with the most important one - have fun, don't give up, and enjoy the process!

10. Flight delays in US airports

Let's apply these tips in the exercises. We'll be working with data about flight delays in US airports. There are three datasets. The first one, airlines, contains the airline identifier and its name. The second dataset, airports, has information about airports and their locations. The last dataset, flights, contains flight details, such as flight number and delay, for the three largest US airlines in March 2015. It will be your job to analyze those datasets in the exercises. Is there an airport that one should avoid?

11. Let's practice!

Are you ready? Let's take off!