1. Improving readability with Chain.jl
Now that we've gotten into more complicated commands, we might have noticed that the code is becoming less readable.
2. Problems with complicated code
The code we see here has a funny indentation, lots of commas and open and closed parentheses on different lines, making it hard to read.
Because we are nesting many functions inside one another, it is also easy to make mistakes with the parentheses.
A possible solution would be to save every intermediate step to a different variable. However, we then end up with many auxiliary variables that we only use once, or we might accidentally overwrite a variable we still need.
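To make the two problems concrete, here is a minimal sketch of what such code might look like. It is not the code from the slide: the toy wages DataFrame and its column names (year, state, min_wage) are assumptions made up for illustration.

```julia
using DataFrames, Statistics

# Hypothetical toy data; the column names are assumptions for illustration.
wages = DataFrame(
    year = [2000, 2000, 2001, 2001],
    state = ["A", "B", "A", "B"],
    min_wage = [5.0, 7.0, 6.0, 8.0],
)

# Nested version: it has to be read from the inside out,
# and every closing parenthesis must match the right call.
nested = combine(
    groupby(
        select(wages, :year, :min_wage),
        :year,
    ),
    :min_wage => mean => :mean_wage,
)

# Intermediate-variable version: easier to read,
# but it leaves behind names that are used only once.
narrow = select(wages, :year, :min_wage)
grouped = groupby(narrow, :year)
stepwise = combine(grouped, :min_wage => mean => :mean_wage)
```

Both versions produce the same table of yearly averages; they differ only in how easy they are to read and maintain.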
3. What is piping?
So what is piping? Piping, or "chaining", is a coding approach. It allows us to string together multiple commands in a way that is both compact and readable. Piping lets us avoid saving intermediate results without having to nest functions within one another.
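As a small illustration of the idea, the sketch below uses Julia's built-in pipe operator on a made-up vector of numbers (hourly_wages is a hypothetical name). The nested and the piped versions compute the same value, but the piped one reads from left to right in the order the steps happen.

```julia
using Statistics

# Hypothetical sample values, chosen so the arithmetic is easy to follow.
hourly_wages = [4.0, 9.0, 16.0]

# Nested form: read from the inside out.
nested = round(mean(sqrt.(hourly_wages)); digits = 2)

# Piped form: each step receives the result of the previous one.
piped = hourly_wages |> (v -> sqrt.(v)) |> mean |> (m -> round(m; digits = 2))
```

Notice that the built-in operator only passes a single argument along, so any step that needs extra arguments has to be wrapped in an anonymous function.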
4. Using Chain.jl
Although Julia supports piping on its own, the Chain package provides a convenient macro syntax. A macro takes the code we write as its input and rewrites it into other code before it runs. We'll showcase this by calculating the average minimum wage by year.
To start the macro, we write at-chain followed by the name of our DataFrame variable, wages in our example, and the begin keyword. We then write every individual function call on a separate line, in the order we want them performed.
Here, we are first selecting some columns. We then group the narrower DataFrame by the year column and calculate the average minimum wage for each year. We keep these commands on separate lines.
We can also indent these lines to make everything easier to read. Because the result of every line is automatically passed as the first argument to the next line, we don't write the DataFrame name as the first argument. We end the macro using the end keyword.
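The steps just described could be sketched as follows. The toy wages DataFrame and the column names year, min_wage, and mean_wage are assumptions for illustration; the course dataset may use different names.

```julia
using Chain, DataFrames, Statistics

# Hypothetical toy data; the column names are assumptions for illustration.
wages = DataFrame(
    year = [2000, 2000, 2001, 2001],
    state = ["A", "B", "A", "B"],
    min_wage = [5.0, 7.0, 6.0, 8.0],
)

@chain wages begin
    select(:year, :min_wage)                   # same as select(wages, :year, :min_wage)
    groupby(:year)                             # receives the narrower DataFrame
    combine(:min_wage => mean => :mean_wage)   # one row per year with the average
end
```

Each line reads like the ordinary function call with its first argument left out, which is what keeps the block short.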
5. Piping with _
How did the minimum wage evolve over time? To answer this question, we visualize our results.
Now we need to use the previous result in two places in the last line, as the x and y coordinates. To do so, we replace the variable name with an underscore: here we write underscore-dot-year instead of wages-dot-year. When a line contains an underscore, the result is no longer inserted as the first argument.
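Here is a minimal sketch of the underscore in action. To keep it free of a plotting dependency, the last line looks up the year with the highest average wage instead of drawing a plot, but it uses the previous result in two places in the same way. The toy data and column names are assumptions for illustration.

```julia
using Chain, DataFrames, Statistics

# Hypothetical toy data; the column names are assumptions for illustration.
wages = DataFrame(
    year = [2000, 2000, 2001, 2001],
    min_wage = [5.0, 7.0, 6.0, 8.0],
)

@chain wages begin
    groupby(:year)
    combine(:min_wage => mean => :mean_wage)
    # Both underscores are replaced with the DataFrame from the previous line,
    # so nothing is inserted as the first argument here.
    _.year[argmax(_.mean_wage)]
end
```

With a plotting package loaded, the last line would instead pass underscore-dot-year and underscore-dot-mean_wage to the plotting function as the x and y coordinates.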
6. Skipping piping with @aside
Let's now compare the state minimum and actual minimum wages over time. To do that, we'll need two plotting commands. But we don't want to feed the result of the first one to the second one, as that would give us the wrong result. Luckily, there is the at-aside macro!
If you start a line with at-aside, the line will get executed but the result of that line won't be fed to the next line.
In this example, we first plot the mean of the minimum wage over time with at-aside. Then we add a plot-bang command for the state wage. That way, we get the correct plot!
Using at-aside is useful not just for plotting, but especially when we need to debug our code! You can use it with print statements as well!
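The debugging use could be sketched like this, again on made-up data with assumed column names. The at-aside line prints the number of rows after the first step, and the DataFrame from that step, not the return value of println, is what reaches groupby.

```julia
using Chain, DataFrames, Statistics

# Hypothetical toy data; the column names are assumptions for illustration.
wages = DataFrame(
    year = [2000, 2000, 2001, 2001],
    state = ["A", "B", "A", "B"],
    min_wage = [5.0, 7.0, 6.0, 8.0],
)

@chain wages begin
    select(:year, :min_wage)
    # Runs for its side effect only; its result is not passed on.
    @aside println("Rows after select: ", nrow(_))
    groupby(:year)
    combine(:min_wage => mean => :mean_wage)
end
```

Inside an at-aside line, the underscore still refers to the previous result, which is what makes it handy for peeking into the middle of a chain.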
7. Saving the result
Sometimes we want to use piping just for exploring the data. Other times, we want to keep the result and use it elsewhere. No need to worry though - we can assign the result of the chain macro to a variable as usual. This can save us a lot of hassle: we can build the final result step by step, inspect the output after each step, and save it only at the end.
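A minimal sketch of saving the result, assuming the same made-up data and column names as before; yearly_wages is a hypothetical variable name.

```julia
using Chain, DataFrames, Statistics

# Hypothetical toy data; the column names are assumptions for illustration.
wages = DataFrame(
    year = [2000, 2000, 2001, 2001],
    min_wage = [5.0, 7.0, 6.0, 8.0],
)

# The value of the last line in the chain is what gets assigned.
yearly_wages = @chain wages begin
    groupby(:year)
    combine(:min_wage => mean => :mean_wage)
end

# The saved DataFrame can now be used elsewhere, for example:
highest = maximum(yearly_wages.mean_wage)
```

The original wages DataFrame is left unchanged, so we can keep experimenting with the chain and re-run the assignment once we are happy with the output.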
8. Let's practice!
Are you ready to simplify your code using piping? Let's practice in the exercises!