Exploring Data with Visualizations

1. Exploring Data with Visualizations

Making visualizations is a great way to understand our data better.

2. Why do we visualize?

Let's look at an example. Here we have a DataFrame with the mean minimum wage in the US over the years. We could try to read the trends from the numbers alone, but it is much easier to make a plot out of them! In this course, we'll cover the basics of plotting DataFrames using the Plots.jl package.

3. Histogram

Sometimes it is useful to visualize the distribution of a numerical variable. That's where histograms come in! A histogram is created by grouping the data into class intervals, or bins. We then plot bars whose heights correspond to the number of data points in each bin. We can create a histogram of the column col by calling histogram and passing df.col. Plots.jl automatically chooses a number of bins for us. If we want to specify it ourselves, we have several options. Most of them are out of scope for this course, but you can give the histogram function the additional keyword argument bins and set it to an integer. Plots.jl then treats that integer as a target and picks a suitable number of bins close to it. Here we have an example of the distribution of inflation-adjusted minimum wages in the US in 2015, with and without specifying the number of bins.
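The two histogram calls described above can be sketched as follows. This is a minimal example under stated assumptions: the DataFrame df, its column wage, and the values in it are made up for illustration and are not the course's minimum-wage dataset.

```julia
using DataFrames
using Plots

# Hypothetical data: made-up hourly wages, NOT the course dataset
df = DataFrame(wage = [7.25, 7.25, 7.5, 8.0, 8.05, 8.25, 8.5, 9.0, 9.15, 9.25, 9.47, 10.5])

# Let Plots.jl choose the bins automatically
p_auto = histogram(df.wage)

# Ask for roughly 5 bins; the number actually drawn may differ slightly
p_bins = histogram(df.wage, bins = 5)
```

Because the integer is only a target, the second plot may end up with a few more or fewer bars than the number requested.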

4. Labeling our plot

To fully harness the power of visualizations, we must include axis labels so that everyone, us included, understands what is on the graph. We want to add an x-label to our histogram. Because we want to modify the existing figure, we call xlabel! and provide the string we want as the label, in this case "Inflation-adjusted minimum wage per hour (USD)". To label the y-axis, we call ylabel! and pass the string, "Number of states" in our example. And to give the graph a title, we call title! with "Distribution of inflation-adjusted minimum wage in 2015".
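As a rough sketch of the labeling steps above, the three calls can be applied one after another to the most recently created figure. The wages vector is hypothetical, made-up data used only so the example runs on its own.

```julia
using Plots

# Hypothetical data: made-up hourly wages, NOT the course dataset
wages = [7.25, 7.25, 7.5, 8.0, 8.05, 8.25, 8.5, 9.0, 9.15, 9.25, 9.47, 10.5]

# Create the figure first; the bang functions below modify it in place
p = histogram(wages)

xlabel!("Inflation-adjusted minimum wage per hour (USD)")
ylabel!("Number of states")
title!("Distribution of inflation-adjusted minimum wage in 2015")
```

Each of these functions acts on the current figure, which is why no plot object needs to be passed in here.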

5. Scatter plot

The second important type of plot is the scatter plot. It is useful for visualizing the relationship between two numerical variables. Let's look at the penguins dataset now! To plot the penguins' flipper length against their body mass, we call scatter and pass penguins.body_mass_g and penguins.flipper_length_mm. We can see that heavier penguins tend to have longer flippers!
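A minimal sketch of the scatter call described above. The column names follow the ones named in the text, but the rows below are invented placeholder values, not the real penguins dataset.

```julia
using DataFrames
using Plots

# Hypothetical stand-in for the penguins DataFrame (values are made up)
penguins = DataFrame(
    body_mass_g = [3400, 3650, 3800, 4200, 4600, 5000, 5400],
    flipper_length_mm = [181, 186, 190, 195, 208, 215, 222],
)

# First argument goes on the x-axis, second on the y-axis
p = scatter(penguins.body_mass_g, penguins.flipper_length_mm)
```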

6. Line plot

The next type of plot is the line plot. It is useful for visualizing how a numeric variable changes over time. Here we have a dataset with the number of Adelie penguin observations over a period of time. To make a line plot, we call plot and pass one column for the x-axis and one column for the y-axis. Here is an example of the number of observed Adelie penguins over time.
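The line plot could look roughly like this. The transcript does not name the DataFrame or its columns, so adelie, year, and observations are assumed names, and the counts are made up for illustration.

```julia
using DataFrames
using Plots

# Hypothetical observation counts; names and values are assumptions
adelie = DataFrame(
    year = [2001, 2002, 2003, 2004, 2005],
    observations = [12, 18, 15, 22, 27],
)

# x-axis column first, y-axis column second
p = plot(adelie.year, adelie.observations)
```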

7. Multiple lines

But we know we have three different species of penguins. Can we add the other species to the plot? The answer is yes! We do that by adding a bang to the plot function, that is, by calling plot! instead of plot, because we want to modify the existing figure instead of creating a new one. This works for all the plotting functions we have covered so far.
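Here is a small sketch of adding lines to an existing figure. The years and per-species counts are hypothetical values invented for the example.

```julia
using Plots

# Hypothetical yearly observation counts per species (made-up values)
years = [2001, 2002, 2003, 2004, 2005]
adelie = [12, 18, 15, 22, 27]
gentoo = [8, 11, 14, 13, 19]
chinstrap = [5, 7, 6, 10, 9]

# plot creates a new figure with the first line
p = plot(years, adelie)

# plot! adds further lines to that same figure
plot!(years, gentoo)
plot!(years, chinstrap)
```

Calling plot a second time instead of plot! would start a fresh figure and discard the first line from view.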

8. Multiple lines with legend

But how does one differentiate between the different lines? By using the keyword argument label in the plot and plot! calls. When you add labels to your plots, the legend updates automatically, making the plot easier to understand.
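The label keyword can be sketched like this, again with hypothetical, made-up counts; only the species names come from the penguins dataset itself.

```julia
using Plots

# Hypothetical yearly observation counts per species (made-up values)
years = [2001, 2002, 2003, 2004, 2005]
adelie = [12, 18, 15, 22, 27]
gentoo = [8, 11, 14, 13, 19]
chinstrap = [5, 7, 6, 10, 9]

# Each label becomes one entry in the legend
p = plot(years, adelie, label = "Adelie")
plot!(years, gentoo, label = "Gentoo")
plot!(years, chinstrap, label = "Chinstrap")
```

Without the label argument, Plots.jl falls back to generic legend entries such as y1, y2, and y3.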

9. Cheat sheet

That was a lot of information, so here is a cheat sheet for you!

10. Let's practice!

Ready for some visual fun? Let's head over to the exercises!
