Introduction

1. Introduction

Hi, and welcome to the first course in DataCamp's data visualization with ggplot2 series!

2. Your instructor - Rick Scavetta

My name is Rick Scavetta and I'll be the instructor for this series. I've been training scientists on how to better understand and visualize their data since 2012. I'm very excited to bring my experience to DataCamp. So what is data viz?

3. Data visualization & data science

Data visualization is an essential skill for data scientists. It combines statistics and design in meaningful and appropriate ways. On the one hand, data viz is a form of graphical data analysis, emphasizing accurate representation and interpretation of data. On the other hand, data viz relies on good design choices, not only to make our plots attractive, but also to aid both the understanding and communication of results. On top of that, there is an element of creativity, since at its heart, data viz is a form of visual communication.

4. Exploratory versus explanatory

It's important to understand the distinction between exploratory and explanatory visualizations. Exploratory visualizations are easily generated, data-heavy and intended for a small specialist audience, for example yourself and your colleagues - their primary purpose is graphical data analysis. Explanatory visualizations are labor-intensive, data-specific and intended for a broader audience, e.g. in publications or presentations - they are part of the communications process. As a data scientist, it's essential that you can quickly explore data, but you'll also be tasked with explaining your results to stakeholders. Good design begins with thinking about the audience - and sometimes that just means ourselves.

5. MASS::mammals

This data set contains the average brain and body weights of 62 land mammals. To understand the relationship here, the most obvious first step is to make a scatter plot, like this one.

6. A scatter plot

Two mammals, the African and Asian elephants, both have very large brain and body weights, leading to a positive skew on both axes.

7. Explore with a linear model

Here, applying a linear model is a poor choice since a few extreme values have a large influence.

8. Explore: fine-tuning

A log transformation of both variables allows for a better fit. So, although we began with a rough exploratory plot, it informed us about our data and led us to a meaningful result.
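
To make the idea of a log transformation concrete, here is a minimal sketch in plain Python (not the R and ggplot2 code used in the course). It does not use the mammals data: the `body` and `brain` values are synthetic, made up for illustration, and follow an exact square-root relationship spanning several orders of magnitude. The helper `fit_line` is a hypothetical name for an ordinary least-squares fit.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# Synthetic data (an assumption, NOT the MASS::mammals values): brain = sqrt(body)
body = [1.0, 10.0, 100.0, 1000.0, 10000.0]
brain = [math.sqrt(b) for b in body]

# Fit on the raw scale: the largest value dominates the fit
raw_fit = fit_line(body, brain)

# Fit after taking log10 of both variables: the points fall on a straight line
log_fit = fit_line([math.log10(b) for b in body],
                   [math.log10(v) for v in brain])

print("raw scale: slope=%.4f  r2=%.3f" % (raw_fit[0], raw_fit[2]))
print("log-log:   slope=%.4f  r2=%.3f" % (log_fit[0], log_fit[2]))
```

On the log-log scale the slope recovers the exponent of the synthetic relationship, which is the kind of structure a raw-scale line can hide when a few extreme values stretch the axes.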

9. Publication-ready plot

In the end, we'd probably want a cleaned-up explanatory plot.

10. Anscombe's plots

Here's a classic example from Francis Anscombe, first published in 1973. When we imagine a linear model, as presented on this anonymous plot, we imagine that we are describing data that looks

11. Anscombe's plots

something like this. But this same model could be describing a very different set of data

12. Anscombe's plots

such as a parabolic relationship,

13. Anscombe's plots

which calls for a different model.

14. Anscombe's plots

Or it could be describing data in which an extreme value has a large effect,

15. Anscombe's plots

which becomes clear when the outlier is removed. And sometimes

16. Anscombe's plots

the model may be describing a relationship where in fact there is none at all

17. Anscombe's plots

because some extreme values may be incorrect.

18. Anscombe's plots

If we relied solely on the numerical output without plotting our data, we'd have missed distinct and interesting underlying trends. We can see that data viz is rooted in statistics and graphical data analysis, but it's also a creative process that involves some amount of trial and error.
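
The point about numerical output can be checked directly. The sketch below, in plain Python and not the course's R code, fits a least-squares line to each of Anscombe's four data sets and then refits the third set with its single outlier dropped. The data values are those published by Anscombe (1973), typed in by hand; `fit_line`, `quartet` and the `y < 12` outlier rule are illustrative choices, not something taken from the slides.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy ** 2 / (sxx * syy)


# Anscombe's quartet: sets I to III share the same x values
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# The fitted line and r squared are nearly identical across all four sets
fits = {name: fit_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept, r2) in fits.items():
    print(f"set {name}: y = {intercept:.2f} + {slope:.3f} * x, r2 = {r2:.2f}")

# Refit set III without its one extreme point (illustrative rule: drop y >= 12)
xs3, ys3 = quartet["III"]
kept = [(x, y) for x, y in zip(xs3, ys3) if y < 12]
trimmed = fit_line([x for x, _ in kept], [y for _, y in kept])
print(f"set III without outlier: y = {trimmed[1]:.2f} + {trimmed[0]:.3f} * x, r2 = {trimmed[2]:.3f}")
```

The summary numbers alone cannot tell the four sets apart. Only a plot, or a refit like the last one, reveals that the data behind them differ.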

19. Let's practice!

Alright, enough examples. Let's get our fingers moving with some exercises.