
Plotting multivariate data

1. Plotting multivariate data

Visualizing multivariate data is an essential step before performing any multivariate analysis. In this video, we will discuss some widely used plotting techniques for multivariate data in R.

2. Various plotting options

We will start with basic R plotting techniques, and then move on to the lattice package, which provides a formula interface for plotting multivariate data. We will also discuss specific plotting tools built on ggplot2, as well as some 3D plotting techniques.

3. Basic R plot for multivariate data

To plot pairwise scatterplots of all the numeric variables in a dataset, we can use the pairs() function. It produces a matrix of plots in which the (i, j)th entry is the scatterplot of variables i and j. Note that if you have many variables, a pairs() plot may not be a good choice, since the individual panels become very small.
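A minimal sketch of the basic pairs() call. It assumes that iris_raw is simply a copy of R's built-in iris data frame, since the transcript does not show how iris_raw was created. Writing the plot to a temporary PDF file is our own choice so that the example also runs outside an interactive session; in RStudio you can drop the pdf() and dev.off() lines.

```r
# Assumption: iris_raw is a copy of the built-in iris data frame
iris_raw <- iris

# Select every numeric column; pairs() draws a scatterplot
# for each pair of these variables
num_vars <- iris_raw[sapply(iris_raw, is.numeric)]

# Send the plot to a temporary PDF so the script runs non-interactively
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
pairs(num_vars, main = "Pairwise scatterplots of the iris measurements")
invisible(dev.off())
```

With the four iris measurements this gives a 4 x 4 grid: variable names on the diagonal and 12 scatterplots off it, with each pair appearing twice with its axes swapped.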

4. Pairs plot by color

It is often useful to color plots by a specific variable. For the iris dataset, we can color the scatterplots by species. In the pairs() call, we provide the extra argument col = iris_raw$Species to color the points by species.
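The colored version only adds the col argument described above. This sketch again assumes iris_raw is a copy of the built-in iris data, and pch = 19 (filled circles) is an illustrative extra so the colors are easier to see.

```r
# Assumption: iris_raw is a copy of the built-in iris data frame
iris_raw <- iris

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
# Species is a factor; its integer codes (1, 2, 3) select colors
# 1 to 3 from the current palette, one color per species
pairs(iris_raw[1:4], col = iris_raw$Species, pch = 19)
invisible(dev.off())
```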

5. Lattice

Now we will draw a similar plot for all four variables with the splom() function from lattice. Using the formula interface, the first argument specifies the variables you want to plot, the second argument, col, colors the points according to species, and the third argument, pch, changes the plotting symbol. There is an entire DataCamp course on data visualization in R with lattice, which will help you learn more about the full potential of the lattice package.
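A sketch of the lattice version, again assuming iris_raw is a copy of the built-in iris data. Instead of passing col as described above, it uses lattice's groups argument together with auto.key. This is a swapped-in choice that colors the points by species and also draws a matching legend. The filled-circle symbol (pch = 16) is set through simpleTheme() so that the panels and the legend use the same symbol.

```r
library(lattice)

# Assumption: iris_raw is a copy of the built-in iris data frame
iris_raw <- iris

# One-sided formula: the four measurement columns form the matrix;
# groups is evaluated inside data, so Species determines the colors
p <- splom(~ iris_raw[1:4], data = iris_raw,
           groups = Species,
           par.settings = simpleTheme(pch = 16),
           auto.key = list(columns = 3))

# lattice functions return a "trellis" object, which is drawn when printed
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
print(p)
invisible(dev.off())
```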

6. Using ggplot

With ggplot2, a single line of code can automatically subset multivariate data, calculate statistics on the subsets, and produce plots specific to each subset, colored appropriately. We will first discuss the ggpairs() function from the GGally package, an extension of ggplot2. It is similar to the pairs() function, but the upper triangle, the lower triangle, and the diagonal provide different pieces of information: scatterplots appear in the lower triangle, the smoothed densities of each variable by group are shown on the diagonal, and the overall and species-specific correlations are displayed in the upper triangle. The simplest use of ggpairs() is to specify the dataset, data = iris_raw, and the columns to plot; for example, columns = 1:4 plots all four numeric variables. Additionally, we can color by species with the argument mapping = aes(color = Species). A similar plot using the generic plot() function would require several lines of code.
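The ggpairs() call described above can be sketched as follows, assuming the GGally and ggplot2 packages are installed and that iris_raw is a copy of the built-in iris data. The explicit print() and temporary PDF are only there so the example runs non-interactively.

```r
library(ggplot2)  # provides aes()
library(GGally)   # provides ggpairs(); it is not part of ggplot2 itself

# Assumption: iris_raw is a copy of the built-in iris data frame
iris_raw <- iris

# Columns 1:4 are the numeric measurements; the mapping colors every
# panel (scatterplots, densities, correlations) by Species
p <- ggpairs(data = iris_raw, columns = 1:4,
             mapping = aes(color = Species))

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 9, height = 9)
print(p)
invisible(dev.off())
```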

7. 3D plots

Now let us move to 3D plots. There are many 3D plotting functions, including interactive ones that let the user rotate the plot to adjust the viewing angle. In this lesson, we will discuss the scatterplot3d() function from the scatterplot3d package. In this example, we plot Sepal Length, Petal Length, and Petal Width, colored by species. The first argument of scatterplot3d() is the columns of the data containing the variables of interest, and the second argument specifies how to color the points, using color = as.numeric(iris_raw$Species).
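A sketch of this 3D scatterplot, assuming the scatterplot3d package is installed and iris_raw is a copy of the built-in iris data. The legend() call is our own addition to show which color belongs to which species; pch = 1 matches scatterplot3d's default symbol, which is taken from par("pch").

```r
library(scatterplot3d)

# Assumption: iris_raw is a copy of the built-in iris data frame
iris_raw <- iris
vars <- c("Sepal.Length", "Petal.Length", "Petal.Width")

out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
# First argument: a data frame whose three columns give x, y and z;
# color: species codes 1, 2, 3 used as palette indices
s3d <- scatterplot3d(iris_raw[, vars],
                     color = as.numeric(iris_raw$Species))
legend("topleft", legend = levels(iris_raw$Species),
       col = 1:3, pch = 1, bty = "n")
invisible(dev.off())
```

scatterplot3d() also invisibly returns helper functions, such as xyz.convert, that can be used to add points or lines to the same 3D frame.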

8. 3D plots

You can also specify other arguments to scatterplot3d(). For example, here we change the plotting symbol to x's with pch = 4, and change the viewing angle from the default value of 40 to 80 with angle = 80.
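To see what the angle argument changes, this sketch draws the same data twice, side by side: once with the default angle of 40 and once with angle = 80, both using pch = 4. It assumes the scatterplot3d package is installed and iris_raw is a copy of the built-in iris data. The two-panel layout and the titles are illustrative choices.

```r
library(scatterplot3d)

# Assumption: iris_raw is a copy of the built-in iris data frame
iris_raw <- iris
vars <- c("Sepal.Length", "Petal.Length", "Petal.Width")
cols <- as.numeric(iris_raw$Species)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 10, height = 5)
op <- par(mfrow = c(1, 2))  # two panels side by side

# Left: default viewing angle (40) with x-shaped symbols (pch = 4)
scatterplot3d(iris_raw[, vars], color = cols, pch = 4,
              main = "angle = 40 (default)")
# Right: the same data viewed with angle = 80
scatterplot3d(iris_raw[, vars], color = cols, pch = 4, angle = 80,
              main = "angle = 80")

par(op)
invisible(dev.off())
```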

9. Let's practice some plotting with the wine data!

Now it's your turn to make use of some of these plotting techniques and explore the wine data.