1. Other visualization tools
In this video, we will introduce some standard visualization tools using a dataset called amazon_stocks, which contains Amazon stock returns from Jan. 2015 to Jan. 2017.
2. Histograms
In the introduction of this chapter, we mentioned the distribution of a time series. This is usually represented by a histogram. In layman's terms, a histogram is a diagram made of rectangles whose areas are proportional to the frequency of a variable and whose widths are equal to the class interval. Let's go back to our Amazon example and look at the distribution of returns. In the piece of code presented, breaks equals 20 sets the approximate number of bins, which in turn determines the class interval, and xlab equals "" removes the default label on the X axis. You will also notice the \n in the main argument, which is used to display the title of the chart over 2 lines.
There are a lot of returns around 0 in this chart, but you will also notice some very positive and very negative returns. These are days of extreme performance. But histograms are not the only tool to analyze a distribution.
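As a rough sketch, a histogram call using the arguments described above could look like the following. The amazon_stocks data is not reproduced here, so the `returns` vector is simulated and purely hypothetical, and the chart title is an illustrative assumption rather than the one used in the video.

```r
# Hypothetical stand-in for the Amazon returns: simulated values, NOT real data.
set.seed(42)
returns <- rnorm(500, mean = 0, sd = 0.02)

# hist() draws the chart and invisibly returns the bin details.
h <- hist(returns,
          breaks = 20,  # suggested number of bins; R may adjust it to get round break points
          xlab = "",    # suppress the default x-axis label
          main = "Simulated daily returns\n(stand-in for amazon_stocks)")  # \n splits the title over 2 lines

# Every observation lands in exactly one bin.
sum(h$counts) == length(returns)
```

Because breaks is treated as a suggestion when a single number is given, the number of bins actually drawn can differ slightly from 20.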
3. Box and whisker
Another popular method is the Box & Whisker plot. This type of plot is used to show the shape of the distribution, its central value, and its variability. In a Box & Whisker plot: the ends of the box are the
4. Box and whisker
upper and lower quartiles, so the box spans the interquartile range. Usually the median is marked by a vertical line
5. Box and whisker
inside the box, and outliers are plotted as individual points. Here, horizontal equals TRUE displays the plot horizontally. Once you know more about the distribution's main characteristics,
6. Autocorrelation
you can turn to another important aspect: autocorrelation. Autocorrelation is the correlation between the elements of a series and other elements of the same series separated by a given interval. The chart representing autocorrelations is called an autocorrelogram. It is usually drawn as a bar chart, with each bar representing a given interval, or lag, and the blue horizontal lines marking the positive and negative significance thresholds. Autocorrelation is important because it tells you how observations in your time series relate to earlier ones. A positive bar at lag 8, as shown on the chart, tells you that when an observation was above the series average 8 periods ago, the current observation also tends to be above average.
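To make the idea of a lag concrete, here is a minimal sketch using base R's acf function. The series `x` is simulated (an assumption made for illustration, not the Amazon data), and the lag-8 value is recomputed by hand to show what each bar measures.

```r
# Hypothetical series with built-in positive autocorrelation (simulated, NOT real data).
set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 500))

# acf() draws the autocorrelogram and returns the autocorrelations.
a <- acf(x, lag.max = 20, main = "Autocorrelogram of a simulated series")

# Manual lag-8 autocorrelation: pair each observation with the one 8 periods later.
n  <- length(x)
xc <- x - mean(x)
r8 <- sum(xc[1:(n - 8)] * xc[9:n]) / sum(xc^2)

# The first element of a$acf is lag 0, so lag 8 is the 9th element.
all.equal(r8, a$acf[9])

# Approximate 95% significance threshold drawn by the default plot.
threshold <- qnorm(0.975) / sqrt(n)
```

Bars that stay inside plus or minus `threshold` are not statistically distinguishable from zero at roughly the 5% level.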
7. QQ-plot
A final element to assess is whether your series follows a normal distribution. This is important because many standard statistical analyses assume normality. If your series does not follow a normal law, then tests that rely on this assumption may not be applicable, or at best their results may be biased.
The q-q plot, or quantile-quantile plot, is a quick and efficient way to check for normality. A q-q plot is a plot of the quantiles of a first data set against the quantiles of a second data set; to check for normality, the second set is the theoretical quantiles of a normal distribution. A quantile is the value below which a given fraction of the points fall. For example, the 30% quantile is the value at which 30% of the data fall below and 70% fall above. In R, the function qqnorm is used, and in this example we also added the function qqline. qqline draws a reference line, by default through the first and third quartiles, along which the points would fall if the Amazon returns were perfectly normally distributed.
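A minimal sketch of the two calls, again on simulated returns rather than the actual amazon_stocks data (the `returns` vector, the title, and the line color are assumptions made for illustration):

```r
# Hypothetical stand-in for the Amazon returns: simulated values, NOT real data.
set.seed(42)
returns <- rnorm(500, mean = 0, sd = 0.02)

# qqnorm() plots sample quantiles against theoretical normal quantiles.
qq <- qqnorm(returns, main = "Normal Q-Q plot of simulated returns")

# qqline() adds the reference line through the first and third quartiles.
qqline(returns, col = "red")

# The 30% quantile: about 30% of the observations fall below this value.
q30 <- quantile(returns, probs = 0.30)
mean(returns <= q30)
```

Since these simulated returns are drawn from a normal distribution, the points should stay close to the line; real returns often drift away from it at both ends, reflecting the extreme days seen in the histogram.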
8. Let's practice!
Now it's your turn to explore those tools.