1. Temporal data classes in R
Doing great so far! In this lesson, we'll go over a key aspect of time series analysis in R: the data classes designed to represent time-based, or "temporal," data!
2. Date-time classes
There are many ways to represent dates and times in R. Here, we'll focus on a property that every R object has: its "class". The class of an object tells R which methods to use when manipulating the object. While we won't dive too deep into classes themselves, it's important to know which classes are used to represent dates and times in R.
First, there's numeric, which stores numbers as "doubles", that is, double-precision real numbers (whole numbers can also be stored in the separate integer class). A date stored as a numeric is the number of days since January 1st, 1970, the start of the Unix Epoch.
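As a quick check of the epoch convention, base R's `as.Date()` can convert between a calendar date and its day count. The dates below are arbitrary examples; passing `origin` explicitly keeps the call working on older R versions that required it.

```r
# A calendar date, converted to its underlying day count since 1970-01-01
day_count <- as.numeric(as.Date("2024-01-01"))
print(day_count)  # 19723

# Going the other way: interpret a plain number as days since the Unix Epoch
back_to_date <- as.Date(day_count, origin = "1970-01-01")
print(back_to_date)  # "2024-01-01"

# Day 0 is the epoch itself
epoch_day <- as.Date(0, origin = "1970-01-01")
print(epoch_day)  # "1970-01-01"
```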
Next, there's character, which represents strings of text. When we import data from outside sources, dates often arrive as character strings. Character dates commonly use the year-month-day layout separated by hyphens, such as "2024-01-15", but they don't always follow this convention!
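Dates stored as text only become true dates once they are parsed. The sketch below uses base R's `as.Date()`, which reads the year-month-day layout by default; for any other layout you supply a `format` string. The US-style string is a made-up example of a nonstandard format.

```r
iso_text <- "2024-01-15"
us_text  <- "01/15/2024"

print(class(iso_text))  # "character"

# The standard year-month-day layout parses without extra arguments
iso_date <- as.Date(iso_text)

# A nonstandard layout needs an explicit format:
# %m = month, %d = day, %Y = four-digit year
us_date <- as.Date(us_text, format = "%m/%d/%Y")

# Both strings describe the same day
print(iso_date == us_date)  # TRUE
```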
3. Date-time classes
Then there's Date (note the capital D), which represents a calendar date. When we print a Date object in R, it looks identical to a character object, so it's important to be able to check an object's class.
The Date class is useful, as it allows us to do math with dates; subtracting two Date objects, for example, returns the difference between them in days.
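To see why checking the class matters, here is a small base R comparison of a character string and a Date holding the same day: they print the same, but only the Date supports date arithmetic. The dates are arbitrary examples; 2024 is a leap year, so February has 29 days.

```r
text_day <- "2024-02-01"
date_day <- as.Date("2024-02-01")

print(text_day)  # [1] "2024-02-01"
print(date_day)  # [1] "2024-02-01"  (looks identical)
print(class(text_day))  # "character"
print(class(date_day))  # "Date"

# Subtracting two Dates returns a difftime measured in days.
# (Subtracting two character strings would raise an error instead.)
gap <- as.Date("2024-03-01") - date_day
print(gap)              # Time difference of 29 days
print(as.numeric(gap))  # 29
print(units(gap))       # "days"
```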
Finally, there's POSIXct, which stores a date-time as the number of seconds since the start of the Unix Epoch: midnight UTC on January 1st, 1970. A POSIXct object includes both the date and the time, along with a time zone, allowing for accurate calculations when working with times.
Like Date objects, POSIXct objects support arithmetic. We'll talk more about the format of POSIXct itself in the next lesson, too.
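A minimal POSIXct sketch in base R, assuming the UTC time zone so the numbers are easy to verify: one minute after the epoch is stored as 60 seconds, and adding a number to a POSIXct value shifts it by that many seconds.

```r
# One minute past the Unix Epoch, pinned to UTC
t0 <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
print(as.numeric(t0))  # 60 (seconds since 1970-01-01 00:00:00 UTC)

# The time zone is kept as an attribute of the object
print(attr(t0, "tzone"))  # "UTC"

# Arithmetic works in seconds: add one hour
t1 <- t0 + 3600
print(t1)  # "1970-01-01 01:01:00 UTC"

# The gap between two POSIXct values is a difftime in the units we ask for
print(difftime(t1, t0, units = "mins"))  # Time difference of 60 mins
```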
4. Lubridate
The as_date() function from the lubridate package converts a character or numeric date to the Date class.
It works similarly to base R's as.Date(), but with a few improvements: as_date() handles time zones more reliably, and it gives a warning message if it can't parse a date. There are a few more technical differences than that; just know that throughout this course, we'll use the as_date() function from lubridate.
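A short sketch of `as_date()`, assuming the lubridate package is installed. It parses a year-month-day string, converts a day count, and returns `NA` with a warning for a string it cannot parse. The input values are illustrative.

```r
library(lubridate)

# Character input in year-month-day layout
d1 <- as_date("2024-01-15")
print(class(d1))  # "Date"

# Numeric input is read as days since 1970-01-01
d2 <- as_date(19723)
print(d2)  # "2024-01-01"

# Unparseable text returns NA and emits a warning instead of a valid-looking date
bad <- as_date("not a date")
print(is.na(bad))  # TRUE
```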
While this course isn't designed to cover everything about the lubridate package, we'll make frequent use of many of its functions, like as_date.
5. Testing data classes
The best class to use depends on the data we have and the analysis we're doing, but Date (for day-level data) and POSIXct (when the time of day matters) are usually safe bets.
Now, let's look at some R functions that we can use to determine the data class of objects in R.
First is base R's class() function, which returns the class of an R object as a character vector, such as "numeric", "character", or "Date". Some objects have more than one class: a POSIXct object, for example, returns both "POSIXct" and "POSIXt".
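Applying `class()` to one example of each class covered here shows what it returns. The values are arbitrary; note the two-element result for POSIXct, and that an integer literal such as `19723L` reports `"integer"` rather than `"numeric"`.

```r
num_class  <- class(19723)                                         # "numeric"
chr_class  <- class("2024-01-01")                                  # "character"
date_class <- class(as.Date("2024-01-01"))                         # "Date"
ct_class   <- class(as.POSIXct("2024-01-01 12:00:00", tz = "UTC")) # "POSIXct" "POSIXt"
int_class  <- class(19723L)                                        # "integer"

print(ct_class)
```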
We also have the so-called "is-dot" functions: is.numeric() and is.character() come with base R, while is.Date() and is.POSIXct() are provided by lubridate. Many other classes have a similar is-dot function, but let's stick with the ones for the four classes we've been talking about.
These functions take an R object and return a logical TRUE or FALSE depending on whether the object belongs to that class.
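The sketch below runs these checks on example objects, assuming lubridate is installed for `is.Date()` and `is.POSIXct()`. Base R's `inherits()` gives an equivalent class check without the package.

```r
library(lubridate)

day   <- as.Date("2024-01-01")
stamp <- as.POSIXct("2024-01-01 12:00:00", tz = "UTC")

print(is.character("2024-01-01"))  # TRUE
print(is.numeric(19723))           # TRUE
print(is.numeric(day))             # FALSE, even though a Date is stored as a number
print(is.Date(day))                # TRUE
print(is.POSIXct(stamp))           # TRUE
print(is.Date(stamp))              # FALSE

# Base R alternative that needs no extra package
print(inherits(day, "Date"))       # TRUE
print(inherits(stamp, "POSIXct"))  # TRUE
```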
6. Let's practice!
Alright, let's head over to the exercises and try our hand at determining data classes in our time series!