Formatting dates in R

1. Formatting dates in R

An important data cleaning step in time series analysis is ensuring that all our dates are represented in consistent, clear formats. In the last video, we discussed the data classes in R for working with temporal data; now, we'll build on this by standardizing the formats of our dates, so converting between data classes is smoother.

2. Order of time elements

In real life, there are countless ways of representing dates and times. Different countries and regions have preferred orders for the day, month, and year; for example, in the United States the convention is to write the month first, then the day, and then the year. In many other countries, such as the United Kingdom, the typical order is day, month, then year. This can cause serious problems when the date format is ambiguous: does "6/4/2010" refer to June fourth, or to April sixth? People would give different answers based on where they live and their personal preference. We'll use the term "time elements" to refer to the distinct parts of a date format, such as the day, month, and year, and the hours, minutes, and seconds. When we say "the order of the time elements", we're referring to the order in which the day, month, year, etc. appear in the format.
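To make the ambiguity concrete, here is a minimal sketch in base R that reads the same string under both conventions. The variable names are illustrative only and are not part of the course material.

```r
# The same string, read under two different conventions.
ambiguous <- "6/4/2010"

# Month first (United States convention).
us_reading <- as.Date(ambiguous, format = "%m/%d/%Y")

# Day first (United Kingdom convention).
uk_reading <- as.Date(ambiguous, format = "%d/%m/%Y")

# Print both in year-month-day order to compare them.
format(us_reading, "%Y-%m-%d")  # "2010-06-04"
format(uk_reading, "%Y-%m-%d")  # "2010-04-06"
```

Both calls succeed without any warning, which is exactly why an ambiguous format is risky: R cannot tell which reading was intended.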

3. ISO 8601

The solution to our problem is to use an international, standardized format. The most commonly used and accepted standard is called ISO 8601. In ISO 8601, time elements such as the year, month, and day are arranged in order from largest to smallest: the year comes first, then the month, then the day. For example, June fourth, 2010 is written 2010-06-04. This solves our ambiguity problem, where a date has two equally valid interpretations. Time elements, as well as the date and time components, are separated by defined characters, namely a hyphen between the elements in the date and a colon between the elements in the time. ISO 8601 ensures that everyone uses the same separators, and also makes dates and times more legible compared to having all the numbers glued together! There's a lot more to ISO 8601 than this, but we won't be covering that in this course.
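As a small illustration of the ordering and separators described above, this base R sketch writes one moment in time as an ISO 8601 date and as an ISO 8601 date-time. The example timestamp and the variable names are made up for illustration; the "T" between the date and the time is the separator the standard defines for that position.

```r
# An example moment in time, pinned to UTC so the output is reproducible.
moment <- as.POSIXct("2010-06-04 14:30:00", tz = "UTC")

# Date only: year, then month, then day, separated by hyphens.
iso_date <- format(moment, "%Y-%m-%d")

# Date and time: colons between the time elements, "T" between date and time.
iso_datetime <- format(moment, "%Y-%m-%dT%H:%M:%S")

iso_date      # "2010-06-04"
iso_datetime  # "2010-06-04T14:30:00"
```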

4. Formatting dates and times

We know of a standard format, ISO 8601, but how can we work with date formats in R? We're in luck! The parse_date_time function from lubridate takes an input vector of dates, typically a character vector, and returns an output of class POSIXct, a class designed for dates and times. By default, POSIXct objects are printed in the ISO 8601 order of year, month, then day. To use this function, we set the orders argument to a string, which uses characters we'll refer to as "conversion specifications" to specify the time elements. We can also use characters like commas and spaces between elements in the input format. There are many conversion specifications, and they can take some getting used to; their documentation can be viewed with ?strptime. It's useful to learn the most common ones, though! For example, "%Y" represents the four-digit year, "%m" is the numerical month, like 08 for August, and "%d" is the day of the month.
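Here is a minimal sketch of a single-format parse, assuming the lubridate package is installed. The input string is a made-up example in day, month, year order; the orders string "dmy" names the time elements in the order they appear, and lubridate also accepts the same specifications written with the "%" prefix.

```r
library(lubridate)

# A made-up input in day/month/year order.
raw_date <- "04/06/2010"

# Tell parse_date_time the order of the time elements in the input.
parsed <- parse_date_time(raw_date, orders = "dmy")

class(parsed)                # "POSIXct" "POSIXt"
format(parsed, "%Y-%m-%d")   # "2010-06-04"
```

Because the order is stated explicitly, the fourth of June is no longer confused with the sixth of April.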

5. Parsing multiple date formats

We can use parse_date_time to parse a vector whose dates all share a single format, but what happens if we have multiple date formats within the same vector? Great news: the parse_date_time function is up to the task! To use it, we pass the input object, then set the orders argument to a vector of the different input date formats. As long as we include every date format seen in our input object, parse_date_time will return everything in ISO 8601!
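The idea can be sketched as follows, again assuming lubridate is installed. The input vector is invented for illustration and deliberately mixes a year-first format with two day-first formats, so each string fits only one of the supplied orders.

```r
library(lubridate)

# Made-up inputs: the same calendar day written in different formats.
mixed_dates <- c("2010-06-04", "04/06/2010", "04-06-2010")

# List every order that occurs in the input; "Y" is the four-digit year.
parsed_mixed <- parse_date_time(mixed_dates, orders = c("dmY", "ymd"))

# Each element should come back as the same calendar day.
format(parsed_mixed, "%Y-%m-%d")
```

If an input matches none of the listed orders, parse_date_time returns NA for it with a warning, so it is worth checking the result for missing values.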

6. Let's practice

Great job following along! Head over to the exercises and practice manipulating date formats!
