1. Parsing dates with lubridate
Welcome to Chapter 2. In this chapter you'll learn about using the lubridate package to parse and manipulate dates.
To get started you'll learn about importing dates. We'll talk about two options in lubridate: a whole set of functions whose names correspond to different formats, and the more general purpose parse_date_time, a function where the format is specified as an argument.
2. ymd()
Let's consider a familiar date from the last chapter: the 27th of February 2013. In lubridate the function ymd will parse a date when it is formatted year first, then month, then day. So, if we happen to have the date formatted according to ISO 8601, ymd will parse it successfully.
What's great about ymd is it handles dates that are in the right order, but may or may not be exactly ISO 8601.
For example, it ignores any separators, so a date that uses dots instead of dashes parses successfully. And the units don't even need to be numeric: if the month is given as its English abbreviation, that works too!
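As a minimal sketch of the cases just described, the three calls below should all give the same date. The input strings are illustrative, and the month-name case assumes an English locale.

```r
library(lubridate)

# ISO 8601: year, month, day separated by dashes
iso <- ymd("2013-02-27")

# Same order, but dots as separators
dots <- ymd("2013.02.27")

# Month given as an English abbreviation (assumes an English locale)
abbr <- ymd("2013 Feb 27")

iso
dots
abbr
```

Each result is a Date object rather than a character string.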
3. Friends of ymd()
What's neat about lubridate is that there is a whole family of functions like ymd where the function name specifies the expected format of the date.
There's dmy for day, month, year, mdy for dates in the common US form: month, day, year. And so on.
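A short sketch of two members of the family, using the same date as before written in different orders. The input strings are made up for illustration.

```r
library(lubridate)

# Day, month, year
d1 <- dmy("27/02/2013")

# Month, day, year (the common US form)
d2 <- mdy("02/27/2013")

d1
d2
```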
There are also functions like this for datetimes. The date part of the function name is followed by an underscore, and then hms, hm or h.
So, dmy_hm will read a date with an accompanying time. Unlike some of R's built-in functions, if you don't specify a timezone, lubridate will assume UTC.
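The sketch below shows a datetime parsed with and without an explicit timezone. The input strings and the "America/Los_Angeles" zone are illustrative choices, not taken from the slides.

```r
library(lubridate)

# Date plus hours and minutes; no tz is given, so lubridate uses UTC
x <- dmy_hm("27-02-2013 12:12")
x
tz(x)

# A timezone can be supplied explicitly through the tz argument
y <- dmy_hm("27-02-2013 12:12", tz = "America/Los_Angeles")
y

# Hours, minutes and seconds
z <- ymd_hms("2013-02-27 12:12:30")
z
```

The results are POSIXct datetimes, not Dates, which is why a timezone is attached.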
4. parse_date_time(x = ___, orders = ___)
While the ymd family of functions handles most dates, you may still run into cases where you need to be more specific. The function parse_date_time in lubridate also parses dates, but you specify the order in a separate argument, orders.
An order is a string that describes the order of the components in a date, for example here it's the string dmy.
parse_date_time has a handy feature where if the dates you need to parse are in more than one format, you can pass in more than one order. Just pass in a vector of strings to orders.
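A minimal sketch of both uses: a single order, then a vector of orders for input in mixed formats. The dates vector is a made-up example.

```r
library(lubridate)

# A single order: day, month, year
single <- parse_date_time("27-02-2013", orders = "dmy")
single

# Two formats in one vector, so two candidate orders are supplied
dates <- c("27-02-2013", "2013-02-27")
mixed <- parse_date_time(dates, orders = c("dmy", "ymd"))
mixed
```

Note that parse_date_time returns a datetime (POSIXct) even when the input has no time component.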
5. Formatting characters
You can find all the possible formatting characters on the help page for parse_date_time. Some will seem familiar: lower case y, m and d correspond to year, month and day. An upper case Y is a year with a century, that is, a four digit year. Upper case H, M and S are hours, minutes and seconds.
There are a few others that are useful: a matches a weekday name, either abbreviated or in full, and similarly b matches a month name. Upper case I parses the hour in 12 hour times and is often used with lower case p, an AM/PM indicator. z parses timezones provided as offsets in hours and minutes from UTC.
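As a rough sketch of how these characters combine, the calls below parse a 12 hour time with an AM/PM indicator and a datetime with a UTC offset. The input strings are invented for illustration, and the month name and AM/PM parsing assume an English locale.

```r
library(lubridate)

# d = day, b = month name, Y = four digit year,
# I = hour on a 12 hour clock, M = minutes, p = AM/PM indicator
afternoon <- parse_date_time("27 Feb 2013 3:30 PM", orders = "dbYIMp")
afternoon

# z = offset from UTC in hours and minutes; -0800 is eight hours behind UTC
with_offset <- parse_date_time("2013-02-27 12:12:00 -0800", orders = "Ymd HMS z")
with_offset
```

Separators such as the spaces inside an order string are only there for readability; lubridate discards them.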
6. Let's practice!
Now it's your turn: take what you've learned to parse some dates and times.