Weather in Auckland

1. Weather in Auckland

In practice, you'll often come across dates and times when data is recorded on a timed schedule. It may be a website logging traffic, a sales team reporting weekly results, or scientific instruments measuring the environment. To help you see how your lubridate skills integrate into a data analysis pipeline, you'll be exploring the weather in Auckland. Why Auckland? It's R's birthplace, although this weather is from the airport, which is about 20 km from the university where R was developed. The data comes from Weather Underground, a weather website, but their data for Auckland comes from parsing METAR data, automated weather reports for pilots. There are two data sets you'll explore: daily observations and sub-hourly observations.

2. akl_weather_daily.csv

Let's take a look at part of the data file for the daily data. It's a CSV file, so the variables are separated by commas. For each day over 10 years we have the max, min, and mean temperature, the mean relative humidity, a record of any events, and the cloud cover. Can you see the date info? You know how to handle that, and you'll parse it shortly.

3. akl_weather_hourly_2016.csv

The sub-hourly data is a little different. Can you see where the date is recorded? It is split over the year, month, and mday variables. To convert these to a date, you need to know about one more lubridate function: make_date.

4. make_date(year, month, day)

make_date has the arguments year, month, and day, which let you specify a date from its individual components. For example, make_date(year = 2013, month = 2, day = 27) produces the date object corresponding to Feb 27, 2013. The components can be vectors, which makes it convenient for constructing dates from individual columns, as you'll do to import the hourly data. There's also a make_datetime function that adds hour, min, and sec arguments to build a datetime from its components.
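As a minimal sketch of the above, the following builds a single date and then a vector of dates and datetimes from component columns. The data frame and its values are made up for illustration; the year, month, and mday names follow the hourly file, while the hour and min columns are assumptions.

```r
library(lubridate)

# A single date from its components
make_date(year = 2013, month = 2, day = 27)

# Hypothetical data: the values, and the hour and min columns, are assumptions
hourly <- data.frame(
  year  = c(2016, 2016, 2016),
  month = c(1, 1, 2),
  mday  = c(1, 2, 15),
  hour  = c(0, 13, 23),
  min   = c(30, 0, 30)
)

# The components can be vectors, so whole columns work directly
hourly$date <- make_date(
  year = hourly$year, month = hourly$month, day = hourly$mday
)

# make_datetime adds hour, min and sec; the time zone defaults to UTC
hourly$datetime <- make_datetime(
  year = hourly$year, month = hourly$month, day = hourly$mday,
  hour = hourly$hour, min = hourly$min
)
```

The same calls work inside mutate, which is how they would usually appear in a dplyr pipeline.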

5. dplyr Review

The tasks you'll tackle with the Auckland weather are easily completed by combining your new lubridate skills with dplyr. As a quick review, the useful dplyr verbs are: mutate, which adds columns; filter, to subset rows; select, to subset columns; and arrange, to order rows. summarise summarizes rows down to one row, and is most useful in conjunction with group_by, which allows you to calculate grouped summaries. All the dplyr verbs take a data frame as the first argument and return a data frame, which makes them very amenable to piping.

6. Pipe %>%

Remember, the pipe operator takes the result of the left-hand side and passes it as the first argument to the right-hand side. So, instead of writing manipulations in a nested manner, we can write them in a more readable linear fashion.
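To illustrate the difference, here is a small sketch that computes the same grouped summary twice, once nested and once piped. The daily data frame, its max_temp column, and its values are made up for the example and are not the actual Auckland data.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily data: column names and values are assumptions
daily <- tibble(
  date     = as.Date(c("2016-01-01", "2016-01-02", "2016-02-01", "2016-02-02")),
  max_temp = c(24, 26, 25, 27)
)

# Nested: has to be read from the inside out
nested <- summarise(
  group_by(mutate(daily, month = month(date)), month),
  avg_max = mean(max_temp)
)

# Piped: each step passes its result as the first argument of the next
piped <- daily %>%
  mutate(month = month(date)) %>%
  group_by(month) %>%
  summarise(avg_max = mean(max_temp))
```

Both versions return the same one-row-per-month data frame; the piped form simply reads in the order the steps happen.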

7. Let's practice!

OK, you're set to get that weather data into R.