Exercise

Date Math

In this exercise, we'll verify the frequency of our data. The Mortgage dataset is supposed to contain weekly data, but let's confirm that by lagging the report date and then taking the difference between each date and the one before it.
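The idea can be sketched in plain Python before bringing Spark into it. The dates below are made up for illustration and are not taken from the Mortgage dataset; the names report_dates, gaps, and is_weekly are hypothetical.

```python
from datetime import date

# Made-up report dates (illustrative only); the third one arrives a day late.
report_dates = [
    date(2020, 1, 2),
    date(2020, 1, 9),
    date(2020, 1, 17),
    date(2020, 1, 23),
]

# Pair each date with the one before it (its "lag") and subtract.
gaps = [(cur - prev).days for prev, cur in zip(report_dates, report_dates[1:])]

# Truly weekly data would have a gap of exactly 7 days everywhere.
is_weekly = all(gap == 7 for gap in gaps)

print(gaps, is_weekly)
```

Any gap other than 7 flags a report that came early, came late, or was skipped.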

Recall that to create a lagged feature we first need to define a window with Window. A window lets you return a value for each record based on a calculation against a group of records, in this case the previous period's report date.

Instructions

  • Cast mort_df['DATE'] to date type with to_date()
  • Create a window with Window, using orderBy() to sort by mort_df['DATE']
  • Create a new column DATE-1 using withColumn() by lagging the DATE column with lag() and applying it over the window with over(w)
  • Calculate the difference between DATE and DATE-1 using datediff() and name it Days_Between_Report