Feature Engineering with PySpark


Exercise

Joining On Time Components

Date components are often used as join keys to bring in other sets of information. In this example, however, we must restrict ourselves to data that would have been available to someone considering buying a house at the time of listing. That means joining each listing to the previous year's reporting data rather than to the data for the year in which it was listed.

Instructions

  • Extract the year from LISTDATE using year() and store it in a new column called list_year using withColumn().
  • Create another new column called report_year by subtracting 1 from list_year.
  • Create a join condition that matches df['CITY'] with price_df['City'] and df['report_year'] with price_df['Year'].
  • Perform a left join between df and price_df using that condition.