Time Components
Working with time components is important for building features, but you can also use them to explore and understand your data. In this exercise, you'll check whether there is a pattern in which day of the week a house is listed. Keep in mind that PySpark's dayofweek() numbers the week starting on Sunday, with a value of 1, and ending on Saturday, with a value of 7.
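The numbering convention is easy to mix up with Python's own, where datetime.date.weekday() starts at Monday = 0 and isoweekday() at Monday = 1. As a minimal sketch in plain Python, the helper below (spark_style_dayofweek is a hypothetical name, not part of PySpark) maps a date to the Sunday = 1 through Saturday = 7 scheme described above. The two sample dates are illustrative only.

```python
from datetime import date


def spark_style_dayofweek(d: date) -> int:
    """Return 1 for Sunday through 7 for Saturday (hypothetical helper)."""
    # isoweekday() gives Monday=1 ... Sunday=7; the modulo wraps Sunday to 0,
    # and adding 1 shifts the range to Sunday=1 ... Saturday=7.
    return d.isoweekday() % 7 + 1


sunday = date(2017, 7, 16)    # illustrative date, a Sunday
saturday = date(2017, 7, 22)  # illustrative date, a Saturday

print(spark_style_dayofweek(sunday))    # 1
print(spark_style_dayofweek(saturday))  # 7
```

This can be handy when checking a plot's x-axis labels after the data has been converted to pandas.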
This exercise is part of the course
Feature Engineering with PySpark
Exercise instructions
- Import the to_date() and dayofweek() functions from pyspark.sql.functions.
- Use the to_date() function to convert LISTDATE to a Spark date type, and save the converted column in place using withColumn().
- Create a new column from LISTDATE using dayofweek(), then save it as List_Day_of_Week using withColumn().
- Sample half the dataframe, convert it to a pandas dataframe with toPandas(), and plot the count of the pandas dataframe's List_Day_of_Week column using seaborn countplot() with x = List_Day_of_Week.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Import needed functions
from ____ import ____, ____
# Convert to date type
df = df.____(____, ____(____))
# Get the day of the week
df = df.____(____, ____(____))
# Sample and convert to pandas dataframe
sample_df = df.sample(False, ____, 42).____()
# Plot a count plot of the day of the week
sns.____(x="List_Day_of_Week", data=____)
plt.show()