Exercise

One Hot Encoding

In the United States where you live determines which schools your kids can attend. Therefore it's understandable that many people care deeply about which school districts their future home will be in. While the school districts are numbered in SCHOOLDISTRICTNUMBER they are really categorical. Meaning that summing or averaging these values has no apparent meaning. Therefore in this example we will convert SCHOOLDISTRICTNUMBER from a categorial variable into a numeric vector to use in our machine learning model later.

Instructions

100 XP
  • Instantiate a StringIndexer transformer called string_indexer with SCHOOLDISTRICTNUMBER as the input and School_Index as the output.
  • Apply the transformer string_indexer to df with fit() and transform(). Store the transformed dataframe in indexed_df.
  • Create a OneHotEncoder transformer called encoder using School_Index as the input and School_Vec as the output.
  • Apply the transformation to indexed_df using transform(). Inspect the iterative steps of the transformation with the supplied code.