Exercise

Bucketing

If you are a homeowner, it matters a great deal whether a house has 1, 2, 3, or 4 bedrooms. But, as with bathrooms, once you pass a certain point you don't really care whether the house has 7 or 8. In this example, we'll look at how to find good cutoff values for bucketing.

Instructions

  • Plot the distribution of the pandas DataFrame sample_df using Seaborn's distplot().
  • Since there appears to be a long tail of infrequent values beyond 5, create bucket splits for 1, 2, 3, 4, and 5+.
  • Create the transformer buck by instantiating Bucketizer() with splits set to those bucket boundaries, the input column set to BEDROOMS, and the output column set to bedrooms.
  • Apply the Bucketizer transformation to df using transform(), assign the result to df_bucket, and verify the results with show().
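To make the splits concrete, here is a minimal pure-Python sketch of the rule Spark's Bucketizer documents: with n+1 split points there are n buckets, bucket i holds values in [splits[i], splits[i+1]), and the last bucket also includes its upper bound. This is an illustration of that rule, not Spark's implementation. The split list [0, 1, 2, 3, 4, 5, inf] is an assumption about how the 1, 2, 3, 4, 5+ buckets would be expressed, and the name bucket_index is hypothetical.

```python
import bisect
import math

# Assumed boundaries for the 1, 2, 3, 4, 5+ buckets. The leading 0 gives
# zero-bedroom rows their own bucket; the infinite upper bound folds every
# value >= 5 into the final bucket.
SPLITS = [0, 1, 2, 3, 4, 5, math.inf]


def bucket_index(value, splits=SPLITS):
    """Return the bucket index for value: bucket i covers
    [splits[i], splits[i+1]), and the last bucket also includes splits[-1].
    NaN and out-of-range values raise, mirroring Bucketizer's default
    handleInvalid='error' behaviour."""
    if math.isnan(value):
        raise ValueError("NaN cannot be bucketed")
    if value < splits[0] or value > splits[-1]:
        raise ValueError(f"{value} is outside the splits range")
    if value == splits[-1]:
        # The upper bound belongs to the last bucket, not a new one.
        return len(splits) - 2
    # bisect_right finds the first split strictly greater than value,
    # so the bucket is the interval just before it.
    return bisect.bisect_right(splits, value) - 1


for bedrooms in [0, 1, 3, 5, 8]:
    print(bedrooms, "->", bucket_index(bedrooms))
```

In the exercise the same list would be passed to Bucketizer(splits=..., inputCol='BEDROOMS', outputCol='bedrooms'). Spark writes the bucket index as a double (0.0, 1.0, and so on) rather than an integer.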