
Exercise

Binning values

For many continuous variables, you care less about the exact value in a numeric column than about the bucket it falls into. Binning can be useful when plotting values or when simplifying your machine learning models. It is mostly used on continuous variables where precision is not the main concern, e.g. age, height, or wages.

Bins are created using pd.cut(df['column_name'], bins), where bins is either an integer specifying the number of equal-width bins or a list of bin boundaries. An optional labels argument gives each bin a name instead of its interval.
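The sketch below contrasts the two forms of the bins argument on a small, made-up Series of ages (the values and the "minor"/"adult"/"senior" labels are illustrative assumptions, not part of the exercise). With an integer, pandas splits the range from the minimum to the maximum into intervals of equal width. With a list, each pair of consecutive boundaries defines one interval, right-closed by default.

```python
import pandas as pd

# Hypothetical ages; any numeric Series works the same way
ages = pd.Series([3, 17, 25, 42, 68, 90])

# bins=3: three equal-width intervals spanning min(ages)..max(ages)
equal_width = pd.cut(ages, bins=3)

# bins as a list: explicit boundaries (0, 18], (18, 65], (65, 100]
# labels must have one entry per interval, i.e. len(bins) - 1 entries
custom = pd.cut(ages, bins=[0, 18, 65, 100], labels=["minor", "adult", "senior"])

print(equal_width.value_counts(sort=False))
print(custom.tolist())
```

Note that values falling outside the listed boundaries (here, anything at or below 0 or above 100) become NaN rather than raising an error.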

Instructions

  • 1

    Bin the values of the ConvertedSalary column in so_survey_df into 5 equal-width bins, in a new column called equal_binned.

  • 2

    Bin the ConvertedSalary column using the boundaries in the list bins, and label the bins using the list labels.
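Both steps can be sketched as follows. This is a minimal illustration, not the exercise's official solution: so_survey_df is replaced by a tiny hypothetical DataFrame, and the bins boundaries, labels names, and the boundary_binned column name are assumptions, since the exercise supplies its own.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for so_survey_df; the real survey data is not shown here
so_survey_df = pd.DataFrame(
    {"ConvertedSalary": [0, 8_000, 45_000, 70_000, 120_000, 250_000]}
)

# Step 1: five equal-width bins across the column's min..max range
so_survey_df["equal_binned"] = pd.cut(so_survey_df["ConvertedSalary"], 5)

# Step 2: hypothetical boundaries and labels (the exercise provides its own lists)
# -np.inf / np.inf make the outer bins open-ended so no salary becomes NaN
bins = [-np.inf, 10_000, 50_000, 100_000, 150_000, np.inf]
labels = ["Very low", "Low", "Medium", "High", "Very high"]

# One label per interval between consecutive boundaries: len(labels) == len(bins) - 1
so_survey_df["boundary_binned"] = pd.cut(
    so_survey_df["ConvertedSalary"], bins=bins, labels=labels
)

print(so_survey_df[["ConvertedSalary", "equal_binned", "boundary_binned"]])
```

Equal-width bins are easy to set up but are sensitive to outliers: a single very large salary stretches the range and pushes most rows into the lowest bin. Explicit boundaries let you choose meaningful cut points instead.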