Train and testing transformations (II)

Just as you apply the same scaler to both your training and test sets, if you have removed outliers from the train set, you will probably want to do the same on the test set. Once again, make sure the thresholds used to remove outliers from the test set are calculated from the train set only, so that no information from the test set leaks into the preprocessing.

As in the last exercise, the so_numeric_df DataFrame has been split into train (so_train_numeric) and test (so_test_numeric) sets.

This exercise is part of the course

Feature Engineering for Machine Learning in Python

Exercise instructions

  • Calculate the standard deviation and mean of the ConvertedSalary column in the train set (so_train_numeric).
  • Calculate the lower and upper bounds as three standard deviations below and above the train mean.
  • Trim the so_test_numeric DataFrame to retain only the rows where ConvertedSalary lies between the lower and upper bounds.
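
Written out in symbols, the three steps amount to the following. The symbols are introduced here for illustration only: μ_train and σ_train stand for the mean and standard deviation of ConvertedSalary in the train set, and x is a ConvertedSalary value in the test set.

```latex
\begin{aligned}
\text{lower} &= \mu_{\text{train}} - 3\,\sigma_{\text{train}}, \\
\text{upper} &= \mu_{\text{train}} + 3\,\sigma_{\text{train}}, \\
\text{keep test row} &\iff \text{lower} < x < \text{upper}.
\end{aligned}
```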

Hands-on interactive exercise

Try this exercise by completing the sample code below.

train_std = so_train_numeric['ConvertedSalary'].____
train_mean = so_train_numeric['ConvertedSalary'].____

cut_off = train_std * 3
train_lower, train_upper = ____, train_mean + cut_off

# Trim the test DataFrame
trimmed_df = so_test_numeric[(so_test_numeric['ConvertedSalary'] < ____) \
                             & (so_test_numeric['ConvertedSalary'] > ____)]
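
For reference, here is a minimal, self-contained sketch of the same idea outside the exercise environment. It is one possible way to carry out the steps, not the official solution. The data is synthetic and the positional 70/30 split is an assumption made for illustration, since the real so_numeric_df and its split are not shown here. Note that pandas' std() computes the sample standard deviation (ddof=1) by default.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for so_numeric_df (values are made up for illustration)
rng = np.random.default_rng(0)
so_numeric_df = pd.DataFrame(
    {"ConvertedSalary": rng.normal(60_000, 15_000, size=1_000)}
)
# Plant one extreme value in what will become the test portion
so_numeric_df.loc[950, "ConvertedSalary"] = 1_000_000

# Assumed split: first 700 rows for training, the rest for testing
so_train_numeric = so_numeric_df.iloc[:700]
so_test_numeric = so_numeric_df.iloc[700:]

# Statistics come from the train set only
train_std = so_train_numeric["ConvertedSalary"].std()
train_mean = so_train_numeric["ConvertedSalary"].mean()

# Bounds at three standard deviations on either side of the train mean
cut_off = train_std * 3
train_lower, train_upper = train_mean - cut_off, train_mean + cut_off

# Trim the test DataFrame using the train-derived bounds
trimmed_df = so_test_numeric[
    (so_test_numeric["ConvertedSalary"] < train_upper)
    & (so_test_numeric["ConvertedSalary"] > train_lower)
]

print(f"Test rows before: {len(so_test_numeric)}, after: {len(trimmed_df)}")
```

Wrapping the boolean mask in square brackets lets the condition span several lines without a backslash continuation.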