
Dummy trap

A dummy trap is a situation where different dummy variables convey the same information. For example, if an employee is from the accounting department (i.e. the value in the accounting column is 1), then you can be certain that they are not from any other department (the values in all other department columns are 0). Conversely, you can tell whether an employee is from accounting just by looking at the columns for all the other departments.

For that reason, whenever \(n\) dummies are created (in your case, 10), only \(n - 1\) of them (in your case, 9) are needed: the information in the \(n\)-th column is already implied by the others.
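The redundancy can be checked directly. The sketch below is only an illustration on a hypothetical three-department toy dataset (not the exercise's employee data): it assumes the dummies come from `pd.get_dummies` and shows that the accounting column can be rebuilt from the remaining ones.

```python
import pandas as pd

# Hypothetical toy data; the real exercise uses a larger employee dataset.
data = pd.DataFrame({"department": ["accounting", "sales", "IT", "sales"]})

# One 0/1 column per department (n = 3 in this toy example).
departments = pd.get_dummies(data["department"], dtype=int)

# Every employee belongs to exactly one department,
# so the dummy columns sum to 1 in every row.
row_sums = departments.sum(axis=1)

# Rebuild the "accounting" column from the other n - 1 columns.
others = departments.drop("accounting", axis=1)
rebuilt_accounting = 1 - others.sum(axis=1)

# True if the rebuilt column matches the original one in every row.
is_redundant = bool((rebuilt_accounting == departments["accounting"]).all())
print(is_redundant)
```

Because the rebuilt column matches the original, keeping all \(n\) columns adds no information beyond what the other \(n - 1\) already carry.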

Therefore, you will get rid of the old department column, drop one of the department dummies to avoid the dummy trap, and then join the two DataFrames.
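As a rough sketch of these three steps, the example below runs them on a hypothetical toy DataFrame. The `satisfaction` column and all values are made up for illustration, and creating the dummies with `pd.get_dummies` is an assumption about how `departments` was built.

```python
import pandas as pd

# Hypothetical toy data standing in for the employee dataset.
data = pd.DataFrame({
    "satisfaction": [0.4, 0.8, 0.6],
    "department": ["accounting", "sales", "IT"],
})

# Build one 0/1 column per department (assumed earlier step).
departments = pd.get_dummies(data["department"], dtype=int)

# Drop one dummy (here "accounting") to avoid the dummy trap.
departments = departments.drop("accounting", axis=1)

# Drop the original text column, then attach the remaining dummies.
data = data.drop("department", axis=1)
data = data.join(departments)

print(list(data.columns))
```

pandas also accepts `drop_first=True` in `pd.get_dummies`, which drops the first category's column automatically. Dropping a named column by hand, as above, keeps the choice of the omitted category explicit.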

This exercise is part of the course

HR Analytics: Predicting Employee Churn in Python


Exercise instructions

  • .drop() the accounting column to avoid the "dummy trap".
  • .drop() the old department column, as you do not need it anymore.
  • Join the new departments DataFrame to the employee dataset (this has been done for you).

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Drop the "accounting" column to avoid "dummy trap"
departments = departments.____("____", axis=1)

# Drop the old column "department" as you don't need it anymore
data = data.____("____", axis=1)

# Join the new DataFrame "departments" to your employee dataset: done
data = data.join(departments)