Testing QuantileTransformer
Standardization is prone to the same pitfalls as z-scores. Both use the mean and standard deviation in their calculations, which makes them highly sensitive to extreme values.
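As a rough illustration of that sensitivity, the sketch below standardizes a small hypothetical feature with StandardScaler, once as-is and once with a single extreme value appended. The toy values are made up for the demonstration. It compares how widely the five ordinary points are spread after scaling in each case.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy feature: five ordinary values, as a single column
clean = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
# The same values plus one extreme outlier
with_outlier = np.vstack([clean, [[1000.0]]])

# StandardScaler computes z-scores, (x - mean) / std, for each column
scaled_clean = StandardScaler().fit_transform(clean)
scaled_outlier = StandardScaler().fit_transform(with_outlier)

# Spread (max - min) of the five ordinary points after scaling
spread_clean = np.ptp(scaled_clean[:5, 0])
spread_outlier = np.ptp(scaled_outlier[:5, 0])

print(f"spread without outlier: {spread_clean:.3f}")
print(f"spread with outlier:    {spread_outlier:.3f}")
```

Because the single outlier inflates both the mean and the standard deviation, the five ordinary points go from spanning almost three standard units to being squeezed into a sliver roughly 0.01 wide.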
To get around this problem, you should use QuantileTransformer, which uses quantiles instead. Quantiles depend on the ordering of the values rather than their magnitudes, so they stay the same no matter how extreme the outliers are.
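A quick way to see this robustness is to compare the mean, standard deviation and quartiles of three hypothetical samples that differ only in how extreme their largest value is. The sketch uses NumPy's np.quantile with its default linear interpolation.

```python
import numpy as np

# Hypothetical samples that differ only in the size of the largest value
samples = {
    "no outlier": np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    "mild outlier": np.array([1.0, 2.0, 3.0, 4.0, 5.0, 60.0]),
    "extreme outlier": np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6000.0]),
}

quartiles = {}
for name, data in samples.items():
    # 25th, 50th and 75th percentiles of the sample
    quartiles[name] = np.quantile(data, [0.25, 0.5, 0.75])
    print(f"{name:>15}: mean={data.mean():8.2f}  std={data.std():8.2f}  "
          f"quartiles={quartiles[name]}")
```

The mean and standard deviation grow with the outlier, while the three quartiles are identical for all samples, because only the rank of the largest value matters to them.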
You should use StandardScaler when the data is normally distributed (which can be checked with a histogram). For other distributions, QuantileTransformer is a better choice.
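The difference shows up clearly on skewed data. The sketch below assumes a synthetic, right-skewed (exponential) feature and measures skewness with scipy.stats.skew. StandardScaler is a linear transformation, so it shifts and rescales the data but keeps its skewed shape; QuantileTransformer with output_distribution="normal" maps the values through their quantiles onto a normal distribution.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import QuantileTransformer, StandardScaler

rng = np.random.default_rng(0)
# Hypothetical right-skewed feature as a single column
X = rng.exponential(scale=2.0, size=(1000, 1))

standardized = StandardScaler().fit_transform(X)
quantiled = QuantileTransformer(output_distribution="normal").fit_transform(X)

skew_std = skew(standardized[:, 0])
skew_qt = skew(quantiled[:, 0])

print(f"skew after StandardScaler:      {skew_std:.2f}")
print(f"skew after QuantileTransformer: {skew_qt:.2f}")
```

Only the quantile-transformed feature ends up roughly symmetric, which is why a histogram check is a sensible way to choose between the two scalers.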
You'll practice on the loaded females dataset. matplotlib.pyplot is loaded under its standard alias, plt.
This exercise is part of the course Anomaly Detection in Python.
Exercise instructions
- Instantiate a QuantileTransformer() that transforms features into a normal distribution and assign it to qt.
- Fit and transform the feature array X and preserve the column names.
- Plot a histogram of the palmlength column.
Have a go at this exercise by completing this sample code.
from sklearn.preprocessing import QuantileTransformer
# Instantiate an instance that casts to normal
qt = ____
# Fit and transform the feature array
X.____ = ____
# Plot a histogram of palm length
plt.____(____, color='red')
plt.xlabel("Palm length")
plt.show()
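One possible way to complete the sample code is sketched below. Because the course's females dataset is not available here, it builds a synthetic stand-in: only the palmlength column name comes from the exercise, while the footlength column and all values are hypothetical. It assumes X is a pandas DataFrame of float features, which the instruction to preserve column names suggests. Assigning the transformed array through X.loc[:, :] writes the values back into the existing DataFrame, so its index and column names are kept.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import QuantileTransformer

# Synthetic stand-in for the course's feature array (values are made up);
# `footlength` is a hypothetical second column
rng = np.random.default_rng(42)
X = pd.DataFrame({
    "palmlength": rng.lognormal(mean=4.5, sigma=0.3, size=1000),
    "footlength": rng.lognormal(mean=5.5, sigma=0.3, size=1000),
})

# Instantiate an instance that casts to normal
qt = QuantileTransformer(output_distribution="normal")

# Fit and transform the feature array; assigning through .loc
# keeps the DataFrame's index and column names
X.loc[:, :] = qt.fit_transform(X)

# Plot a histogram of palm length
plt.hist(X["palmlength"], color="red")
plt.xlabel("Palm length")
plt.show()
```

With the real females data the histogram should likewise look roughly bell-shaped and centered near zero, since every feature has been mapped onto a standard normal distribution.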