Exercise

# Filtering numeric fields conditionally

Again, understanding the context of your data is extremely important. We want to understand the normal range of prices that houses sell for, so let's exclude any outlier homes that sold for significantly more or less than the average. Here we will calculate the mean and standard deviation and use them to filter the near-normal field `log_SalesClosePrice`.
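To make the rule concrete, here is a minimal plain-Python sketch of the three-standard-deviation bounds. The sample values are made up for illustration and are not taken from the housing dataset; `statistics.stdev` computes the sample standard deviation.

```python
import statistics

# Hypothetical log prices: 30 typical values plus one extreme value.
log_prices = [12.0, 12.2, 12.4, 12.6, 12.8] * 6 + [20.0]

mean_val = statistics.mean(log_prices)
stddev_val = statistics.stdev(log_prices)  # sample standard deviation

# Bounds at three standard deviations either side of the mean.
low_bound = mean_val - 3 * stddev_val
hi_bound = mean_val + 3 * stddev_val

# Keep only the values that fall strictly inside the bounds.
kept = [x for x in log_prices if low_bound < x < hi_bound]
```

With these made-up values, only the extreme value falls outside the bounds; the typical values are all kept.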

Instructions

- Import `mean()` and `stddev()` from `pyspark.sql.functions`.
- Use `agg()` to calculate the mean and standard deviation for `'log_SalesClosePrice'` with the imported functions.
- Create the upper and lower bounds by taking `mean_val` +/- 3 times `stddev_val`.
- Create a `where()` filter for `'log_SalesClosePrice'` using both `low_bound` and `hi_bound`.