Exercise

# Exploratory data analysis

Before diving into the nitty-gritty of pipelines and preprocessing, let's do some exploratory analysis of the original, unprocessed Ames housing dataset. When you worked with this data in previous chapters, we preprocessed it for you so you could focus on the core XGBoost concepts. In this chapter, you'll do the preprocessing yourself!

A smaller version of this original, unprocessed dataset has been pre-loaded into a `pandas` DataFrame called `df`. Your task is to explore `df` in the Shell and pick the option that is **incorrect**. The larger purpose of this exercise is to understand the kinds of transformations you will need to perform in order to be able to use XGBoost.
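As a rough guide to the kind of exploration meant here, the sketch below shows a few standard `pandas` calls for checking shape, missing values, dtypes, and summary statistics. It builds a tiny stand-in DataFrame with invented values so that it runs on its own; this stand-in is an assumption for illustration and is not the exercise's dataset. In the Shell you would run the same calls on the pre-loaded `df` instead.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the pre-loaded `df`.
# The values below are invented for illustration only.
df = pd.DataFrame({
    "LotFrontage": [60.0, np.nan, 70.0, 50.0],
    "LotArea": [8000, 9500, 11000, 9500],
    "SalePrice": [200000, 180000, 220000, 140000],
})

# Number of rows and columns, as a (rows, columns) tuple.
n_rows, n_cols = df.shape

# Count of missing values per column, and whether any exist at all.
missing_per_column = df.isnull().sum()
has_missing = bool(df.isnull().values.any())

# Data type of each column.
dtypes = df.dtypes

# Count, mean, std, min, quartiles, and max for each numeric column.
summary = df.describe()

print(f"rows={n_rows}, columns={n_cols}")
print(missing_per_column)
print(dtypes)
print(summary)

# Column-by-column overview: non-null counts and dtypes in one report.
df.info()
```

Comparing the output of `df.shape`, `df.isnull().sum()`, `df.dtypes`, and `df.describe()` against each answer option is usually enough to find the statement that does not hold.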

Instructions

##### Possible Answers

- The DataFrame has 21 columns and 1460 rows.
- The mean of the `LotArea` column is `10516.828082`.
- The DataFrame has missing values.
- The `LotFrontage` column has no missing values and its entries are of type `float64`.
- The standard deviation of `SalePrice` is `79442.502883`.