Exploratory data analysis

Before diving into the nitty gritty of pipelines and preprocessing, let's do some exploratory analysis of the original, unprocessed Ames housing dataset. When you worked with this data in previous chapters, we preprocessed it for you so you could focus on the core XGBoost concepts. In this chapter, you'll do the preprocessing yourself!

A smaller version of this original, unprocessed dataset has been pre-loaded into a pandas DataFrame called df. Your task is to explore df in the Shell and pick the option that is incorrect. The larger purpose of this exercise is to understand the kinds of transformations you will need to perform in order to be able to use XGBoost.

Possible answers

The DataFrame has 21 columns and 1460 rows.

The mean of the LotArea column is 10516.828082.

The DataFrame has missing values.

The LotFrontage column has no missing values and its entries are of type float64.

The standard deviation of SalePrice is 79442.502883.

IPython Shell

Classification with XGBoost

Regression with XGBoost

Fine-tuning your XGBoost model

Using XGBoost in pipelines

Exercise

Exercise

Exploratory data analysis

Instructions

Possible answers