Using text filters to remove records

It pays to have to ask your clients lots of questions and take time to understand your variables. You find out that Assumable mortgage is an unusual occurrence in the real estate industry and your client suggests you exclude them. In this exercise we will use isin() which is similar to like() but allows us to pass a list of values to use as a filter rather than a single one.

Use select() and show() to inspect the distinct values in the column 'ASSUMABLEMORTGAGE' and create the list yes_values for all the values containing the string 'Yes'.
Use ~df['ASSUMABLEMORTGAGE'], isin(), and .isNull() to create a NOT filter to remove records containing corresponding values in the list yes_values and to keep records with null values. Store this filter in the variable text_filter.
Use where() to apply the text_filter to df.
Print out the number of records remaining in df.

Exploratory Data Analysis

Wrangling with Spark Functions

Feature Engineering

Building a Model

Exercise

Using text filters to remove records

Instructions