
Dropping a list of columns

Our data set has many features, but not all of them are valuable, and several will be hard to wrangle into anything useful. For now, let's drop the columns that aren't immediately useful.

  • 'STREETNUMBERNUMERIC': The postal address number of the home
  • 'FIREPLACES': Number of fireplaces in the home
  • 'LOTSIZEDIMENSIONS': Free text describing the lot shape
  • 'LISTTYPE': Sale type, drawn from a fixed set of values
  • 'ACRES': Numeric area of the lot

This exercise is part of the course

Feature Engineering with PySpark


Exercise instructions

  • Read the column descriptions above and explore the first 30 rows with show(). The DataFrame, already filtered to the listed columns, is available as df.
  • Create a list called cols_to_drop containing the two columns that are least relevant to predicting house prices. Recall that computers interpret numbers literally and don't understand context.
  • Use the drop() method to remove the columns in cols_to_drop from the DataFrame df.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Show top 30 records
df.____(____)

# List of columns to remove from dataset
cols_to_drop = [____, ____]

# Drop columns in list
df = df.____(____)