One-hot encoding specific columns

A local used car dealership wants your help in predicting the sale price of their vehicles. If you use one-hot encoding on the entire used_cars dataset, the new dataset has over 1,200 columns. You are worried that this might lead to problems when training your machine learning models to predict price. You have decided to try a simpler approach and only use one-hot encoding on a few columns.

This exercise is part of the course

Working with Categorical Data in Python

Exercise instructions

Create a new dataset, used_cars_simple, with one-hot encoding for these columns: "manufacturer_name" and "transmission" (in this order).
Set the prefix of all new columns to "dummy", so that you can easily filter to newly created columns.

Interactive exercise

Complete the sample code to successfully finish this exercise.

# Create one-hot encoding for just two columns
used_cars_simple = pd.____(
  used_cars,
  # Specify the columns from the instructions
  ____,
  # Set the prefix
  ____
)

# Print the shape of the new dataset
print(used_cars_simple.shape)
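
For reference, here is a minimal sketch of the pattern the template asks for, run on a toy DataFrame rather than the real used_cars data. The rows, the extra "color" and "price_usd" columns, and the `dummy_cols` name are made up for illustration; only the two encoded column names and the "dummy" prefix come from the instructions above.

```python
import pandas as pd

# Toy stand-in for used_cars (hypothetical values; the real dataset is not shown here)
used_cars = pd.DataFrame({
    "manufacturer_name": ["Ford", "Audi", "Ford", "Kia"],
    "transmission": ["automatic", "mechanical", "mechanical", "automatic"],
    "color": ["red", "blue", "black", "red"],
    "price_usd": [10900.0, 5000.0, 2800.0, 9999.0],
})

# One-hot encode only the two listed columns; a single string prefix
# is applied to every newly created column
used_cars_simple = pd.get_dummies(
    used_cars,
    columns=["manufacturer_name", "transmission"],
    prefix="dummy",
)

# Print the shape of the new dataset
print(used_cars_simple.shape)

# The shared prefix makes it easy to select only the new columns
dummy_cols = used_cars_simple.filter(like="dummy_").columns.tolist()
print(dummy_cols)
```

Columns that are not listed in `columns` (here "color" and "price_usd") pass through unchanged. One caveat of a shared prefix: if two encoded columns contain the same category value, the resulting column names collide, so a per-column prefix may be safer in that case.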

Almost every dataset contains categorical information, and it is often an unexplored goldmine. In this chapter, you’ll learn how pandas handles categorical columns using the category data type. You’ll also discover how to group data by categories to compute summary statistics.

Exercise 1: Course introduction
Exercise 2: Categorical vs. numerical
Exercise 3: Exploring a target variable
Exercise 4: Ordinal categorical variables
Exercise 5: Categorical data in pandas
Exercise 6: Setting dtypes and saving memory
Exercise 7: Creating a categorical pandas Series
Exercise 8: Setting dtype when reading data
Exercise 9: Grouping data by category in pandas
Exercise 10: Create lots of groups
Exercise 11: Setting up a .groupby() statement
Exercise 12: Using pandas functions effectively
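
As a rough illustration of what this chapter covers, the sketch below converts a column to the category dtype and groups by it. The DataFrame, its column names, and its values are invented for the example and are not taken from the course data.

```python
import pandas as pd

# Hypothetical data for illustration only
df = pd.DataFrame({
    "body_type": ["sedan", "suv", "sedan", "hatchback", "suv", "sedan"],
    "price_usd": [5000.0, 12000.0, 7000.0, 3000.0, 10000.0, 6000.0],
})

# Store the text column as a pandas categorical
df["body_type"] = df["body_type"].astype("category")
print(df["body_type"].dtype)

# Summary statistic per category; observed=True keeps only categories
# that actually appear in the data
mean_price = df.groupby("body_type", observed=True)["price_usd"].mean()
print(mean_price)
```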

Now it’s time to learn how to set, add, and remove categories from a Series. You’ll also explore how to update, rename, collapse, and reorder categories, before applying your new skills to clean and access other data within your DataFrame.

Exercise 1: Setting category variables
Exercise 2: Setting categories
Exercise 3: Adding categories
Exercise 4: Removing categories
Exercise 5: Updating categories
Exercise 6: Collapsing categories knowledge check
Exercise 7: Renaming categories
Exercise 8: Collapsing categories
Exercise 9: Reordering categories
Exercise 10: Reordering categories in a Series
Exercise 11: Using .groupby() after reordering
Exercise 12: Cleaning and accessing data
Exercise 13: Cleaning variables
Exercise 14: Accessing and filtering data
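
The operations named in this chapter map onto the `.cat` accessor of a categorical Series. The following is a small sketch on an invented Series; the category labels are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical categorical Series; categories start out sorted alphabetically
sizes = pd.Series(["small", "large", "medium", "small"], dtype="category")

# Add a category that no row uses yet, then remove it again
sizes = sizes.cat.add_categories(["extra large"])
sizes = sizes.cat.remove_categories(["extra large"])

# Rename one category via a mapping of old name to new name
sizes = sizes.cat.rename_categories({"medium": "mid"})

# Impose a meaningful order on the categories
sizes = sizes.cat.reorder_categories(["small", "mid", "large"], ordered=True)

print(sizes.cat.categories.tolist())
print(sizes.max())
```

Each of these methods returns a new Series, which is why the result is reassigned at every step.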

In this chapter, you’ll use the seaborn Python library to create informative visualizations of categorical data, including categorical plots with catplot(), box plots, bar plots, point plots, and count plots. You’ll then learn how to visualize categorical columns and split data across categorical columns to visualize summary statistics of numerical columns.

Exercise 1: Introduction to categorical plots using Seaborn
Exercise 2: Boxplot understanding
Exercise 3: Creating a box plot
Exercise 4: Seaborn bar plots
Exercise 5: Creating a bar plot
Exercise 6: Ordering categories
Exercise 7: Bar plot using hue
Exercise 8: Point and count plots
Exercise 9: Creating a point plot
Exercise 10: Creating a count plot
Exercise 11: Review catplot() types
Exercise 12: Additional catplot() options
Exercise 13: One visualization per group
Exercise 14: Updating categorical plots

Lastly, you’ll learn how to overcome the common pitfalls of using categorical data. You’ll also grow your data encoding skills as you are introduced to label encoding and one-hot encoding—perfect for helping you prepare your data for use in machine learning algorithms.

Exercise 1: Categorical pitfalls
Exercise 2: Memory usage knowledge check
Exercise 3: Overcoming pitfalls: string issues
Exercise 4: Overcoming pitfalls: using NumPy arrays
Exercise 5: Label encoding
Exercise 6: Create a label encoding and map
Exercise 7: Using saved mappings
Exercise 8: Creating a Boolean encoding
Exercise 9: One-hot encoding
Exercise 10: One-hot knowledge check
Exercise 11: One-hot encoding specific columns (current exercise)
Exercise 12: Wrap-up video
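
Label encoding, mentioned in this chapter alongside one-hot encoding, can be sketched with the integer codes pandas already stores for a categorical. The Series values and the `code_to_label` name below are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical categorical Series; categories are sorted as black, blue, red
colors = pd.Series(["red", "blue", "black", "red"], dtype="category")

# Label encoding: each category is replaced by its integer code
codes = colors.cat.codes
print(codes.tolist())

# Save a code-to-label mapping so the encoding can be reversed later
code_to_label = dict(enumerate(colors.cat.categories))
decoded = codes.map(code_to_label)
print(decoded.tolist())
```

Unlike one-hot encoding, this keeps a single column, but the integer codes imply an order that may not be meaningful for the data.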