
Categorical columns

In the flights data there are two columns, carrier and org, which hold categorical data. You need to transform those columns into indexed numerical values.
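To make "indexed numerical values" concrete, here is a plain-Python sketch of the mapping idea. It assumes the ordering that Spark's StringIndexer applies by default (most frequent category gets index 0, ties broken alphabetically), and the carrier codes are made-up sample values, not taken from the flights data.

```python
from collections import Counter

# Hypothetical sample values standing in for the carrier column
carriers = ["UA", "AA", "UA", "OO", "AA", "UA"]

# Count occurrences, then order by descending frequency (ties alphabetically)
counts = Counter(carriers)
ordered = sorted(counts, key=lambda c: (-counts[c], c))

# Assign 0.0 to the most frequent category, 1.0 to the next, and so on
mapping = {category: float(i) for i, category in enumerate(ordered)}
carrier_idx = [mapping[c] for c in carriers]

print(mapping)
print(carrier_idx)
```

The index values carry no numeric meaning of their own; they are labels that happen to be numbers, which is why indexed columns are often encoded further before modelling.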

This exercise is part of the course Machine Learning with PySpark.

Exercise instructions

  • Import the appropriate class and create an indexer object to transform the carrier column from a string to a numeric index.
  • Prepare the indexer object on the flight data.
  • Use the prepared indexer to create the numeric index column.
  • Repeat the process for the org column.

Hands-on interactive exercise

Complete this exercise by filling in the blanks in the sample code.

from pyspark.ml.feature import ____

# Create an indexer
indexer = ____(inputCol=____, outputCol='carrier_idx')

# Indexer identifies categories in the data
indexer_model = indexer.____(flights)

# Indexer creates a new column with numeric index values
flights_indexed = ____.____(____)

# Repeat the process for the other categorical feature
flights_indexed = ____(inputCol=____, outputCol='org_idx').____(____).____(____)
flights_indexed.show(5)