Bringing it all together

You have just joined an arrhythmia detection startup and want to train a model on the arrhythmia dataset arrh. You noticed that random forests often win Kaggle competitions, so you want to try one with a maximum depth of 2, 5, or 10, using grid search. You also observed that the dataset is high-dimensional, so you want to evaluate the effect of a feature selection method as well.

To avoid overfitting by mistake, you have already split your data. You will use X_train and y_train for the grid search, and X_test and y_test to decide whether feature selection helps. All four dataset folds are preloaded in your environment. You also have access to GridSearchCV(), train_test_split(), SelectKBest(), chi2(), and RandomForestClassifier under the name rfc.
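The hold-out split described above can be sketched as follows. This is a minimal illustration that assumes a synthetic dataset from make_classification as a stand-in for arrh; the sample count, feature count, and the 25% test share are illustrative choices, not values taken from the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the arrh dataset (an assumption for illustration only)
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=20, random_state=1)

# Hold out 25% of the rows; stratify keeps the class balance similar in both folds
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

print(X_train.shape, X_test.shape)
print(y_train.mean(), y_test.mean())  # share of the positive class in each fold
```

Keeping X_test and y_test out of the grid search means the later comparison with and without feature selection is made on rows the tuned model has never seen.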

This exercise is part of the course Designing Machine Learning Workflows in Python.


Exercise instructions

Run a grid search for RandomForestClassifier with a maximum depth of 2, 5, and 10, and store the best-performing parameter setting.
Now refit the estimator using the best-performing maximum depth found above.
Apply the SelectKBest feature selector with the chi2 scoring function and refit the classifier.

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Find the best value for max_depth among values 2, 5 and 10
grid_search = GridSearchCV(
  ____(random_state=1), param_grid=____)
best_value = grid_search.____(
  ____, ____).best_params_['max_depth']

# Using the best value from above, fit a random forest
clf = rfc(
  random_state=1, ____=best_value).____(X_train, y_train)

# Apply SelectKBest with chi2 and pick top 100 features
vt = SelectKBest(____, k=____).____(X_train, y_train)

# Create a new dataset only containing the selected features
X_train_reduced = ____.transform(____)
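
One way the full workflow could look end to end is sketched below. It is not the course's reference solution: it assumes a synthetic dataset in place of arrh, adds a MinMaxScaler step because chi2 requires non-negative feature values (the real data may not need it), and uses cv=3 and the name selector as illustrative choices. The final comparison on X_test and y_test is what tells you whether feature selection helped.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier as rfc
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for arrh (an assumption; the real data is not reproduced here)
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y)

# chi2 needs non-negative inputs, so rescale using statistics from the training fold only
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Find the best value for max_depth among 2, 5 and 10 with cross-validation
param_grid = {'max_depth': [2, 5, 10]}
grid_search = GridSearchCV(rfc(random_state=1), param_grid=param_grid, cv=3)
best_value = grid_search.fit(X_train, y_train).best_params_['max_depth']

# Refit on all features with the best depth and score on the held-out fold
clf = rfc(random_state=1, max_depth=best_value).fit(X_train, y_train)
acc_full = accuracy_score(y_test, clf.predict(X_test))

# Keep the 100 features with the highest chi2 score, chosen on the training fold
selector = SelectKBest(chi2, k=100).fit(X_train, y_train)
X_train_reduced = selector.transform(X_train)
X_test_reduced = selector.transform(X_test)

# Refit on the reduced features and score on the same held-out fold
clf_reduced = rfc(random_state=1, max_depth=best_value).fit(
    X_train_reduced, y_train)
acc_reduced = accuracy_score(y_test, clf_reduced.predict(X_test_reduced))

print('best max_depth:', best_value)
print('accuracy, all features:', acc_full)
print('accuracy, top 100 features:', acc_reduced)
```

The selector is fitted on the training fold only and then applied to the test fold, so the choice of features never depends on the rows used for the final comparison.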



In the previous chapters you established a solid foundation in supervised learning, including how to deploy models in production, but always assumed that a labeled dataset would be available for your analysis. In this chapter, you take on the challenge of modeling data with no labels, or with very few. This takes you on a journey into anomaly detection, a kind of unsupervised modeling, as well as distance-based learning, where beliefs about what constitutes similarity between two examples can be used in place of labels to help you achieve levels of accuracy comparable to a supervised workflow. By the end of this chapter, you will know which tools to use to adapt your workflow to common real-world challenges.

Exercise 1: Anomaly detection
Exercise 2: A simple outlier
Exercise 3: LoF contamination
Exercise 4: Novelty detection
Exercise 5: A simple novelty
Exercise 6: Three novelty detectors
Exercise 7: Contamination revisited
Exercise 8: Distance-based learning
Exercise 9: Find the neighbor
Exercise 10: Not all metrics agree
Exercise 11: Unstructured data
Exercise 12: Restricted Levenshtein
Exercise 13: Bringing it all together
Exercise 14: Concluding remarks