Distribution of outcome variable values

Stratifying by the outcome variable when generating training and test datasets ensures that the outcome variable values have a similar range in both datasets.

Since the original data is split at random, stratification avoids, for example, placing all the expensive homes in home_sales into the test dataset. If that happened, your model would most likely perform poorly on the test data because it was trained only on less expensive homes.
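To make the idea concrete, here is a minimal sketch of a stratified split on simulated data. It assumes the rsample and dplyr packages are installed. The sim_sales tibble, its home_age column, the seed, and the 70/30 proportion are all made up for illustration and are not taken from home_sales or from the course.

```r
library(dplyr)
library(rsample)

# Hypothetical stand-in for home_sales: 1,000 simulated selling prices
set.seed(42)
sim_sales <- tibble(
  selling_price = round(rlnorm(1000, meanlog = 13, sdlog = 0.4)),
  home_age = sample(0:50, 1000, replace = TRUE)
)

# Stratified split: rows are sampled within bins of selling_price,
# so each price range is represented in both datasets
sim_split <- initial_split(sim_sales, prop = 0.7, strata = selling_price)

sim_training <- training(sim_split)
sim_test <- testing(sim_split)

# Compare the outcome variable across the two datasets
split_summary <- bind_rows(
  training = sim_training,
  test = sim_test,
  .id = "dataset"
) %>%
  group_by(dataset) %>%
  summarize(n = n(),
            min_price = min(selling_price),
            max_price = max(selling_price),
            mean_price = mean(selling_price))

split_summary
```

If the stratification worked as intended, the two rows of split_summary should show broadly similar minimum, maximum, and mean prices, with small differences due to random sampling.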

In this exercise, you will calculate summary statistics for the selling_price variable in the training and test datasets. The home_training and home_test tibbles have been loaded from the previous exercise.

This exercise is part of the course Modeling with tidymodels in R.

Hands-on interactive exercise

Complete the sample code below to finish this exercise.

# Distribution of selling_price in training data
___ %>% 
  summarize(min_sell_price = ___,
            max_sell_price = ___,
            mean_sell_price = ___,
            sd_sell_price = ___)