Fitting model to training data

It's time to split your data into a training set, used to fit a model, and a separate test set, used to evaluate the model's predictive power. Before making this split, however, you first sample 100% of the rows of house_prices without replacement and assign the result to house_prices_shuffled. This "shuffles" the rows, so that the training and test sets are both random samples of the data.

This exercise is part of the course

Modeling with Data in the Tidyverse

Hands-on interactive exercise

Try this exercise by completing the sample code.

# Set random number generator seed value for reproducibility
set.seed(76)

# Randomly reorder the rows
house_prices_shuffled <- house_prices %>% 
  sample_frac(size = 1, replace = FALSE)

# Train/test split
train <- house_prices_shuffled %>%
  slice(___:___)
test <- house_prices_shuffled %>%
  slice(___:___)
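
To see how the shuffle-then-slice pattern behaves without touching the blanks above, here is a minimal sketch on a made-up data frame. The name toy_houses, its values, and the 7/3 split point are all assumptions chosen for illustration; they are not the values the exercise expects for house_prices.

```r
library(dplyr)

# Hypothetical toy data standing in for house_prices (made-up values)
toy_houses <- data.frame(
  id = 1:10,
  price = c(210, 340, 185, 499, 275, 320, 150, 410, 265, 380)
)

# Set the seed so the shuffle is reproducible
set.seed(76)

# Sampling 100% of the rows without replacement reorders them
toy_shuffled <- toy_houses %>%
  sample_frac(size = 1, replace = FALSE)

# Assumed split point, for illustration only: first 7 rows train, the rest test
n_train <- 7
toy_train <- toy_shuffled %>%
  slice(1:n_train)
toy_test <- toy_shuffled %>%
  slice((n_train + 1):n())

# Every row lands in exactly one of the two sets
nrow(toy_train)
nrow(toy_test)
```

Because the two slices use adjacent, non-overlapping row ranges of the shuffled data, no row can appear in both sets.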


Skill level: Intermediate


This chapter will introduce you to some background theory and terminology for modeling, in particular, the general modeling framework, the difference between modeling for explanation and modeling for prediction, and the modeling problem. Furthermore, you'll start performing your first exploratory data analysis, a crucial first step before any formal modeling.

Exercise 1: Background on modeling for explanation
Exercise 2: Exploratory visualization of age
Exercise 3: Numerical summaries of age
Exercise 4: Background on modeling for prediction
Exercise 5: Exploratory visualization of house size
Exercise 6: Log10 transformation of house size
Exercise 7: The modeling problem for explanation
Exercise 8: EDA of relationship of teaching & "beauty" scores
Exercise 9: Correlation between teaching and "beauty" scores
Exercise 10: The modeling problem for prediction
Exercise 11: EDA of relationship of house price and waterfront
Exercise 12: Predicting house price with waterfront

Equipped with your understanding of the general modeling framework, in this chapter you'll cover basic linear regression, where you keep things simple and model the outcome variable y as a function of a single explanatory/predictor variable x. You'll use both numerical and categorical x variables. The outcome variable of interest in this chapter is the teaching evaluation scores of instructors at the University of Texas at Austin.

Exercise 1: Explaining teaching score with age
Exercise 2: Plotting a "best-fitting" regression line
Exercise 3: Fitting a regression with a numerical x
Exercise 4: Predicting teaching score using age
Exercise 5: Making predictions using "beauty score"
Exercise 6: Computing fitted/predicted values & residuals
Exercise 7: Explaining teaching score with gender
Exercise 8: EDA of relationship of score and rank
Exercise 9: Fitting a regression with a categorical x
Exercise 10: Predicting teaching score using gender
Exercise 11: Making predictions using rank
Exercise 12: Visualizing the distribution of residuals

In the previous chapter, you learned about basic regression using a single numerical or categorical predictor. But why limit yourself to only one variable to inform your explanations/predictions? You will now extend basic regression to multiple regression, which allows you to incorporate more than one explanatory or predictor variable in your models. You'll be modeling house prices using a dataset of houses in the Seattle, WA metropolitan area.

Exercise 1: Explaining house price with year & size
Exercise 2: EDA of relationship
Exercise 3: Fitting a regression
Exercise 4: Predicting house price using year & size
Exercise 5: Making predictions using size and bedrooms
Exercise 6: Interpreting residuals
Exercise 7: Explaining house price with size & condition
Exercise 8: Parallel slopes model
Exercise 9: Interpreting the parallel slopes model
Exercise 10: Predicting house price using size & condition
Exercise 11: Making predictions using size and waterfront
Exercise 12: Automating predictions on "new" houses

In the previous chapters, you fit various models to explain or predict an outcome variable of interest. However, how do we know which models to choose? Model assessment measures allow you to assess how well an explanatory model "fits" a set of data or how accurate a predictive model is. Based on these measures, you'll learn about criteria for determining which models are "best".

Exercise 1: Model selection and assessment
Exercise 2: Refresher: sum of squared residuals
Exercise 3: Which model to select?
Exercise 4: Assessing model fit with R-squared
Exercise 5: Computing the R-squared of a model
Exercise 6: Comparing the R-squared of two models
Exercise 7: Assessing predictions with RMSE
Exercise 8: Computing the MSE & RMSE of a model
Exercise 9: Comparing the RMSE of two models
Exercise 10: Validation set prediction framework
Exercise 11: Fitting model to training data (current exercise)
Exercise 12: Predicting on test data
Exercise 13: Conclusion - Where to go from here?
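
The assessment measures named in this chapter, R-squared and RMSE, can be sketched together with the validation set framework in a few lines. This is only an illustration under stated assumptions: it uses R's built-in mtcars data as a stand-in for house_prices, a 22/10 split, and the model mpg ~ wt, none of which come from the course.

```r
library(dplyr)

set.seed(76)

# mtcars (32 rows) is a built-in dataset used here only as a stand-in
cars_shuffled <- mtcars %>%
  sample_frac(size = 1, replace = FALSE)

# Assumed split for illustration: 22 training rows, 10 test rows
cars_train <- cars_shuffled %>%
  slice(1:22)
cars_test <- cars_shuffled %>%
  slice(23:n())

# Fit a simple linear regression on the training set only
model <- lm(mpg ~ wt, data = cars_train)

# R-squared on the training data: 1 - var(residuals) / var(outcome)
r_squared <- 1 - var(residuals(model)) / var(cars_train$mpg)

# RMSE on the test data: root of the mean squared prediction error
test_pred <- predict(model, newdata = cars_test)
rmse <- sqrt(mean((cars_test$mpg - test_pred)^2))

r_squared
rmse
```

R-squared here describes fit on the data the model has already seen, while the RMSE is computed on rows held out from fitting, which is why the two measures answer different questions.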