
Linear regression with principal components

The object newsData now contains an additional variable: logShares. The number of shares tells you how often each news article has been shared. Because this distribution is highly skewed, you are going to work with the logarithm of the number of shares. Apply what you just learned and predict the log shares!
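To see why the logarithm helps with a skewed variable, here is a minimal sketch on simulated data. The values in simShares are hypothetical log-normal draws, not the real share counts from newsData, and skewness() is a small helper defined here for illustration rather than a function from the course.

```r
set.seed(42)  # fixed seed so the simulation is reproducible

# Hypothetical share counts: simulated right-skewed (log-normal) values,
# NOT the actual newsData shares.
simShares <- rlnorm(1000, meanlog = 7, sdlog = 1)

# Sample skewness: mean of the cubed standardized values
# (close to 0 for a symmetric distribution).
skewness <- function(x) {
  z <- (x - mean(x)) / sd(x)
  mean(z^3)
}

skewRaw <- skewness(simShares)       # skewness on the original scale
skewLog <- skewness(log(simShares))  # skewness after the log transform

print(c(raw = skewRaw, log = skewLog))
```

On data like this, the raw values should show strong positive skew while the log-transformed values are roughly symmetric, which is better suited to a linear model.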

This exercise is part of the course

Machine Learning for Marketing Analytics in R


Exercise instructions

  • Compute a model to predict the log shares with all other variables. Store it as mod1.
  • Create a new dataframe dataNewsComponents with the log shares and the values on the first 6 components. The object pcaNews again contains the PCA results.
  • Compute a second model (mod2) that predicts the log shares with just the 6 components.
  • Compare the adjusted R squared of the models. How did the value change by using only the principal components? How good is your model?
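The steps above can be sketched on a stand-in dataset. This example assumes the built-in mtcars data instead of newsData, with mpg playing the role of logShares and 3 components instead of 6. The names predictors, pcaCars, carsComponents, fullModel, and pcModel are illustrative and do not come from the exercise. It also assumes the PCA was computed with prcomp(), whose result stores the component scores in the element x.

```r
# Stand-in data: mtcars ships with base R; mpg is the response here.
predictors <- mtcars[, setdiff(names(mtcars), "mpg")]

# Model 1: regress the response on all original variables.
fullModel <- lm(mpg ~ ., data = mtcars)

# PCA on the standardized predictors; the scores are stored in pcaCars$x.
pcaCars <- prcomp(predictors, center = TRUE, scale. = TRUE)

# Combine the response with the scores on the first 3 components.
carsComponents <- data.frame(mpg = mtcars$mpg, pcaCars$x[, 1:3])

# Model 2: regress the response on the component scores only.
pcModel <- lm(mpg ~ ., data = carsComponents)

# Adjusted R squared for both models.
adjR2Full <- summary(fullModel)$adj.r.squared
adjR2PC <- summary(pcModel)$adj.r.squared
print(c(full = adjR2Full, components = adjR2PC))
```

Because the component scores are linear combinations of the original predictors, the plain R squared of the component model cannot exceed that of the full model. The adjusted R squared also penalizes the number of predictors, so it can move in either direction.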

Hands-on interactive exercise

Try this exercise by completing the sample code below.

# Predict log shares with all original variables
mod1 <- lm(logShares ~ ., data = ___)

# Create dataframe with log shares and first 6 components
dataNewsComponents <- cbind(logShares = newsData[, "logShares"],
                            ___$x[, 1:__]) %>%
  as.data.frame()

# Predict log shares with first six components
mod2 <- lm(___ ~ ., data = ___)

# Print adjusted R squared for both models
___(mod1)$adj.r.squared
summary(___)$___