Inference with and without outlier (randomization)
Using the randomization test, you can again evaluate the effect of an outlier on the inferential conclusions of a linear model. Run a randomization test on the hypdata_out data twice: once with the outlying value and once without it. Note that the extended lines of code clearly communicate the steps of the randomization test.
This exercise is part of the course
Inference for Linear Regression in R
Exercise instructions
Using the data frames hypdata_out (containing an outlier) and hypdata_noout (outlier removed), the data frames perm_slope_out and perm_slope_noout were created to contain the permuted slopes of the original datasets, respectively. The observed values are stored in the variables obs_slope_out and obs_slope_noout.
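The exercise does not show how the permuted-slope data frames were built. As a rough illustration only, the base R sketch below shuffles the response, refits the line, and stores each slope. The data, the names toy_data, toy_obs_slope and toy_perm_slope, the column name stat, and the choice of 500 permutations are all assumptions made for this example, not the course's own code.

```r
set.seed(4747)

# Hypothetical stand-in for a dataset such as hypdata_out
n <- 50
toy_data <- data.frame(x = rnorm(n))
toy_data$y <- 0.5 * toy_data$x + rnorm(n)

# Observed slope from the unshuffled data
toy_obs_slope <- coef(lm(y ~ x, data = toy_data))[["x"]]

# Permuted slopes: shuffle y to break any link with x, refit, keep the slope
n_perm <- 500
toy_perm_slope <- data.frame(
  replicate = seq_len(n_perm),
  stat = replicate(n_perm, coef(lm(sample(y) ~ x, data = toy_data))[["x"]])
)

head(toy_perm_slope)
```

Because shuffling y removes any real association with x, the permuted slopes approximate the distribution of the slope under the null hypothesis of no linear relationship.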
- Find the p-values by computing the proportion of permuted slopes whose absolute value (abs) is larger than or equal to the absolute value of the observed slope. As before, use mean on the binary inequality to find the proportion.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Calculate the p-value with the outlier
perm_slope_out %>%
  mutate(abs_perm_slope = ___) %>%
  summarize(p_value = ___)

# Calculate the p-value without the outlier
perm_slope_noout %>%
  mutate(abs_perm_slope = ___) %>%
  summarize(p_value = ___)
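To make the two-sided p-value calculation concrete, here is a small sketch on made-up numbers. It assumes the permuted slopes sit in a column called stat, which the exercise does not confirm. The names toy_perm_slope, toy_obs_slope and toy_p are hypothetical, so this shows the pattern rather than the official solution.

```r
library(dplyr)

# Hypothetical permuted slopes; the column name `stat` is an assumption
toy_perm_slope <- data.frame(
  stat = c(-0.8, -0.3, -0.1, 0.0, 0.2, 0.4, 0.6, 0.9)
)

# Hypothetical observed slope
toy_obs_slope <- -0.6

toy_p <- toy_perm_slope %>%
  # Take absolute values so both tails count as "extreme"
  mutate(abs_perm_slope = abs(stat)) %>%
  # The mean of a logical vector is the proportion of TRUE values
  summarize(p_value = mean(abs_perm_slope >= abs(toy_obs_slope)))

toy_p
```

Here three of the eight permuted slopes (-0.8, 0.6 and 0.9) are at least as large in absolute value as the observed slope, so p_value is 3/8 = 0.375. The same pattern would apply to perm_slope_out with obs_slope_out, and to perm_slope_noout with obs_slope_noout, once the actual column name is known.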