Eliminating variables from the model - adjusted R-squared selection
Now you will create a new model by dropping, one at a time, each variable from the full model, and determine which removal yields the highest improvement in the adjusted \(R^2\).
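For reference, the adjusted \(R^2\) penalizes \(R^2\) for the number of predictors in the model, so dropping an uninformative variable can increase it. In its common form, with \(n\) observations and \(p\) predictors:

\[
R^2_{adj} = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
\]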
This exercise is part of the course *Data Analysis and Statistical Inference*.
Exercise instructions
- Create a new model, `m1`, where you remove `rank` from the list of explanatory variables. Check out the adjusted \(R^2\) of this new model and compare it to the adjusted \(R^2\) of the full model.
- If you don't want to view the entire model output, but just the adjusted R-squared, use `summary(m1)$adj.r.squared`.
- Create another new model, `m2`, where you remove `ethnicity` from the list of explanatory variables. Check out the adjusted \(R^2\) of this new model and compare it to the adjusted \(R^2\) of the full model.
- Repeat until you have tried removing each variable from the full model `m_full` one at a time, and determine the removal of which variable yields the highest improvement in the adjusted \(R^2\).
- Make note of this variable (you will be asked about it in the next question).
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# The evals data frame is already loaded into the workspace
# The full model:
m_full <- lm(score ~ rank + ethnicity + gender + language + age + cls_perc_eval + cls_students +
cls_level + cls_profs + cls_credits + bty_avg, data = evals)
summary(m_full)$adj.r.squared
# Remove rank:
m1 <- lm(score ~ ethnicity + gender + language + age + cls_perc_eval + cls_students + cls_level +
cls_profs + cls_credits + bty_avg, data = evals)
summary(m1)$adj.r.squared
# Remove ethnicity:
m2 <- lm(score ~ rank + gender + language + age + cls_perc_eval + cls_students + cls_level +
  cls_profs + cls_credits + bty_avg, data = evals)
summary(m2)$adj.r.squared
# Remove gender:
m3 <- lm(score ~ rank + ethnicity + language + age + cls_perc_eval + cls_students + cls_level +
  cls_profs + cls_credits + bty_avg, data = evals)
summary(m3)$adj.r.squared
# ...
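Rather than writing out eleven nearly identical model formulas by hand, the leave-one-out comparison above can be automated. The following R sketch (assuming, as in the exercise, that the `evals` data frame is loaded) fits one model per dropped predictor and reports the drop that gives the highest adjusted \(R^2\):

```r
# All predictors in the full model:
predictors <- c("rank", "ethnicity", "gender", "language", "age",
                "cls_perc_eval", "cls_students", "cls_level",
                "cls_profs", "cls_credits", "bty_avg")

# For each predictor, fit the model without it and record the adjusted R-squared.
adj_r2 <- sapply(predictors, function(v) {
  f <- reformulate(setdiff(predictors, v), response = "score")
  summary(lm(f, data = evals))$adj.r.squared
})

# The variable whose removal yields the highest adjusted R-squared:
names(which.max(adj_r2))
```

`reformulate()` (base R) builds the formula `score ~ ...` from the remaining predictor names, so each iteration fits the full model minus one variable.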