1. Introduction to experimental design
Hello, my name is Joanne and I’m here to teach you about experimental design in R.
2. Intro to experimental design
An experiment starts with a question, involves collecting data with that question in mind, and ends with analyzing the data to seek an answer. In this course, we’ll focus on asking good questions (in statistical language, formulating clear hypotheses), designing the data collection process, and analyzing the collected data.
3. Steps of an experiment
The three high-level steps of an experiment are planning, design, and analysis. For planning, you start with your hypothesis -- your question, or even a series of questions. What are you hoping to answer? What is the population of interest, those to whom it applies? What will your dependent variable be, the outcome, which hopefully can be measured to answer the question? What are your independent or explanatory variables, the variables you think may explain the dependent variable?
4. Key components of an experiment
The three key components of an experiment are randomization, replication, and blocking. All three help us assess variability across our study population, meaning we're looking to explain the variation in an outcome by the different explanatory variables.
In order to reduce bias, we need to randomize, replicate, and sometimes block. Say we want to test who can score the most free throws in 5 minutes. For one group we select the high school basketball team, and for the other an English class; this is an incredibly biased experiment right out of the gate. Any conclusions we draw will reflect our poor selection of groups and lack of randomization.
5. Randomization
In this video, we'll look at randomization, which is a key tenet of any experiment. Randomization helps ensure that variability in the outcome due to outside factors that we're not studying is evenly distributed among treatment groups. One example of randomization is a double-blind medical trial, where neither the patient nor the researcher knows whether the patient is receiving the treatment. The patient is randomized by a third party into one of the two groups.
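As a rough sketch of what random assignment can look like in R, the snippet below shuffles a balanced set of group labels across 20 made-up patients. The patient IDs, the group names, the group sizes, and the seed are all assumptions chosen for illustration; they are not taken from a real trial.

```r
# Hypothetical example: randomly assign 20 patients to two equal groups.
set.seed(42)  # fixed seed (an arbitrary choice) so the assignment is reproducible

# Made-up patient IDs
patients <- data.frame(id = 1:20)

# Build a balanced vector of labels, then shuffle it with sample()
labels <- rep(c("treatment", "control"), each = 10)
patients$group <- sample(labels)

# Confirm that each group received 10 patients
table(patients$group)
```

Shuffling a pre-balanced label vector guarantees equal group sizes, whereas drawing each label independently would only balance the groups on average.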
6. Recap: t-tests
Recall that a t-test can help you answer some of your research questions by comparing the means of two groups or, in its one-sample form, by comparing a single mean against a hypothesized value. Let's perform a one-sample t-test on the famous mtcars dataset to test if the mean of mpg differs from 40.
First, built-in data can be loaded with the data() function. The dataset is loaded as a data frame with the same name that was passed as an argument to data(). For example, you can load the mtcars dataset using data("mtcars").
To conduct a two-sided t-test with mtcars, you'll use the function t.test(), where the argument x is the outcome in question (here, mtcars$mpg), alternative is set to "two.sided", and mu is the hypothesized value that the mean of mpg is tested against.
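Put together, the call described here might look like the sketch below. The object name mpg_test is just an illustrative choice; the arguments x, alternative, and mu are the ones named above.

```r
# Load the built-in mtcars dataset (available as a data frame named mtcars)
data("mtcars")

# One-sample, two-sided t-test: does the mean of mpg differ from 40?
mpg_test <- t.test(x = mtcars$mpg, alternative = "two.sided", mu = 40)

mpg_test            # print the full test summary
mpg_test$estimate   # sample mean of mpg
mpg_test$p.value    # p-value of the test
```

A small p-value here would suggest that the mean of mpg is not equal to 40.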
Throughout the course, the broom package will also be very helpful. It is designed to work with the dplyr package and summarizes key information about models, for instance t.test() outputs, in tidy tibbles.
tidy() is one of the most powerful functions in the broom package. It produces a tibble in which each row contains information about an important component of the model.
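Here is a minimal sketch of tidy() applied to a t-test, assuming the broom package is installed. The object names mpg_test and mpg_tidy are illustrative only.

```r
library(broom)  # assumes broom is installed, e.g. via install.packages("broom")

data("mtcars")

# Same one-sample test as before: is the mean of mpg different from 40?
mpg_test <- t.test(x = mtcars$mpg, alternative = "two.sided", mu = 40)

# tidy() turns the test result into a tibble; a t-test yields a single row
mpg_tidy <- tidy(mpg_test)
mpg_tidy

# Columns such as estimate and p.value can then be used like any other data
mpg_tidy$p.value
```

Because the result is a regular tibble, it can be filtered, joined, or combined with other results using dplyr verbs.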
7. Let's practice!
Now that you have all the tools you need, let's dive into the exercises!