1. Setting up experiments
Hi! Welcome to this course on experimental design in Python.
2. Experimental Design definition
Experimental design is the process in which we carry out research in an objective and controlled fashion.
The purpose is to ensure we can draw specific conclusions about a hypothesis we have.
3. Forming robust statements
Because we use objective tools, we should state our conclusions in quantified language.
Instead of using words like 'probably', 'likely', and 'small' when noting our conclusions, we should report precise figures. This often takes the form of stating the percentage risk of a Type I error in the conclusion.
Recall that a Type I error occurs when we reject the null hypothesis even though it is actually true.
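To make the idea of a quantified Type I error risk concrete, here is a minimal simulation sketch. It is not part of the course material: the group sizes, the 5% significance level, the synthetic height values, and the choice of a two-sample t-test from SciPy are all assumptions made for illustration. Both groups are drawn from the same distribution, so the null hypothesis is true and every rejection is a Type I error.

```python
import numpy as np
from scipy import stats

# Assumed settings for illustration only
alpha = 0.05            # significance level we commit to in advance
n_experiments = 2000    # number of simulated experiments
n_subjects = 100        # subjects per group

rng = np.random.default_rng(0)

# Both groups come from the same distribution, so the null hypothesis is true
group_a = rng.normal(loc=170, scale=8, size=(n_experiments, n_subjects))
group_b = rng.normal(loc=170, scale=8, size=(n_experiments, n_subjects))

# One two-sample t-test per simulated experiment (one per row)
p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue

# Share of experiments where we would wrongly reject the null hypothesis
type_i_rate = np.mean(p_values < alpha)
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate should land close to the chosen alpha, which is what lets us attach a percentage risk to a conclusion instead of calling a result 'likely'.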
In this course, you'll learn to design experiments and conduct statistical analyses such that you begin making precise statements about observed results and take informed actions as a result.
4. Why experimental design?
Experimental design is useful in many fields.
Naturally, it is used in academia, for example in medical research.
It is also useful in many corporate contexts such as marketing and product analytics, which conduct lots of A/B tests.
It is also used in agriculture and increasingly in government policy through the use of behavioral psychology experiments.
5. Some terminology...
Before we begin our first topic, let's define some important terminology. Subjects are what we are experimenting on. They could be patients, employees, or users on a website.
6. Some terminology...
A treatment is some change given to one group of subjects.
7. Some terminology...
We could call that group the treatment group.
8. Some terminology...
The control group is not given the treatment. It could be a placebo group, for example.
9. Assigning subjects to groups
An important concept in experimental design is how to assign subjects to test groups.
There are two ways we could do this.
We could just split the dataset non-randomly into chunks and assign each chunk to a group.
Or we could use random assignment to sample into our desired groups.
Let's look at each option using a DataFrame of 200 subjects' heights, which we want to split into two groups of 100 each.
10. Non-random assignment
Let's try non-random assignment first.
We can use .iloc[] to slice the first 100 rows from heights and assign them to group1, then slice the next 100 rows and assign them to group2.
We can use pandas' describe method to check descriptive statistics of our groups. Concatenating the two results with pd.concat() and axis=1 will allow for easier comparison.
These groups appear very different! Looking at the mean row, we can see there's a 9 cm difference. Because the groups differ before any treatment is applied, it will be harder to confidently determine whether later changes are due to the treatment intervention.
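The non-random split described here can be sketched as follows. The course's actual heights data is not shown in this transcript, so the DataFrame below is synthetic, and the column name height and the sorted row order are assumptions chosen to mimic a dataset whose row order carries information. The numbers it prints will therefore not match the 9 cm difference mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course data: 200 synthetic heights in cm,
# sorted so that row order is related to height
rng = np.random.default_rng(42)
heights = pd.DataFrame(
    {"height": np.sort(rng.normal(loc=170, scale=8, size=200))}
)

# Non-random assignment: first 100 rows, then the next 100 rows
group1 = heights.iloc[0:100, :]
group2 = heights.iloc[100:200, :]

# Descriptive statistics for each group, placed side by side
comparison = pd.concat(
    [group1["height"].describe(), group2["height"].describe()],
    axis=1,
    keys=["group1", "group2"],
)
print(comparison)
```

Because the rows are ordered, the first chunk holds the shorter subjects and the second chunk the taller ones, so the mean row differs noticeably between the two columns.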
11. Random assignment
Let's now try random assignment.
We can use pandas' sample method to create a sample of size n, or use the frac argument and specify a proportion of the dataset, between 0 and 1, to sample.
We want two equally sized groups, so we specify frac=0.5. Using n=100 would also work here. We also set the replace argument to False, which is the default, so no subject is selected twice. The random_state argument allows the split to be reproduced consistently.
group2 can be made by dropping the index labels of group1 from the overall DataFrame.
Using the same comparison method we see much closer means.
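Here is a sketch of the random assignment steps just described. As before, the heights DataFrame is a synthetic stand-in because the course data is not shown in this transcript; the column name height and the random_state value of 42 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course data: 200 synthetic heights in cm,
# sorted so that a non-random split would produce unbalanced groups
rng = np.random.default_rng(42)
heights = pd.DataFrame(
    {"height": np.sort(rng.normal(loc=170, scale=8, size=200))}
)

# Random assignment: sample half of the rows without replacement
group1 = heights.sample(frac=0.5, replace=False, random_state=42)

# Everything not sampled into group1 becomes group2
group2 = heights.drop(group1.index)

# Descriptive statistics for each group, placed side by side
comparison = pd.concat(
    [group1["height"].describe(), group2["height"].describe()],
    axis=1,
    keys=["group1", "group2"],
)
print(comparison)
```

Passing n=100 instead of frac=0.5 would select the same number of rows from this 200-row DataFrame. Dropping by group1.index relies on the index labels being unique, which holds for the default integer index used here.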
12. Assignment summary
This demonstrates the importance of randomly assigning subjects to groups. It means we can attribute observed changes to treatment interventions rather than natural differences between the groups.
We can use pandas' sample method to select randomly from a DataFrame, and then use pandas' describe method to check differences in group assignment.
13. Let's practice!
Time to put this into practice!