Blocking experimental data
You are working with a manufacturing firm that wants to conduct some experiments on worker productivity. Their dataset contains only 100 rows, so it's important that the experimental groups are balanced.
This sounds like a great opportunity to use your knowledge of blocking to assist them. They have provided a productivity_subjects DataFrame. Split the provided dataset into two even groups of 50 entries each.
The libraries numpy and pandas have been imported as np and pd, respectively.
This exercise is part of the course Experimental Design in Python.
Exercise instructions
- Randomly select 50 subjects from the productivity_subjects DataFrame into a new DataFrame, block_1, without replacement.
- Set a new column, block, to 1 for the block_1 DataFrame.
- Assign the remaining subjects to a DataFrame called block_2 and set the block column to 2 for this DataFrame.
- Concatenate the blocks together into a single DataFrame, and print the count of each value in the block column to confirm the blocking worked.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Randomly assign half
block_1 = productivity_subjects.____(____, random_state=42, ____)
# Set the block column
block_1['block'] = ____
# Create second assignment and label
block_2 = ____
block_2['block'] = ____
# Concatenate and print
productivity_combined = pd.____([block_1, block_2], axis=0)
print(productivity_combined['block'].value_counts())
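
One possible way to fill in the blanks is sketched below. It is not necessarily the course's reference solution. Because the real productivity_subjects data is not shown here, the sketch builds a hypothetical 100-row stand-in with a made-up subject_id column. It assumes DataFrame.sample for the random draw, DataFrame.drop on the sampled index to get the remaining rows, and pd.concat to stack the two blocks.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for productivity_subjects (the real data is not shown);
# the subject_id column is an assumption made for illustration only.
productivity_subjects = pd.DataFrame({"subject_id": np.arange(1, 101)})

# Randomly assign half of the subjects to block 1, without replacement
block_1 = productivity_subjects.sample(n=50, random_state=42, replace=False)

# Set the block column
block_1["block"] = 1

# The rows not sampled into block_1 become block 2
block_2 = productivity_subjects.drop(block_1.index)
block_2["block"] = 2

# Concatenate the blocks row-wise and count the rows in each block
productivity_combined = pd.concat([block_1, block_2], axis=0)
print(productivity_combined["block"].value_counts())
```

With a 100-row input, the printed counts should show 50 rows for each block. Dropping by index only works as intended when the index values are unique, which holds for the default RangeIndex used in this stand-in.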