Analyzing skewed data with a permutation test
Permutation tests are useful in situations that don't satisfy the conditions of the hypothesis tests you already know. In this exercise, you'll code up a permutation test using the permutation_test function from SciPy's stats module.
You're interested in comparing the average number of funding rounds between companies in the analytics space and all other venture-funded companies. While you may be tempted to use a t-test, the number of funding rounds is clearly not normally distributed. Instead, the majority of companies have only one round, and the number of companies with two or more rounds drops off quickly.
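To make the idea concrete, here is a minimal sketch of what a permutation test does by hand: pool the two samples, shuffle the group labels many times, and count how often the shuffled difference in means is at least as extreme as the observed one. The data are synthetic right-skewed counts drawn from a geometric distribution, and the names group_a, group_b, and permutation_pvalue are made up for illustration; none of this comes from the course data.

```python
import numpy as np

# Synthetic, right-skewed counts standing in for funding rounds (illustrative only)
rng = np.random.default_rng(0)
group_a = rng.geometric(p=0.60, size=200)
group_b = rng.geometric(p=0.65, size=1000)

def permutation_pvalue(x, y, rng, n_resamples=1000):
    """Two-sided p-value for the difference in means via label shuffling."""
    observed = np.mean(x) - np.mean(y)
    pooled = np.concatenate([x, y])
    n_extreme = 0
    for _ in range(n_resamples):
        # Shuffle the pooled values, then split them back into two groups
        shuffled = rng.permutation(pooled)
        diff = np.mean(shuffled[:len(x)]) - np.mean(shuffled[len(x):])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero
    return (n_extreme + 1) / (n_resamples + 1)

p_value = permutation_pvalue(group_a, group_b, rng)
print(p_value)
```

Because the null distribution is built from the data itself, no normality assumption is needed, which is why this approach suits heavily skewed counts.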
The following have been loaded for you:
- analytics_df: Data on all analytics companies
- non_analytics_df: Data on all other, non-analytics companies
This exercise is part of the course
Foundations of Inference in Python
Exercise instructions
- Define a statistic function which, given two samples fundings_group_1 and fundings_group_2, returns the difference in mean number of funding_rounds.
- Conduct a permutation test using the funding_rounds column from each data set, the statistic function you defined, and 100 resamples.
- Print out the resulting p-value of your permutation test.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Write a "statistic" function which calculates the difference in means
def statistic(fundings_group_1, fundings_group_2):
    return ____(fundings_group_1) - ____(fundings_group_2)
# Conduct a permutation test using 100 resamples
perm_result = stats.permutation_test((____['funding_rounds'], ____['funding_rounds']),
                                     statistic=____,
                                     n_resamples=____,
                                     vectorized=____)
# Print the p-value
____(____.pvalue)
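For reference, the general shape of a call to scipy.stats.permutation_test can be sketched on made-up data. The samples below are synthetic geometric draws, and the names sample_1, sample_2, and mean_difference are hypothetical; this is not the course data set and only illustrates how the arguments fit together.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed samples (illustrative stand-ins, not the course data)
rng = np.random.default_rng(42)
sample_1 = rng.geometric(p=0.60, size=150)
sample_2 = rng.geometric(p=0.65, size=600)

def mean_difference(x, y):
    # Test statistic: difference between the two sample means
    return np.mean(x) - np.mean(y)

# vectorized=False tells SciPy the statistic handles one pair of 1-D samples at a time
result = stats.permutation_test(
    (sample_1, sample_2),
    statistic=mean_difference,
    n_resamples=100,
    vectorized=False,
)

print(result.pvalue)
```

With only 100 resamples the p-value is coarse and will vary from run to run; a larger n_resamples gives a more stable estimate at the cost of more computation.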