Analyzing skewed data with a permutation test

Permutation tests can be useful in situations that don't satisfy the conditions of the hypothesis tests you already know. In this exercise you'll code up a permutation test using the permutation_test function from SciPy's stats module.

You're interested in comparing the average number of funding rounds between companies in the analytics space and all other venture-funded companies. While you may be tempted to use a t-test, you can be sure that the number of funding rounds is not normally distributed. Instead, the majority of companies have only one round, and the number of companies with two or more rounds drops off quickly.
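
To see why the normality assumption is doubtful for data shaped like this, it can help to look at a simulated example. The sketch below does not use the course data: it draws round counts from a geometric distribution (an assumption, chosen only because it concentrates on 1 and decays quickly), then prints the share of companies at each count and the sample skewness.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for a funding_rounds column (NOT the course data):
# a geometric distribution puts most of its mass on 1 and decays quickly.
rng = np.random.default_rng(seed=0)
funding_rounds = rng.geometric(p=0.6, size=1_000)

# Share of simulated companies at each round count
values, counts = np.unique(funding_rounds, return_counts=True)
for value, count in zip(values, counts):
    print(f"{value} round(s): {count / funding_rounds.size:.1%}")

# A positive sample skewness indicates a long right tail
sample_skewness = stats.skew(funding_rounds)
print(f"skewness: {sample_skewness:.2f}")
```

A clearly positive skewness, together with most of the mass sitting on a single value, is the kind of shape that makes a permutation test a reasonable alternative to a t-test.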

The following have been loaded for you:

  • analytics_df - Data on all analytics companies
  • non_analytics_df - Data on all other (non-analytics) companies

This exercise is part of the course

Foundations of Inference in Python

Exercise instructions

  • Define a statistic function which, given two samples funding_group_1 and funding_group_2, returns the difference in mean number of funding_rounds.
  • Conduct a permutation test using the funding_rounds column from each data set, the statistic function you defined, and 100 resamples.
  • Print out the resulting p-value of your permutation test.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Write a "statistic" function which calculates the difference in means
def statistic(funding_group_1, funding_group_2):
  return ____(funding_group_1) - ____(funding_group_2)

# Conduct a permutation test using 100 resamples
perm_result = stats.permutation_test((____['funding_rounds'], ____['funding_rounds']),
                                    statistic=____,
                                    n_resamples=____,
                                    vectorized=____)

# Print the p-value
____(____.pvalue)
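
For reference, here is one way the completed template might look, written as a self-contained sketch. Several things in it are assumptions rather than part of the exercise: the two DataFrames are filled with simulated data (the real analytics_df and non_analytics_df are preloaded in the course environment), np.mean is used for the blanks in the statistic function, and vectorized is set to False because the statistic takes one pair of 1-D samples at a time.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-ins for the two preloaded DataFrames (NOT the course data)
rng = np.random.default_rng(seed=42)
analytics_df = pd.DataFrame({"funding_rounds": rng.geometric(p=0.5, size=200)})
non_analytics_df = pd.DataFrame({"funding_rounds": rng.geometric(p=0.6, size=2_000)})

# Write a "statistic" function which calculates the difference in means
def statistic(funding_group_1, funding_group_2):
    return np.mean(funding_group_1) - np.mean(funding_group_2)

# Conduct a permutation test using 100 resamples
perm_result = stats.permutation_test(
    (analytics_df["funding_rounds"], non_analytics_df["funding_rounds"]),
    statistic=statistic,
    n_resamples=100,
    vectorized=False,  # statistic handles a single pair of 1-D samples
)

# Print the p-value (varies from run to run because the resampling is random)
print(perm_result.pvalue)
```

With only 100 resamples the p-value is coarse, since it can only take values in steps of roughly 1/100. A larger n_resamples gives a finer estimate at the cost of more computation.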