Exercise

Analyzing skewed data with a permutation test

Permutation tests can be useful in situations that don't satisfy the assumptions of the hypothesis tests you already know, such as the normality assumption behind the t-test. In this exercise you'll code up a permutation test using the SciPy package (scipy.stats.permutation_test).
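To see what a permutation test does internally, here is a minimal from-scratch sketch in NumPy. It is an illustration only, not the library routine this exercise asks for, and `manual_permutation_test` and `diff_in_means` are hypothetical helper names. The idea: pool both samples, repeatedly shuffle the pooled values into two groups of the original sizes, and count how often the shuffled statistic is at least as extreme as the observed one.

```python
import numpy as np

def diff_in_means(x, y):
    """Difference between the sample means of x and y."""
    return np.mean(x) - np.mean(y)

def manual_permutation_test(x, y, statistic, n_resamples=1000, seed=0):
    """Two-sided permutation test for two independent samples (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = statistic(x, y)

    # Under the null hypothesis the group labels are exchangeable,
    # so we pool the data and reassign labels at random.
    pooled = np.concatenate([x, y])
    n_x = len(x)
    count = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)
        null_stat = statistic(shuffled[:n_x], shuffled[n_x:])
        # Two-sided: compare magnitudes of the shuffled and observed statistics
        if abs(null_stat) >= abs(observed):
            count += 1

    # Add one to numerator and denominator so the p-value is never exactly 0
    p_value = (count + 1) / (n_resamples + 1)
    return observed, p_value

# Hypothetical toy data: most "companies" have a single funding round
observed, p = manual_permutation_test([1, 1, 1, 2, 3], [1, 1, 2, 2, 4, 5], diff_in_means)
print(observed, p)
```

Because the null distribution is built from the data itself, nothing here assumes the funding-round counts are normally distributed.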

You're interested in comparing the average number of funding rounds between companies in the analytics space and all other venture-funded companies. While you may be tempted to use a t-test, the number of funding rounds is clearly not normally distributed: the majority of companies have only one round, and the number of companies with two or more rounds drops off quickly.

The following have been loaded for you:

  • analytics_df - Data on all analytics companies
  • non_analytics_df - Data on all non-analytics companies

Instructions

  • Define a statistic function that, given two samples fundings_group_1 and fundings_group_2, returns the difference between their mean numbers of funding_rounds.
  • Conduct a permutation test using the funding_rounds column from each data set, the statistic function you defined, and 100 resamples.
  • Print out the resulting p-value of your permutation test.
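The steps above can be sketched with scipy.stats.permutation_test as follows. Because analytics_df and non_analytics_df are not available outside the exercise, this sketch assumes synthetic stand-ins: the arrays `analytics_rounds` and `non_analytics_rounds` are hypothetical, drawn from a geometric distribution so that values start at 1 and are right-skewed. In the exercise itself you would pass the funding_rounds columns of the two DataFrames instead.

```python
import numpy as np
from scipy.stats import permutation_test

# Hypothetical stand-ins for analytics_df["funding_rounds"] and
# non_analytics_df["funding_rounds"]: geometric draws are >= 1 and
# right-skewed, so most simulated companies have a single round.
data_rng = np.random.default_rng(42)
analytics_rounds = data_rng.geometric(p=0.55, size=300)
non_analytics_rounds = data_rng.geometric(p=0.6, size=3000)

def statistic(fundings_group_1, fundings_group_2):
    """Difference in mean number of funding rounds (group 1 minus group 2)."""
    return np.mean(fundings_group_1) - np.mean(fundings_group_2)

# permutation_type defaults to 'independent' (two unpaired samples)
# and alternative defaults to 'two-sided'.
result = permutation_test(
    (analytics_rounds, non_analytics_rounds),
    statistic,
    n_resamples=100,
)

# Print the p-value of the permutation test
print(result.pvalue)
```

With only 100 resamples the p-value can take just a coarse set of values; increasing n_resamples gives a finer-grained estimate at the cost of more computation.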