
Analyzing skewed data with a permutation test

Permutation tests are useful when your data don't satisfy the conditions of the hypothesis tests you already know. In this exercise you'll code up a permutation test using the SciPy package (scipy.stats).

You're interested in comparing the average number of funding rounds between companies in the analytics space and all other venture-funded companies. While you may be tempted to use a t-test, the number of funding rounds is clearly not normally distributed. Instead, the majority of companies have only one round, and the number of companies with two or more rounds drops off quickly.
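To make the idea concrete, here is a minimal sketch of what a permutation test does under the hood, using only the Python standard library. The two groups are made-up skewed counts, not the course data, and the names group_a, group_b, and mean_of are hypothetical. The sketch pools both groups, repeatedly reshuffles the group labels, and checks how often a reshuffled difference in means is at least as extreme as the observed one.

```python
import random

random.seed(0)

# Hypothetical skewed counts: most values are 1, larger values are increasingly rare
values = [1, 2, 3, 4, 5]
group_a = random.choices(values, weights=[55, 25, 12, 6, 2], k=200)
group_b = random.choices(values, weights=[65, 20, 10, 4, 1], k=300)


def mean_of(sample):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(sample) / len(sample)


# Difference in means actually observed between the two groups
observed = mean_of(group_a) - mean_of(group_b)

# Under the null hypothesis the group labels are exchangeable, so pool the data
pooled = group_a + group_b
n_a = len(group_a)
n_resamples = 1000

null_diffs = []
for _ in range(n_resamples):
    # A random reordering of the pooled data acts as a random relabelling
    shuffled = random.sample(pooled, k=len(pooled))
    null_diffs.append(mean_of(shuffled[:n_a]) - mean_of(shuffled[n_a:]))

# Two-sided p-value; the +1 terms count the observed arrangement itself
extreme = sum(abs(d) >= abs(observed) for d in null_diffs)
p_value = (extreme + 1) / (n_resamples + 1)

print(f"observed difference: {observed:.3f}, p-value: {p_value:.3f}")
```

Because the test only relies on reshuffling labels, it makes no assumption that the counts are normally distributed.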

The following have been loaded for you:

  • analytics_df - Data on all analytics companies
  • non_analytics_df - Data on all other (non-analytics) companies

This exercise is part of the course

Foundations of Inference in Python


Exercise instructions

  • Define a statistic function which, given two samples fundings_group_1 and fundings_group_2, returns the difference in mean number of funding_rounds.
  • Conduct a permutation test using the funding_rounds column from each data set, the statistic function you defined, and 100 resamples.
  • Print out the resulting p-value of your permutation test.
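The steps above can be sketched on made-up data as follows. This is an illustration under stated assumptions, not the exercise solution: sample_1, sample_2, and mean_difference are hypothetical names, the data are synthetic skewed counts drawn with NumPy, and scipy.stats.permutation_test is assumed to be available (it was added in SciPy 1.8).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical skewed counts (values start at 1 and larger values are rare)
sample_1 = rng.geometric(p=0.6, size=150)
sample_2 = rng.geometric(p=0.7, size=250)


def mean_difference(x, y):
    """Difference in sample means, the statistic being permuted."""
    return np.mean(x) - np.mean(y)


# vectorized=False tells SciPy the statistic handles one pair of 1-D samples at a time
result = stats.permutation_test(
    (sample_1, sample_2),
    statistic=mean_difference,
    n_resamples=100,
    vectorized=False,
)

# The p-value varies slightly between runs because the resampling is random
print(result.pvalue)
```

With only 100 resamples the p-value is coarse (it can only take values in steps of roughly 0.01), so a larger n_resamples is usually preferable when runtime allows.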

Hands-on interactive exercise

Try this exercise by completing this sample code.

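# Assumed import: the template calls stats.permutation_test, which lives in scipy.stats
from scipy import stats

# Write a "statistic" function which calculates the difference in means
def statistic(fundings_group_1, fundings_group_2):
  return ____(fundings_group_1) - ____(fundings_group_2)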

# Conduct a permutation test using 100 resamples
perm_result = stats.permutation_test((____['funding_rounds'], ____['funding_rounds']),
                                     statistic=____,
                                     n_resamples=____,
                                     vectorized=____)

# Print the p-value
____(____.pvalue)