Hypothesis test on Pearson correlation

The observed correlation between female illiteracy and fertility may just be due to chance; the fertility of a given country may actually be totally independent of its illiteracy. You will now test exactly this hypothesis. To do so, permute the illiteracy values but leave the fertility values fixed. This simulates the hypothesis that the two quantities are totally independent of each other. For each permutation, compute the Pearson correlation coefficient, then assess how many of your permutation replicates have a Pearson correlation coefficient at least as large as the observed one.
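To illustrate why permuting one variable simulates independence, here is a minimal sketch on synthetic data. The arrays `x` and `y` are made-up stand-ins, not the course dataset. Shuffling `x` keeps exactly the same values but breaks their pairing with `y`, so the correlation should drop to roughly zero.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, strongly correlated stand-in data (not the course dataset)
x = np.linspace(0.0, 100.0, 200)
y = 0.05 * x + rng.normal(0.0, 0.5, size=x.size)

# Shuffle x only; y keeps its original order
x_permuted = rng.permutation(x)

# Correlation before and after breaking the pairing
r_original = np.corrcoef(x, y)[0, 1]
r_permuted = np.corrcoef(x_permuted, y)[0, 1]

# The permuted array holds the same values, just in a different order
same_values = np.array_equal(np.sort(x_permuted), x)

print('same values:', same_values)
print('r (original pairing):', r_original)
print('r (permuted pairing):', r_permuted)
```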

The function pearson_r() that you wrote in the first part of this course to compute the Pearson correlation coefficient is already available for you to use.
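The body of pearson_r() is not shown on this page. A plausible sketch, assuming it simply wraps NumPy's correlation matrix, could look like this:

```python
import numpy as np

def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two 1-D arrays.

    Assumed implementation; the course's own version is not shown here.
    """
    # np.corrcoef returns the 2x2 correlation matrix of x and y;
    # the off-diagonal entry [0, 1] is the correlation between them
    corr_mat = np.corrcoef(x, y)
    return corr_mat[0, 1]

# Small usage example with made-up numbers
print(pearson_r(np.array([1.0, 2.0, 3.0, 4.0]), np.array([2.1, 3.9, 6.2, 7.8])))
```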

This exercise is part of the course

<Kurs>Statistical Thinking in Python (Part 2)</Kurs>

Exercise instructions

Compute the observed Pearson correlation between illiteracy and fertility.
Initialize an array to store your permutation replicates.
Write a for loop to draw 10,000 replicates:
- Permute the illiteracy measurements using np.random.permutation().
- Compute the Pearson correlation between the permuted illiteracy array, illiteracy_permuted, and fertility.
Compute and print the p-value from the replicates.

Hands-on interactive exercise

Try this exercise by completing this sample code.

# Compute observed correlation: r_obs
r_obs = ____

# Initialize permutation replicates: perm_replicates
perm_replicates = np.empty(10000)

# Draw replicates
for ____ in ____:
    # Permute illiteracy measurements: illiteracy_permuted
    illiteracy_permuted = ____

    # Compute Pearson correlation
    perm_replicates[i] = ____

# Compute p-value: p
p = ____
print('p-val =', p)
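One way the blanks above might be filled in is sketched below. This is a hedged illustration, not the course's official solution. The exercise environment preloads the real illiteracy and fertility arrays and pearson_r(); here they are replaced by hypothetical synthetic data and an assumed implementation so the block runs on its own.

```python
import numpy as np

def pearson_r(x, y):
    # Assumed implementation: off-diagonal entry of the correlation matrix
    return np.corrcoef(x, y)[0, 1]

# Hypothetical stand-in data; the exercise preloads the real arrays
np.random.seed(42)
illiteracy = np.random.uniform(0, 80, size=150)
fertility = 1.5 + 0.05 * illiteracy + np.random.normal(0, 0.8, size=150)

# Compute observed correlation: r_obs
r_obs = pearson_r(illiteracy, fertility)

# Initialize permutation replicates: perm_replicates
perm_replicates = np.empty(10000)

# Draw replicates
for i in range(10000):
    # Permute illiteracy measurements: illiteracy_permuted
    illiteracy_permuted = np.random.permutation(illiteracy)

    # Compute Pearson correlation
    perm_replicates[i] = pearson_r(illiteracy_permuted, fertility)

# Compute p-value: fraction of replicates at least as large as r_obs
p = np.sum(perm_replicates >= r_obs) / len(perm_replicates)
print('p-val =', p)
```

If none of the 10,000 replicates reaches r_obs, the printed p-value is 0. That should be read as "smaller than about 1 in 10,000" rather than as exactly zero, since the resolution of the estimate is limited by the number of replicates.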

When doing statistical inference, we speak the language of probability. A probability distribution that describes your data has parameters. So, a major goal of statistical inference is to estimate the values of these parameters, which allows us to concisely and unambiguously describe our data and draw conclusions from it. In this chapter, you will learn how to find the optimal parameters, those that best describe your data.

Exercise 1: Optimal parameters
Exercise 2: How often do we get no-hitters?
Exercise 3: Do the data follow our story?
Exercise 4: How is this parameter optimal?
Exercise 5: Linear regression by least squares
Exercise 6: EDA of literacy/fertility data
Exercise 7: Linear regression
Exercise 8: How is it optimal?
Exercise 9: The importance of EDA: Anscombe's quartet
Exercise 10: The importance of EDA
Exercise 11: Linear regression on appropriate Anscombe data
Exercise 12: Linear regression on all Anscombe data

To "pull yourself up by your bootstraps" is a classic idiom meaning that you achieve a difficult task by yourself with no help at all. In statistical inference, you want to know what would happen if you could repeat your data acquisition an infinite number of times. This task is impossible, but can we use only the data we actually have to get close to the same result as an infinitude of experiments? The answer is yes! The technique to do it is aptly called bootstrapping. This chapter will introduce you to this extraordinarily powerful tool.

Exercise 1: Generating bootstrap replicates
Exercise 2: Getting the terminology down
Exercise 3: Bootstrapping by hand
Exercise 4: Visualizing bootstrap samples
Exercise 5: Bootstrap confidence intervals
Exercise 6: Generating many bootstrap replicates
Exercise 7: Bootstrap replicates of the mean and the SEM
Exercise 8: Confidence intervals of rainfall data
Exercise 9: Bootstrap replicates of other statistics
Exercise 10: Confidence interval on the rate of no-hitters
Exercise 11: Pairs bootstrap
Exercise 12: A function to do pairs bootstrap
Exercise 13: Pairs bootstrap of literacy/fertility data
Exercise 14: Plotting bootstrap regressions

You now know how to define and estimate parameters given a model. But the question remains: how reasonable is it to observe your data if a model is true? This question is addressed by hypothesis tests. They are the icing on the inference cake. After completing this chapter, you will be able to carefully construct and test hypotheses using hacker statistics.

Exercise 1: Formulating and simulating a hypothesis
Exercise 2: Generating a permutation sample
Exercise 3: Visualizing permutation sampling
Exercise 4: Test statistics and p-values
Exercise 5: Test statistics
Exercise 6: What is a p-value?
Exercise 7: Generating permutation replicates
Exercise 8: Look before you leap: EDA before hypothesis testing
Exercise 9: Permutation test on frog data
Exercise 10: Bootstrap hypothesis tests
Exercise 11: A one-sample bootstrap hypothesis test
Exercise 12: A two-sample bootstrap hypothesis test for difference of means

As you saw from the last chapter, hypothesis testing can be a bit tricky. You need to define the null hypothesis, figure out how to simulate it, and define clearly what it means to be "more extreme" in order to compute the p-value. Like any skill, practice makes perfect, and this chapter gives you some good practice with hypothesis tests.

Exercise 1: A/B testing
Exercise 2: The vote for the Civil Rights Act in 1964
Exercise 3: What is equivalent?
Exercise 4: A time-on-website analog
Exercise 5: What should you have done first?
Exercise 6: Test of correlation
Exercise 7: Simulating a null hypothesis concerning correlation
Exercise 8: Hypothesis test on Pearson correlation (current exercise)
Exercise 9: Do neonicotinoid insecticides have unintended consequences?
Exercise 10: Bootstrap hypothesis test on bee sperm counts

Every year for the past 40-plus years, Peter and Rosemary Grant have gone to the Galápagos island of Daphne Major and collected data on Darwin's finches. Using your skills in statistical inference, you will spend this chapter with their data, and witness first hand, through data, evolution in action. It's an exhilarating way to end the course!

Exercise 1: Finch beaks and the need for statistics
Exercise 2: EDA of beak depths of Darwin's finches
Exercise 3: ECDFs of beak depths
Exercise 4: Parameter estimates of beak depths
Exercise 5: Hypothesis test: Are beaks deeper in 2012?
Exercise 6: Variation in beak shapes
Exercise 7: EDA of beak length and depth
Exercise 8: Linear regressions
Exercise 9: Displaying the linear regression results
Exercise 10: Beak length to depth ratio
Exercise 11: How different is the ratio?
Exercise 12: Calculation of heritability
Exercise 13: EDA of heritability
Exercise 14: Correlation of offspring and parental data
Exercise 15: Pearson correlation of offspring and parental data
Exercise 16: Measuring heritability
Exercise 17: Is beak depth heritable at all in G. scandens?
Exercise 18: Final thoughts