
Computing the Pearson correlation coefficient

As mentioned in the video, the Pearson correlation coefficient, also called the Pearson r, is often easier to interpret than the covariance. It is computed using the np.corrcoef() function. Like np.cov(), it takes two arrays as arguments and returns a 2D array. Entries [0,0] and [1,1] are necessarily equal to 1 (can you think about why?), and the value we are after is entry [0,1].
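The sketch below shows the structure np.corrcoef() returns. It uses two small made-up arrays (illustrative values, not the iris measurements) and prints the shape of the result, its diagonal, and its off-diagonal entries.

```python
import numpy as np

# Small illustrative arrays (made up for this example, not from the iris data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# 2x2 correlation matrix of x and y
corr_mat = np.corrcoef(x, y)
print(corr_mat.shape)

# Diagonal entries: each array correlated with itself (1, up to rounding)
print(corr_mat[0, 0], corr_mat[1, 1])

# Off-diagonal entries: the Pearson r of x and y; the matrix is symmetric
print(corr_mat[0, 1], corr_mat[1, 0])
```

Because the matrix is symmetric, entry [1,0] holds the same value as [0,1]. The exercise simply uses [0,1].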

In this exercise, you will write a function, pearson_r(x, y), that takes two arrays and returns their Pearson correlation coefficient. You will then use this function to compute the coefficient for the petal lengths and widths of I. versicolor.
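As a cross-check on what pearson_r should return, the Pearson r can also be computed from its definition: the covariance of x and y divided by the product of their standard deviations. The sketch below does this with a hypothetical helper, pearson_r_from_cov (not part of the exercise), on made-up arrays, and compares the result with entry [0,1] of np.corrcoef().

```python
import numpy as np

def pearson_r_from_cov(x, y):
    """Hypothetical helper: Pearson r from its definition cov(x, y) / (std(x) * std(y))."""
    # Entry [0, 1] of the 2x2 covariance matrix is cov(x, y)
    cov_xy = np.cov(x, y)[0, 1]
    # np.cov uses ddof=1 by default, so the standard deviations use ddof=1 too;
    # any consistent choice of ddof cancels in the ratio
    return cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))

# Made-up arrays with a negative relationship (illustrative only)
x = np.array([0.5, 1.5, 2.0, 3.5, 4.0])
y = np.array([3.0, 2.6, 2.1, 1.2, 0.9])

r_def = pearson_r_from_cov(x, y)
r_np = np.corrcoef(x, y)[0, 1]
print(r_def, r_np)
```

The two values agree up to floating-point rounding, which is why dividing by the standard deviations makes r easier to interpret than the raw covariance: it is always between -1 and 1.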

Again, we include the scatter plot you generated in a previous exercise to remind you how the petal width and length are related.

This exercise is part of the course “Statistical Thinking in Python (Part 1)”.


Exercise instructions

  • Define a function with signature pearson_r(x, y).
    • Use np.corrcoef() to compute the correlation matrix of x and y (pass them to np.corrcoef() in that order).
    • Return entry [0,1] of the correlation matrix.
  • Compute the Pearson correlation between the data in the arrays versicolor_petal_length and versicolor_petal_width. Assign the result to r.
  • Print the result.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

def ____(____, ____):
    """Compute Pearson correlation coefficient between two arrays."""
    # Compute correlation matrix: corr_mat


    # Return entry [0,1]
    return corr_mat[0,1]

# Compute Pearson correlation coefficient for I. versicolor: r


# Print the result
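One way the completed exercise could look is sketched below. In the course environment NumPy is imported as np and the arrays versicolor_petal_length and versicolor_petal_width are preloaded; since those arrays are not reproduced here, the sketch runs on made-up stand-in arrays (x_demo and y_demo, hypothetical names) and shows the course call as a comment.

```python
import numpy as np

def pearson_r(x, y):
    """Compute Pearson correlation coefficient between two arrays."""
    # Compute correlation matrix: corr_mat
    corr_mat = np.corrcoef(x, y)

    # Return entry [0,1]
    return corr_mat[0, 1]

# In the course environment (preloaded arrays):
# r = pearson_r(versicolor_petal_length, versicolor_petal_width)

# Made-up stand-in arrays so the sketch runs on its own
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([1.0, 3.0, 2.0, 5.0, 4.0])

# Compute Pearson correlation coefficient: r
r = pearson_r(x_demo, y_demo)

# Print the result
print(r)
```

For these stand-in arrays r is 0.8 (up to floating-point rounding); the value for the versicolor petal data will differ.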


In this chapter, you will compute useful summary statistics, which serve to concisely describe salient features of a dataset with a few numbers.

  • Exercise 1: Introduction to summary statistics: The sample mean and median
  • Exercise 2: Means and medians
  • Exercise 3: Computing means
  • Exercise 4: Percentiles, outliers, and box plots
  • Exercise 5: Computing percentiles
  • Exercise 6: Comparing percentiles to ECDF
  • Exercise 7: Box-and-whisker plot
  • Exercise 8: Variance and standard deviation
  • Exercise 9: Computing the variance
  • Exercise 10: The standard deviation and the variance
  • Exercise 11: Covariance and the Pearson correlation coefficient
  • Exercise 12: Scatter plots
  • Exercise 13: Variance and covariance by looking
  • Exercise 14: Computing the covariance
  • Exercise 15: Computing the Pearson correlation coefficient
