Computing the covariance
The covariance may be computed using the NumPy function np.cov(). For example, if we have two sets of data x and y, np.cov(x, y) returns a 2D array in which entries [0,1] and [1,0] both hold the covariance of x and y. Entry [0,0] is the variance of the data in x, and entry [1,1] is the variance of the data in y. This 2D output array is called the covariance matrix, since it organizes the variances and the covariance.
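The layout of the covariance matrix described above can be checked on a small example. The sketch below uses made-up numbers (not the course data) and assumes NumPy's default settings, under which np.cov() normalizes by N - 1, so its diagonal matches np.var() only when ddof=1 is passed.

```python
import numpy as np

# Made-up illustrative data; these values are not from the iris dataset
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 9.0])

# 2x2 covariance matrix of x and y
cov_matrix = np.cov(x, y)
print(cov_matrix)

# Diagonal entries: variances of x and y (np.cov normalizes by N - 1 by default)
var_x = cov_matrix[0, 0]
var_y = cov_matrix[1, 1]

# Off-diagonal entries: the covariance of x and y, stored twice by symmetry
cov_xy = cov_matrix[0, 1]
cov_yx = cov_matrix[1, 0]

# Compare with np.var, which needs ddof=1 to use the same N - 1 normalization
print(np.isclose(var_x, np.var(x, ddof=1)))
print(np.isclose(var_y, np.var(y, ddof=1)))
print(np.isclose(cov_xy, cov_yx))
```

Calling np.var() without ddof=1 divides by N instead, so its result would differ slightly from the diagonal of the matrix for small samples.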
To remind you how the I. versicolor petal length and width are related, we include the scatter plot you generated in a previous exercise.
This exercise is part of the course “Statistical Thinking in Python (Part 1)”.
Exercise instructions
- Use np.cov() to compute the covariance matrix for the petal length (versicolor_petal_length) and width (versicolor_petal_width) of I. versicolor.
- Print the covariance matrix.
- Extract the covariance from entry [0,1] of the covariance matrix. Note that by symmetry, entry [1,0] is the same as entry [0,1].
- Print the covariance.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Compute the covariance matrix: covariance_matrix
# Print covariance matrix
# Extract covariance of length and width of petals: petal_cov
# Print the length/width covariance
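One way the sample code above might be filled in is sketched below. In the exercise environment the arrays versicolor_petal_length and versicolor_petal_width are already loaded; here they are replaced by a handful of stand-in values, which is an assumption made only so that the snippet runs on its own.

```python
import numpy as np

# Stand-in measurements in cm (an assumption for illustration only);
# in the exercise these two arrays are provided for you
versicolor_petal_length = np.array([4.7, 4.5, 4.9, 4.0, 4.6])
versicolor_petal_width = np.array([1.4, 1.5, 1.5, 1.3, 1.5])

# Compute the covariance matrix: covariance_matrix
covariance_matrix = np.cov(versicolor_petal_length, versicolor_petal_width)

# Print covariance matrix
print(covariance_matrix)

# Extract covariance of length and width of petals: petal_cov
petal_cov = covariance_matrix[0, 1]

# Print the length/width covariance
print(petal_cov)
```

With the full dataset the numbers will differ from those produced by these stand-in values, but the steps stay the same.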