Plotting a histogram of iris data
For the exercises in this section, you will use a classic data set collected by botanist Edgar Anderson and made famous by Ronald Fisher, one of the most prolific statisticians in history. Anderson carefully measured the anatomical properties of samples of three different species of iris: Iris setosa, Iris versicolor, and Iris virginica. The full data set is available as part of scikit-learn. Here, you will work with his measurements of petal length.
Plot a histogram of the petal lengths of his 50 samples of Iris versicolor using matplotlib/seaborn's default settings. Recall that to specify the default seaborn style, you can use sns.set(), where sns is the alias that seaborn is imported as.
The subset of the data set containing the Iris versicolor petal lengths in units of centimeters (cm) is stored in the NumPy array versicolor_petal_length.
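Outside the exercise environment the array is not preloaded. A minimal sketch of one way to rebuild it from the copy of the data set bundled with scikit-learn follows. It assumes scikit-learn is installed, and looking up the column and species by name is this sketch's choice, not necessarily how the course prepared the array.

```python
from sklearn.datasets import load_iris

# Load the copy of the iris data bundled with scikit-learn
iris = load_iris()

# Find the petal-length column and the versicolor label by name
# rather than hard-coding their positions
petal_col = iris.feature_names.index("petal length (cm)")
versicolor_label = list(iris.target_names).index("versicolor")

# Keep only the versicolor rows, and only the petal-length column
versicolor_petal_length = iris.data[iris.target == versicolor_label, petal_col]

print(versicolor_petal_length.shape)  # (50,)
```

The result is a one-dimensional NumPy array of 50 values in centimeters, which is the shape the exercise expects.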
In the video, Justin plotted the histograms by using the pandas library and indexing the DataFrame to extract the desired column. Here, however, you only need to use the provided NumPy array. Also, Justin assigned his plotting statements (except for plt.show()) to the dummy variable _. This prevents unnecessary output from being displayed. It is not required for your solutions to these exercises, but it is good practice. Alternatively, if you are working in an interactive environment such as a Jupyter notebook, you could put a ; after your plotting statements to achieve the same effect. Justin prefers using _, so you will see it used in the solution code.
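To see what the dummy variable suppresses, here is a small sketch. plt.hist() returns a tuple of bin counts, bin edges, and the drawn patches, and an interactive session echoes that tuple when the call is the last expression in a cell. The array demo_lengths is synthetic placeholder data invented for this illustration; it is not Anderson's measurements.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic placeholder values (NOT real iris measurements)
rng = np.random.default_rng(0)
demo_lengths = rng.normal(loc=4.0, scale=0.5, size=50)

# Capturing the return value shows what a notebook would otherwise echo:
# the bin counts, the bin edges, and the bar patches
counts, bin_edges, patches = plt.hist(demo_lengths)
plt.close()

# Assigning to _ marks the return value as intentionally discarded,
# so nothing is echoed after the plotting call
_ = plt.hist(demo_lengths)
plt.close()
```

With matplotlib's default of 10 bins, counts holds 10 values and bin_edges holds 11.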
This is a part of the course “Statistical Thinking in Python (Part 1)”
Exercise instructions
- Import matplotlib.pyplot and seaborn as their usual aliases (plt and sns).
- Use seaborn to set the plotting defaults.
- Plot a histogram of the Iris versicolor petal lengths using plt.hist() and the provided NumPy array versicolor_petal_length.
- Show the histogram using plt.show().
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import plotting modules
# Set default Seaborn style
# Plot histogram of versicolor petal lengths
# Show histogram
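One way to fill in the scaffold is sketched below. It is an illustration, not the course's official solution. In the exercise the array is preloaded; so that the script runs on its own, this sketch rebuilds it from scikit-learn's copy of the data, assuming label 1 marks versicolor and column 2 holds petal length.

```python
# Import plotting modules
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris

# Stand-in for the preloaded array (assumption: label 1 is versicolor,
# column 2 is petal length in cm)
iris = load_iris()
versicolor_petal_length = iris.data[iris.target == 1, 2]

# Set default Seaborn style
sns.set()

# Plot histogram of versicolor petal lengths
_ = plt.hist(versicolor_petal_length)

# Show histogram
plt.show()
```

The plot has no axis labels yet, and the number of bins is left at matplotlib's default.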
This exercise is part of the course
Statistical Thinking in Python (Part 1)
Build the foundation you need to think statistically and to speak the language of your data.
Before diving into sophisticated statistical inference techniques, you should first explore your data by plotting them and computing simple summary statistics. This process, called exploratory data analysis, is a crucial first step in statistical analysis of data.