1. Plotting directly using pandas
Plots can reveal interesting patterns in your data.
2. Plotting in Python
Although there are several ways you can plot your data in Python, we will cover plotting using pandas, seaborn, and matplotlib.
3. Pandas plot method
pandas has a plot() method that can draw several types of plots. The kind argument in the plot() method specifies the type of plot. This method works on both DataFrame and Series objects.
Here is a list of plots you can create with pandas. Let's quickly discuss a few of them.
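To make the point about DataFrame and Series concrete, here is a minimal sketch. It assumes a small made-up DataFrame named iris with illustrative values (not the real iris data), and it picks kind='line' only as an example value.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris data (illustrative values only)
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# plot() on a Series: a single column, with the plot type chosen by kind
ax_series = iris["sepal_length"].plot(kind="line")

# plot() on a DataFrame: each numeric column becomes one line
ax_frame = iris.plot(kind="line")

plt.show()
```

Changing the string passed to kind is all it takes to switch plot types, which is what the following slides do.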
4. Univariate: Histogram
Histograms are used to look at the distribution of a continuous variable. Let's plot a histogram of the sepal_length column from the iris data set.
Since the pandas plot() method is built on top of matplotlib, we import matplotlib.pyplot with its usual alias, plt.
We call the plot() method on iris['sepal_length'] and specify the kind argument as 'hist'. Then we use the show() function to display the plot.
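The histogram steps just described might look like the sketch below. The iris DataFrame here is a small made-up stand-in with illustrative values, since the transcript does not show how the data is loaded.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris data (illustrative values only)
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 5.0, 7.0, 6.4, 6.9, 5.5, 6.3, 5.8, 7.1, 6.5],
})

# Call plot() on the column (a Series) and ask for a histogram
ax = iris["sepal_length"].plot(kind="hist")

# Display the figure
plt.show()
```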
5. Univariate: Bar plot
Bar plots are used to display the counts of categorical data. If you want to use the plot() method directly on a column, you need to first calculate the frequency counts manually using the value_counts() method.
So we count the number of rows for each species in the iris DataFrame and assign the result to cts. You can then call plot() directly on cts and specify the kind argument as 'bar'. Finally, we call plt.show() to display the plot.
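A minimal sketch of the count-then-plot steps, assuming a made-up iris DataFrame with only a species column (illustrative values, not the real data):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris data (illustrative values only)
iris = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor",
                "virginica", "virginica", "virginica"],
})

# Frequency of each category; the result is a Series indexed by species
cts = iris["species"].value_counts()

# One bar per category, bar height equal to the count
ax = cts.plot(kind="bar")

plt.show()
```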
6. Bivariate: Scatter plot
Scatter plots are used to look at the relationship between two continuous variables. Since scatter plots involve more than one variable, you call plot() on the entire DataFrame instead of an individual column.
As you can see here, we call plot() on iris and specify kind as 'scatter'. In addition, we specify x as sepal_length and y as sepal_width to plot the sepal length on the x-axis and sepal width on the y-axis.
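In code, the scatter plot call could be sketched as follows, again using a small made-up iris DataFrame (illustrative values only):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris data (illustrative values only)
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

# plot() is called on the whole DataFrame; x and y name the columns to use
ax = iris.plot(kind="scatter", x="sepal_length", y="sepal_width")

plt.show()
```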
7. Univariate: Boxplots
You can also use a boxplot to visualize the spread or distribution of a continuous variable.
When you call plot() on a DataFrame and specify the kind argument as 'box', a box plot for each numeric column is plotted.
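Here is a rough sketch of that call. The made-up iris DataFrame below (illustrative values only) includes a text column on purpose, to show that only the numeric columns get a box.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris data (illustrative values only)
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# One box per numeric column; the non-numeric species column is skipped
ax = iris.plot(kind="box")

plt.show()
```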
8. Bivariate: Boxplots
Boxplots can also be used to compare a categorical variable to a continuous variable. Here we plot the boxplots of the sepal_length variable for each category in species by specifying the by argument as species and column as sepal_length.
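The transcript does not show the exact call, so the sketch below assumes the DataFrame's boxplot() method, which accepts by and column arguments. The iris DataFrame is a made-up stand-in with illustrative values.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the iris data (illustrative values only)
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8, 7.1],
    "species": ["setosa"] * 3 + ["versicolor"] * 3 + ["virginica"] * 3,
})

# One box of sepal_length per species category
iris.boxplot(column="sepal_length", by="species")

plt.show()
```

Each box summarizes sepal_length within one species, so differences between categories show up as shifts in the medians and box heights.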
9. Let's practice!
Now it's your turn to draw plots using pandas!