Counting unique labels
As Peter mentioned in the video, there are over 100 unique labels. In this exercise, you will explore this fact by counting and plotting the number of unique values for each category of label.
The DataFrame df and the LABELS list have been loaded into the workspace; the LABELS columns of df have been converted to category types.
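The category conversion mentioned above is done before this exercise. The sketch below shows one way to do it. The two-column LABELS list, the tiny df and its FTE column are hypothetical stand-ins for the real workspace objects, and the course's own conversion step may be written differently.

```python
import pandas as pd

# Hypothetical stand-ins for the workspace objects: a short LABELS list and a tiny df
LABELS = ["Function", "Use"]
df = pd.DataFrame({
    "Function": ["Aides Compensation", "Instruction", "Aides Compensation"],
    "Use": ["Instruction", "Instruction", "O&M"],
    "FTE": [1.0, 0.5, 0.0],  # a non-label numeric column that stays untouched
})

# Convert every label column to the pandas 'category' dtype;
# each column gets its own set of categories
df[LABELS] = df[LABELS].astype("category")

print(df.dtypes)
```

Only the columns named in LABELS change dtype. Numeric feature columns such as the hypothetical FTE keep their original type.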
pandas, which has been pre-imported as pd, provides a pd.Series.nunique method for counting the number of unique values in a Series.
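Here is a small sketch of how pd.Series.nunique behaves on its own. The toy values are invented for illustration. The example also shows the method's dropna parameter, which controls whether missing values are counted.

```python
import pandas as pd

# Toy label column stored as a category, like the LABELS columns of df
s = pd.Series(["Teacher", "Aide", "Teacher", "Substitute", "Aide"], dtype="category")
print(s.nunique())  # 3 distinct values: Teacher, Aide, Substitute

# Missing values are skipped by default; dropna=False counts NaN/None as its own value
t = pd.Series(["Teacher", None, "Teacher"])
print(t.nunique())              # 1
print(t.nunique(dropna=False))  # 2
```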
This is a part of the course “Case Study: School Budgeting with Machine Learning in Python”.
Exercise instructions
- Create the Series num_unique_labels by using the .apply() method on df[LABELS] with pd.Series.nunique as the argument. Because nunique returns one number per column, the result is a Series indexed by label name.
- Create a bar plot of num_unique_labels using pandas' .plot(kind='bar') method.
- The axes have been labeled for you, so hit submit to see the number of unique values for each label.
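The first instruction can be tried on toy data. In the sketch below, labels_df is a hypothetical stand-in for df[LABELS] with just two label columns. By default, DataFrame.apply passes each column to pd.Series.nunique (axis=0), so the result is a Series with one count per column. DataFrame.nunique() is a built-in shortcut that gives the same counts.

```python
import pandas as pd

# Hypothetical toy data standing in for df[LABELS]; the real dataset has more label columns
labels_df = pd.DataFrame({
    "Function": ["Aides Compensation", "Instruction",
                 "Aides Compensation", "Substitute Compensation"],
    "Use": ["Instruction", "Instruction", "O&M", "Instruction"],
}).astype("category")

# apply() calls pd.Series.nunique once per column (axis=0 is the default),
# producing a Series indexed by column name
num_unique_labels = labels_df.apply(pd.Series.nunique)
print(num_unique_labels)

# DataFrame.nunique() computes the same per-column counts directly
print(labels_df.nunique())
```

Calling num_unique_labels.plot(kind='bar') on this Series would draw one bar per label column.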
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Import matplotlib.pyplot
import matplotlib.pyplot as plt
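# Calculate number of unique values for each label: num_unique_labels
# (df, LABELS and pd are pre-loaded in the exercise workspace)
num_unique_labels = df[LABELS].apply(pd.Series.nunique)
# Plot number of unique values for each label
num_unique_labels.plot(kind='bar')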
# Label the axes
plt.xlabel('Labels')
plt.ylabel('Number of unique values')
# Display the plot
plt.show()
Learn how to build a model to automatically classify items in a school budget.
In this chapter, you'll be introduced to the problem you'll be solving in this course. How do you accurately classify line-items in a school budget based on what that money is being used for? You will explore the raw text and numeric values in the dataset, both quantitatively and visually. And you'll learn how to measure success when trying to predict class labels for each row of the dataset.