
Counting unique labels

As Peter mentioned in the video, there are over 100 unique labels. In this exercise, you will explore this fact by counting and plotting the number of unique values for each category of label.

The dataframe df and the LABELS list have been loaded into the workspace; the LABELS columns of df have been converted to category types.

pandas, which has been pre-imported as pd, provides a pd.Series.nunique method for counting the number of unique values in a Series.
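As a small illustration of what pd.Series.nunique counts, here is a sketch on a made-up Series. The values are hypothetical and not taken from the course dataset. By default, missing values are excluded from the count.

```python
import pandas as pd

# Hypothetical label values, not taken from the course data
s = pd.Series(["Teacher", "Aide", "Teacher", None, "Aide", "Custodian"])

# nunique() counts distinct non-missing values by default
n_distinct = s.nunique()

# dropna=False also counts the missing value as its own entry
n_with_missing = s.nunique(dropna=False)

print(n_distinct, n_with_missing)
```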

This is a part of the course

“Case Study: School Budgeting with Machine Learning in Python”


Exercise instructions

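  • Create the Series num_unique_labels by calling the .apply() method on df[LABELS] with pd.Series.nunique as the argument. (Applying nunique column by column returns one count per label column, so the result is a Series indexed by label name.)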
  • Create a bar plot of num_unique_labels using pandas' .plot(kind='bar') method.
  • The axes have been labeled for you, so hit submit to see the number of unique values for each label.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

# Import matplotlib.pyplot
import matplotlib.pyplot as plt

# Calculate number of unique values for each label: num_unique_labels
num_unique_labels = ____

# Plot number of unique values for each label
____

# Label the axes
plt.xlabel('Labels')
plt.ylabel('Number of unique values')

# Display the plot
plt.show()
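For reference, one way the blanks might be filled in is sketched below on a tiny stand-in dataset. The df contents and the three-entry LABELS list here are assumptions made up for illustration; in the exercise both are already loaded into the workspace and are much larger. The Agg backend line is only there so the sketch runs without a display and can be removed for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a plot window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the course's df and LABELS
LABELS = ["Function", "Use", "Sharing"]
df = pd.DataFrame({
    "Function": ["Teacher Compensation", "Substitute Compensation",
                 "Teacher Compensation", "NO_LABEL"],
    "Use": ["Instruction", "Instruction", "Operations", "NO_LABEL"],
    "Sharing": ["School Reported", "School Reported",
                "School Reported", "NO_LABEL"],
    "Total": [100.0, 50.0, 75.0, 10.0],
})

# Mirror the setup described in the exercise: label columns as category dtype
df[LABELS] = df[LABELS].astype("category")

# Calculate number of unique values for each label column (returns a Series)
num_unique_labels = df[LABELS].apply(pd.Series.nunique)

# Plot number of unique values for each label
num_unique_labels.plot(kind="bar")

# Label the axes
plt.xlabel("Labels")
plt.ylabel("Number of unique values")

# Display the plot
plt.show()
```

With the real dataset the same two lines apply unchanged; only the number of bars and their heights differ.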

This exercise is part of the course

Case Study: School Budgeting with Machine Learning in Python

Skill level: Intermediate

Learn how to build a model to automatically classify items in a school budget.

In this chapter, you'll be introduced to the problem you'll be solving in this course: how do you accurately classify line items in a school budget based on what the money is being used for? You will explore the raw text and numeric values in the dataset, both quantitatively and visually. You'll also learn how to measure success when predicting class labels for each row of the dataset.

  • Exercise 1: Introducing the challenge
  • Exercise 2: What category of problem is this?
  • Exercise 3: What is the goal of the algorithm?
  • Exercise 4: Exploring the data
  • Exercise 5: Loading the data
  • Exercise 6: Summarizing the data
  • Exercise 7: Looking at the datatypes
  • Exercise 8: Exploring datatypes in pandas
  • Exercise 9: Encode the labels as categorical variables
  • Exercise 10: Counting unique labels
  • Exercise 11: How do we measure success?
  • Exercise 12: Penalizing highly confident wrong answers
  • Exercise 13: Computing log loss with NumPy
