Checking for class imbalance
The 2022 Kaggle Survey captures information about data scientists' backgrounds, preferred technologies, and techniques. Given the volume and profile of its respondents, it is considered an accurate view of what is happening in data science.
After reviewing the job titles and grouping them into the same categories used in our salaries DataFrame, you can see the following proportion of job categories in the Kaggle survey:
Job Category | Relative Frequency |
---|---|
Data Science | 0.281236 |
Data Analytics | 0.224231 |
Other | 0.214609 |
Managerial | 0.121300 |
Machine Learning | 0.083248 |
Data Engineering | 0.075375 |
Treating the Kaggle survey results as the population, your task is to find out whether the salaries DataFrame is representative of it by comparing the relative frequency of job categories.
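One way to make that comparison concrete is to place the survey proportions and the sample proportions side by side. The sketch below is illustrative only. It assumes pandas is installed, and the salaries DataFrame in it is a tiny hypothetical stand-in with ten made-up rows rather than the course dataset, so its numbers say nothing about the real data.

```python
import pandas as pd

# Population proportions, copied from the 2022 Kaggle survey table above
kaggle = pd.Series(
    {
        "Data Science": 0.281236,
        "Data Analytics": 0.224231,
        "Other": 0.214609,
        "Managerial": 0.121300,
        "Machine Learning": 0.083248,
        "Data Engineering": 0.075375,
    },
    name="kaggle",
)

# Hypothetical stand-in for the course's salaries DataFrame (made-up rows)
salaries = pd.DataFrame(
    {
        "Job_Category": [
            "Data Science", "Data Science", "Data Engineering",
            "Data Analytics", "Machine Learning", "Other",
            "Data Science", "Managerial", "Data Engineering", "Data Analytics",
        ]
    }
)

# Relative frequency of each category in the sample (proportions sum to 1)
sample = salaries["Job_Category"].value_counts(normalize=True).rename("salaries")

# Align both Series on category name; a category missing on one side becomes 0
comparison = pd.concat([kaggle, sample], axis=1).fillna(0)
comparison["difference"] = comparison["salaries"] - comparison["kaggle"]
print(comparison.sort_values("difference"))
```

A positive difference means a category is over-represented in the sample relative to the survey, and a negative one means it is under-represented. This sketch only lays the proportions side by side; deciding how large a gap must be before the sample counts as unrepresentative is left to the analyst, and no formal statistical test is run here.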
This exercise is part of the course Exploratory Data Analysis in Python.
Exercise instructions
- Print the relative frequency of the "Job_Category" column from the salaries DataFrame.
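A relative frequency is simply each category's count divided by the total number of values. The sketch below illustrates that relationship with pandas' Series.value_counts on a hypothetical four-row DataFrame (made-up rows, not the course data). It checks that normalize=True yields the same proportions as dividing the raw counts by their sum.

```python
import pandas as pd

# Hypothetical miniature salaries DataFrame, for illustration only
salaries = pd.DataFrame(
    {"Job_Category": ["Data Science", "Other", "Data Science", "Managerial"]}
)

# Raw counts per category (missing values are excluded by default)
counts = salaries["Job_Category"].value_counts()

# Relative frequency: each count divided by the number of non-missing values
relative = salaries["Job_Category"].value_counts(normalize=True)

print(counts)
print(relative)

# normalize=True matches counts / total; subtraction aligns on category labels
assert (relative - counts / counts.sum()).abs().max() < 1e-12
```

Working with proportions instead of raw counts is what makes the sample comparable to the survey table, because the two datasets have very different numbers of rows.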
Have a go at this exercise by completing this sample code.
```python
# Print the relative frequency of Job_Category
print(____)
```