
Data cleaning and enhancement

TechCorp is migrating to a new HR system. Before the migration, the dataset needs cleaning: remove low-salary outliers that indicate data entry errors, drop columns the new system won't use, and add a calculated bonus field. Data cleaning is often said to take up most of the time spent on an analysis (80% is a commonly quoted figure), so these skills are essential.

The Tablesaw classes Table, Selection, and DoubleColumn have been imported for you.

This exercise is part of the course

Importing Data in Java

Exercise instructions

  • Remove employees with salaries below $40,000.
  • Remove the "JobTitle" column.
  • Add the PerformanceBonus column (5% of salary).
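
The three steps above can be sketched in plain Java, without the library used in the exercise, to make the logic explicit before filling in the blanks. This is only an illustration under stated assumptions: the class name CleaningSketch, the helper methods, and the sample rows are hypothetical and invented for this example, and each row is modeled as a map from column name to value rather than as a table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CleaningSketch {

    // Hypothetical sample rows standing in for employees.csv (values are invented).
    public static List<Map<String, Object>> sampleRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("Ana", "Engineer", 52000));
        rows.add(row("Ben", "Analyst", 3900));   // below the threshold: likely a data entry error
        rows.add(row("Cy", "Manager", 40000));   // exactly 40,000 is not "below", so it is kept
        return rows;
    }

    // Builds one row as an ordered map of column name to value.
    public static Map<String, Object> row(String name, String jobTitle, int salary) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("Name", name);
        r.put("JobTitle", jobTitle);
        r.put("Salary", salary);
        return r;
    }

    // Applies the three cleaning steps and returns new rows; the input is left unchanged.
    public static List<Map<String, Object>> clean(List<Map<String, Object>> rows) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> original : rows) {
            int salary = (Integer) original.get("Salary");
            if (salary < 40_000) {
                continue;                                 // step 1: drop salaries below $40,000
            }
            Map<String, Object> copy = new LinkedHashMap<>(original);
            copy.remove("JobTitle");                      // step 2: drop the unused column
            copy.put("PerformanceBonus", salary * 0.05);  // step 3: add the bonus, 5% of salary
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map<String, Object> r : clean(sampleRows())) {
            System.out.println(r);
        }
    }
}
```

The exercise itself performs the same steps column-wise on a Table: a Selection marks the rows to drop, and the bonus is built as a separate DoubleColumn before being attached to the table.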

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

public class DataExploration {
    public static void main(String[] args) {

        Table employees = Table.read().csv("employees.csv");

        // Remove employees with salaries below $40,000
        Selection lowSalaries = employees.intColumn("Salary").isLessThan(____);
        Table cleanedEmployees = employees.____(lowSalaries);

        // Remove the JobTitle column
        Table streamlined = cleanedEmployees.____("JobTitle");

        // Compute the bonus as 5% of each salary, converting the integer salaries to doubles first
        DoubleColumn performanceBonus = streamlined.intColumn("Salary").asDoubleColumn()
            .map(salary -> salary * 0.05);
        performanceBonus.setName("PerformanceBonus");

        // Add the PerformanceBonus column
        Table enhancedEmployees = streamlined.____(performanceBonus);

        System.out.println("Total employees after cleaning: " + enhancedEmployees.rowCount());
        System.out.println("\nFirst 5 rows of enhanced dataset:");
        System.out.println(enhancedEmployees.first(5));
    }
}