Data cleaning and enhancement

TechCorp is migrating to a new HR system. Before the migration, the dataset needs cleaning: remove low-salary outliers that indicate data entry errors, drop columns the new system won't use, and add a calculated bonus field. Data cleaning is often said to take up most of the time in an analysis (a commonly quoted figure is 80%), so these skills are essential.

The Table, Selection, and DoubleColumn classes have been imported for you.

This exercise is part of the course Importing Data in Java.

Instructions

  • Remove employees with salaries below $40,000.
  • Remove the "JobTitle" column.
  • Add the PerformanceBonus column (5% of salary).
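
The effect of these three steps on the data can be sketched in plain Java, without the table library used in the exercise. This is only an illustration of the logic, not the exercise solution: the class `CleaningSketch`, the helpers `row` and `clean`, and the sample employees are all hypothetical, and each row is modeled as a simple column-name-to-value map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CleaningSketch {

    // Hypothetical helper: builds one row as an ordered column -> value map.
    static Map<String, Object> row(String name, String jobTitle, int salary) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("Name", name);
        r.put("JobTitle", jobTitle);
        r.put("Salary", salary);
        return r;
    }

    // Applies the three cleaning steps and returns new rows (input is not modified).
    static List<Map<String, Object>> clean(List<Map<String, Object>> rows, int minSalary) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> original : rows) {
            int salary = (Integer) original.get("Salary");
            if (salary < minSalary) {
                continue; // Step 1: drop rows with a salary below the threshold
            }
            Map<String, Object> cleaned = new LinkedHashMap<>(original);
            cleaned.remove("JobTitle");                      // Step 2: drop the unused column
            cleaned.put("PerformanceBonus", salary * 0.05);  // Step 3: add 5% of salary
            result.add(cleaned);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data; 3900 stands in for a data entry error.
        List<Map<String, Object>> rows = List.of(
            row("Ana", "Engineer", 85000),
            row("Ben", "Analyst", 3900),
            row("Cy", "Manager", 40000));

        for (Map<String, Object> r : clean(rows, 40000)) {
            System.out.println(r);
        }
    }
}
```

Note that a salary of exactly $40,000 is kept, because only salaries strictly below the threshold are removed.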

Hands-on interactive exercise

Try this exercise by completing this sample code.

public class DataExploration {
    public static void main(String[] args) {

        Table employees = Table.read().csv("employees.csv");

        // Remove employees with salaries below $40,000
        Selection lowSalaries = employees.intColumn("Salary").isLessThan(____);
        Table cleanedEmployees = employees.____(lowSalaries);

        // Remove the JobTitle column
        Table streamlined = cleanedEmployees.____("JobTitle");

        // Compute the bonus as 5% of each salary
        DoubleColumn performanceBonus = streamlined.intColumn("Salary").asDoubleColumn()
            .map(salary -> salary * 0.05);
        performanceBonus.setName("PerformanceBonus");

        // Add the PerformanceBonus column
        Table enhancedEmployees = streamlined.____(performanceBonus);

        System.out.println("Total employees after cleaning: " + enhancedEmployees.rowCount());
        System.out.println("\nFirst 5 rows of enhanced dataset:");
        System.out.println(enhancedEmployees.first(5));
    }
}