
Data cleaning and enhancement

TechCorp is migrating to a new HR system. Before the migration, the dataset needs cleaning: remove low-salary outliers that indicate data entry errors, drop columns the new system won't use, and add a calculated bonus field. Data cleaning is often estimated to take around 80% of analysis time, so these skills are essential.

The Table, Selection, and DoubleColumn classes have been imported for you.

This exercise is part of the course

Importing Data in Java


Exercise instructions

  • Remove employees with salaries below $40,000.
  • Remove the "JobTitle" column.
  • Add the PerformanceBonus column (5% of salary).
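The three steps above can be sketched in plain Java, independent of any table library, to show the intended logic: filter rows, drop a column, then add a derived column. This is only an illustrative sketch, not the exercise solution. The `CleaningSketch` class, the `row` and `clean` helpers, the `Name` column, and the sample values are all hypothetical; only `Salary`, `JobTitle`, `PerformanceBonus`, the $40,000 threshold, and the 5% rate come from the exercise.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CleaningSketch {

    // Hypothetical helper: one row as an ordered column-name -> value map.
    public static Map<String, Object> row(String name, String jobTitle, int salary) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("Name", name);          // "Name" is an assumed column, for illustration only
        r.put("JobTitle", jobTitle);
        r.put("Salary", salary);
        return r;
    }

    // Applies the three cleaning steps and returns new rows; the input is not modified.
    public static List<Map<String, Object>> clean(List<Map<String, Object>> rows) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> original : rows) {
            int salary = (Integer) original.get("Salary");
            if (salary < 40_000) {
                continue;                                   // step 1: skip salaries below $40,000
            }
            Map<String, Object> copy = new LinkedHashMap<>(original);
            copy.remove("JobTitle");                        // step 2: drop the unused column
            copy.put("PerformanceBonus", salary * 0.05);    // step 3: add 5% of salary
            result.add(copy);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data; the second row stands in for a data entry error.
        List<Map<String, Object>> rows = List.of(
            row("Ana", "Engineer", 85_000),
            row("Ben", "Analyst", 4_000),
            row("Caro", "Manager", 40_000));

        List<Map<String, Object>> cleaned = clean(rows);
        System.out.println("Total employees after cleaning: " + cleaned.size());
        cleaned.forEach(System.out::println);
    }
}
```

Note that "below $40,000" is read here as a strict comparison, so a salary of exactly 40,000 is kept.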

Interactive exercise

Complete the sample code to successfully finish this exercise.

public class DataExploration {
    public static void main(String[] args) {

        Table employees = Table.read().csv("employees.csv");

        // Remove employees with salaries below $40,000
        Selection lowSalaries = employees.intColumn("Salary").isLessThan(____);
        Table cleanedEmployees = employees.____(lowSalaries);

        // Remove the JobTitle column
        Table streamlined = cleanedEmployees.____("JobTitle");

        // Calculate the performance bonus as 5% of each salary
        DoubleColumn performanceBonus = streamlined.intColumn("Salary").asDoubleColumn()
            .map(salary -> salary * 0.05);
        performanceBonus.setName("PerformanceBonus");

        // Add the PerformanceBonus column
        Table enhancedEmployees = streamlined.____(performanceBonus);

        System.out.println("Total employees after cleaning: " + enhancedEmployees.rowCount());
        System.out.println("\nFirst 5 rows of enhanced dataset:");
        System.out.println(enhancedEmployees.first(5));
    }
}
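
The sample code converts the integer Salary column with asDoubleColumn() before multiplying by 0.05. One likely reason (an assumption, not stated in the exercise) is that 5% of a whole-number salary is generally not a whole number. The plain-Java sketch below contrasts integer-only arithmetic, which truncates, with double arithmetic; the `BonusArithmetic` class, its methods, and the sample salary are hypothetical.

```java
class BonusArithmetic {

    // Integer-only version: the division discards the fractional part.
    public static int bonusTruncated(int salary) {
        return salary * 5 / 100;
    }

    // Double version: the int is widened to double, so the fraction is kept
    // (subject to ordinary floating-point rounding).
    public static double bonus(int salary) {
        return salary * 0.05;
    }

    public static void main(String[] args) {
        int salary = 45_999; // made-up value chosen so that 5% is not a whole number
        System.out.println("Integer arithmetic: " + bonusTruncated(salary));
        System.out.println("Double arithmetic:  " + bonus(salary));
    }
}
```

For a salary of 45,999 the integer version yields 2299, while the double version yields approximately 2299.95.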