
Data cleaning and enhancement

TechCorp is migrating to a new HR system. Before the migration, the dataset needs cleaning: remove low-salary outliers that indicate data entry errors, drop columns the new system won't use, and add a calculated bonus field. Data cleaning is often cited as taking around 80% of analysis time, so these skills are essential.

The Table, Selection, and DoubleColumn classes have been imported for you.

This exercise is part of the course

Importing Data in Java


Exercise instructions

  • Remove employees with salaries below $40,000.
  • Remove the "JobTitle" column.
  • Add the PerformanceBonus column (5% of salary).
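The logic behind these three steps can be sketched in plain Java, using only the standard library rather than the table classes the exercise imports. This is an illustrative sketch, not the exercise solution: the `Employee` and `EnhancedEmployee` records, the `name` field, the `cleanAndEnhance` helper, and the sample rows are all hypothetical. It shows that "below $40,000" is a strict comparison, so a salary of exactly 40,000 is kept, and that the bonus is simply salary times 0.05.

```java
import java.util.List;

public class CleaningSketch {

    // Hypothetical row shape; only Salary and JobTitle are named in the exercise.
    public record Employee(String name, String jobTitle, int salary) {}

    // Row shape after cleaning: JobTitle dropped, PerformanceBonus added.
    public record EnhancedEmployee(String name, int salary, double performanceBonus) {}

    public static List<EnhancedEmployee> cleanAndEnhance(List<Employee> employees) {
        return employees.stream()
            // Step 1: drop rows with a salary strictly below 40,000 (40,000 itself stays).
            .filter(e -> e.salary() >= 40_000)
            // Steps 2 and 3: leave out jobTitle and compute the bonus as 5% of salary.
            .map(e -> new EnhancedEmployee(e.name(), e.salary(), e.salary() * 0.05))
            .toList();
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        List<Employee> employees = List.of(
            new Employee("Ana", "Engineer", 85_000),
            new Employee("Ben", "Intern", 4_000),
            new Employee("Cho", "Analyst", 40_000));

        List<EnhancedEmployee> enhanced = cleanAndEnhance(employees);
        System.out.println("Total employees after cleaning: " + enhanced.size());
        enhanced.forEach(System.out::println);
    }
}
```

With the sample rows above, the 4,000 salary is treated as a data entry error and removed, while the two remaining rows each gain a bonus value.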

Hands-on interactive exercise

Try this exercise by completing the sample code.

public class DataExploration {
    public static void main(String[] args) {

        Table employees = Table.read().csv("employees.csv");

        // Remove employees with salaries below $40,000
        Selection lowSalaries = employees.intColumn("Salary").isLessThan(____);
        Table cleanedEmployees = employees.____(lowSalaries);

        // Remove the JobTitle column
        Table streamlined = cleanedEmployees.____("JobTitle");

        // Calculate the bonus as 5% of each salary
        DoubleColumn performanceBonus = streamlined.intColumn("Salary").asDoubleColumn()
            .map(salary -> salary * 0.05);
        performanceBonus.setName("PerformanceBonus");

        // Add the PerformanceBonus column
        Table enhancedEmployees = streamlined.____(performanceBonus);

        System.out.println("Total employees after cleaning: " + enhancedEmployees.rowCount());
        System.out.println("\nFirst 5 rows of enhanced dataset:");
        System.out.println(enhancedEmployees.first(5));
    }
}