Resolving impartial duplicates
The parking_violation
dataset has been modified to include a fee
column indicating the fee for the violation. This column would be useful for keeping track of New York City parking ticket revenue. However, due to duplicated violation records, revenue calculations based on the dataset would not be accurate. These duplicate records only differ based on the value in the fee
column. All other column values are shared in the duplicated records. A decision has been made to use the minimum fee
to resolve the ambiguity created by these duplicates.
Identify the 3 duplicated parking_violation
records and use the MIN()
function to determine the fee
that will be used after removing the duplicate records.
This exercise is part of the course
Cleaning Data in PostgreSQL Databases
Exercise instructions
- Return the
summons_number
and the minimumfee
for duplicated records. - Group the results by
summons_number
. - Restrict the results to records having a
summons_number
count that is greater than 1.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
SELECT
-- Include SELECT list columns
___,
___(___) AS fee
FROM
parking_violation
GROUP BY
-- Define column for GROUP BY
___
HAVING
-- Restrict to summons numbers with count greater than 1
___(___) ___ ___;