Applying functions for string cleaning
Throughout this course, we will be using a dataset with 5000 New York City parking violation records stored in the parking_violation
table.
A service to provide parking violation recipients with a hard copy of the violation is being re-designed. For proper formatting of the output of the information on the report, some fields needs to be changed from the database representation. The changes are as follows:
- For proper text alignment on the form,
violation_location
values must be 4 characters in length. - All
P-U
(pick-up truck) values in thevehicle_body_type
column should use a generalTRK
value. - Only the first letter in each word in the
street_name
column should be capitalized.
The LPAD()
, REPLACE()
, and INITCAP()
functions will be used to effect these changes.
This exercise is part of the course
Cleaning Data in PostgreSQL Databases
Exercise instructions
- Add
'0'
to the beginning of anyviolation_location
that is less than 4 digits in length using theLPAD()
function. - Replace
'P-U'
with'TRK'
in values within thevehicle_body_type
column using theREPLACE()
function. - Ensure that only the first letter of words in the
street_name
column are capitalized using theINITCAP()
function.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
SELECT
-- Add 0s to ensure violation_location is 4 characters in length
___(___, ___, ___) AS violation_location,
-- Replace 'P-U' with 'TRK' in vehicle_body_type column
___(___, ___, ___) AS vehicle_body_type,
-- Ensure only first letter capitalized in street_name
___(___) AS ___
FROM
parking_violation;