Masking identifying information with regular expressions
Regular expressions can also be used to replace patterns in strings using REGEXP_REPLACE()
. The function is similar to the REPLACE()
function. Its signature is REGEXP_REPLACE(source, pattern, replace, flags)
.
pattern
is the string pattern to match in thesource
string.replace
is the replacement string to use in place of the pattern.flags
is an optional string used to control matching.
For example, REGEXP_REPLACE(xyz, '\d', '_', 'g')
would replace any digit character (\d
) in the column xyz
with an underscore (_
). The g
("global") flag ensures every match is replaced.
To protect parking violation recipients' privacy in a new web report, all letters in the plate_id
column must be replaced with a dash (-
) to mask the true license plate number.
This is a part of the course
“Cleaning Data in PostgreSQL Databases”
Exercise instructions
- Use
REGEXP_REPLACE()
to replace all uppercase letters (A
toZ
) in theplate_id
column with a dash character (-
) so that masked license plate numbers can be used in the report.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
SELECT
summons_number,
-- Replace uppercase letters in plate_id with dash
___(___, ___, ___, 'g')
FROM
parking_violation;