Masking identifying information with regular expressions
Regular expressions can also be used to replace patterns in strings using REGEXP_REPLACE()
. The function is similar to the REPLACE()
function. Its signature is REGEXP_REPLACE(source, pattern, replace, flags)
.
pattern
is the string pattern to match in thesource
string.replace
is the replacement string to use in place of the pattern.flags
is an optional string used to control matching.
For example, REGEXP_REPLACE(xyz, '\d', '_', 'g')
would replace any digit character (\d
) in the column xyz
with an underscore (_
). The g
("global") flag ensures every match is replaced.
To protect parking violation recipients' privacy in a new web report, all letters in the plate_id
column must be replaced with a dash (-
) to mask the true license plate number.
This is a part of the course
“Cleaning Data in PostgreSQL Databases”
Exercise instructions
- Use
REGEXP_REPLACE()
to replace all uppercase letters (A
toZ
) in theplate_id
column with a dash character (-
) so that masked license plate numbers can be used in the report.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
SELECT
summons_number,
-- Replace uppercase letters in plate_id with dash
___(___, ___, ___, 'g')
FROM
parking_violation;
This exercise is part of the course
Cleaning Data in PostgreSQL Databases
Learn to tame your raw, messy data stored in a PostgreSQL database to extract accurate insights.
In this chapter, you’ll gain an understanding of data cleaning approaches when working with PostgreSQL databases and learn the value of cleaning data as early as possible in the pipeline. You’ll also learn basic string editing approaches such as removing unnecessary spaces as well as more involved topics such as pattern matching and string similarity to identify string values in need of cleaning.
Exercise 1: Introduction to data cleaningExercise 2: Developing a data cleaning mindsetExercise 3: Applying functions for string cleaningExercise 4: Pattern matchingExercise 5: Classifying parking violations by time of dayExercise 6: Masking identifying information with regular expressionsExercise 7: Matching similar stringsExercise 8: Matching inconsistent color namesExercise 9: Standardizing color namesExercise 10: Standardizing multiple colorsExercise 11: Formatting text for colleaguesWhat is DataCamp?
Learn the data skills you need online at your own pace—from non-coding essentials to data science and machine learning.