Get Started

Masking identifying information with regular expressions

Regular expressions can also be used to replace patterns in strings using REGEXP_REPLACE(). The function is similar to the REPLACE() function. Its signature is REGEXP_REPLACE(source, pattern, replace, flags).

  • pattern is the string pattern to match in the source string.
  • replace is the replacement string to use in place of the pattern.
  • flags is an optional string used to control matching.

For example, REGEXP_REPLACE(xyz, '\d', '_', 'g') would replace any digit character (\d) in the column xyz with an underscore (_). The g ("global") flag ensures every match is replaced.

To protect parking violation recipients' privacy in a new web report, all letters in the plate_id column must be replaced with a dash (-) to mask the true license plate number.

This is a part of the course

“Cleaning Data in PostgreSQL Databases”

View Course

Exercise instructions

  • Use REGEXP_REPLACE() to replace all uppercase letters (A to Z) in the plate_id column with a dash character (-) so that masked license plate numbers can be used in the report.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

SELECT 
	summons_number,
	-- Replace uppercase letters in plate_id with dash
	___(___, ___, ___, 'g') 
FROM 
	parking_violation;
Edit and Run Code