Get Started

Masking identifying information with regular expressions

Regular expressions can also be used to replace patterns in strings using REGEXP_REPLACE(). The function is similar to the REPLACE() function. Its signature is REGEXP_REPLACE(source, pattern, replace, flags).

  • pattern is the string pattern to match in the source string.
  • replace is the replacement string to use in place of the pattern.
  • flags is an optional string used to control matching.

For example, REGEXP_REPLACE(xyz, '\d', '_', 'g') would replace any digit character (\d) in the column xyz with an underscore (_). The g ("global") flag ensures every match is replaced.

To protect parking violation recipients' privacy in a new web report, all letters in the plate_id column must be replaced with a dash (-) to mask the true license plate number.

This is a part of the course

“Cleaning Data in PostgreSQL Databases”

View Course

Exercise instructions

  • Use REGEXP_REPLACE() to replace all uppercase letters (A to Z) in the plate_id column with a dash character (-) so that masked license plate numbers can be used in the report.

Hands-on interactive exercise

Have a go at this exercise by completing this sample code.

SELECT 
	summons_number,
	-- Replace uppercase letters in plate_id with dash
	___(___, ___, ___, 'g') 
FROM 
	parking_violation;
Edit and Run Code

This exercise is part of the course

Cleaning Data in PostgreSQL Databases

IntermediateSkill Level
4.4+
17 reviews

Learn to tame your raw, messy data stored in a PostgreSQL database to extract accurate insights.

In this chapter, you’ll gain an understanding of data cleaning approaches when working with PostgreSQL databases and learn the value of cleaning data as early as possible in the pipeline. You’ll also learn basic string editing approaches such as removing unnecessary spaces as well as more involved topics such as pattern matching and string similarity to identify string values in need of cleaning.

Exercise 1: Introduction to data cleaningExercise 2: Developing a data cleaning mindsetExercise 3: Applying functions for string cleaningExercise 4: Pattern matchingExercise 5: Classifying parking violations by time of dayExercise 6: Masking identifying information with regular expressions
Exercise 7: Matching similar stringsExercise 8: Matching inconsistent color namesExercise 9: Standardizing color namesExercise 10: Standardizing multiple colorsExercise 11: Formatting text for colleagues

What is DataCamp?

Learn the data skills you need online at your own pace—from non-coding essentials to data science and machine learning.

Start Learning for Free