In this chapter, you’ll gain an understanding of data cleaning approaches when working with PostgreSQL databases and learn the value of cleaning data as early as possible in the pipeline. You’ll also learn basic string editing approaches such as removing unnecessary spaces as well as more involved topics such as pattern matching and string similarity to identify string values in need of cleaning.
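As a rough sketch of the kind of string cleanup this chapter covers, the query below trims stray spaces and then uses a regular-expression match to flag values that do not look like a two-letter state code. The sample values and the column names `raw_value`, `trimmed`, and `looks_like_state_code` are made up for illustration and are not taken from the course data.

```sql
-- Hypothetical sample values; no real table is assumed.
SELECT
    raw_value,
    TRIM(raw_value) AS trimmed,                               -- strip leading and trailing spaces
    TRIM(raw_value) ~ '^[A-Z]{2}$' AS looks_like_state_code   -- POSIX regular-expression match
FROM (VALUES ('  NY '), ('ny'), ('New York')) AS t(raw_value);
```

Rows where the last column is false are candidates for further cleaning.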
You’ll learn how to write queries that solve common problems of missing, duplicate, and invalid data in PostgreSQL database tables. Through hands-on exercises, you’ll use the COALESCE() function, SELECT queries, and WHERE clauses to clean messy data.
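One possible way to handle missing and duplicate values together is sketched below: COALESCE() substitutes a placeholder for NULL, and grouping with a count shows which rows appear more than once. The `sample` data, its columns, and the 'Unknown' placeholder are assumptions made for this example.

```sql
-- Hypothetical inline data standing in for a messy table.
WITH sample(id, name, city) AS (
    VALUES (1, 'Ann', 'Boston'),
           (2, 'Bob', NULL),       -- missing city
           (3, 'Ann', 'Boston')    -- duplicate of row 1 apart from the id
)
SELECT
    name,
    COALESCE(city, 'Unknown') AS city,   -- replace NULL with a placeholder
    COUNT(*) AS copies                   -- values above 1 indicate duplicates
FROM sample
GROUP BY name, COALESCE(city, 'Unknown')
ORDER BY name;
```

A WHERE clause such as `WHERE city IS NULL` could be used instead to list only the rows with missing values.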
Sometimes you need to convert data stored in a PostgreSQL database from one data type to another. In this chapter, you’ll explore the expressions you need to convert text to numeric types and how to format strings for temporal data.
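The conversions described here might look something like the following sketch, which casts text to numeric types, parses a text date with a format string, and formats a date back into text. The literal values and column aliases are illustrative assumptions.

```sql
-- Illustrative literals only; no table is assumed.
SELECT
    '42.50'::numeric + 1                         AS as_number,   -- PostgreSQL cast shorthand
    CAST('17' AS integer)                        AS as_integer,  -- standard SQL cast
    TO_DATE('03/15/2021', 'MM/DD/YYYY')          AS as_date,     -- parse text using a format string
    TO_CHAR(DATE '2021-03-15', 'Mon DD, YYYY')   AS formatted;   -- format a date as text
```

A cast raises an error when the text is not a valid number, so it can help to check values with a pattern match first.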
In the final chapter, you’ll learn how to transform your data and construct pivot tables. Working with real-world postal data, you’ll discover how to split addresses into city, state, and zip code, and how to combine those parts again, using functions such as CONCAT(), SUBSTRING(), and REGEXP_SPLIT_TO_TABLE().
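A minimal sketch of splitting and recombining an address is shown below. The address string, the `addr` name, and the assumed "City, ST 12345" layout are hypothetical; real postal data is usually less regular and would need more careful patterns.

```sql
-- Hypothetical address in an assumed "City, ST 12345" layout.
WITH addr(full_address) AS (
    VALUES ('Springfield, IL 62704')
)
SELECT
    SUBSTRING(full_address FROM '^[^,]+')        AS city,   -- text before the first comma
    SUBSTRING(full_address FROM ', ([A-Z]{2}) ') AS state,  -- captured two-letter code
    SUBSTRING(full_address FROM '[0-9]{5}$')     AS zip,    -- five digits at the end
    CONCAT(
        SUBSTRING(full_address FROM '^[^,]+'), ', ',
        SUBSTRING(full_address FROM ', ([A-Z]{2}) '), ' ',
        SUBSTRING(full_address FROM '[0-9]{5}$')
    ) AS rebuilt                                            -- parts joined back together
FROM addr;

-- Split the same string into one row per part, breaking on ", " or " ".
SELECT REGEXP_SPLIT_TO_TABLE('Springfield, IL 62704', ',? ') AS part;
```

The first query keeps the parts as columns of a single row, while REGEXP_SPLIT_TO_TABLE() returns each part as its own row.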