Get startedGet started for free

Cleaning strings

1. Cleaning strings

Data types are not the only way we can have messy data. Another potential issue occurs when dealing with strings. Strings are not always input in the most consistent way, which makes reports look sloppy and can even cause issues with groupings.

2. Messy strings

For example, let’s say we have a report with these five values. Notice the different formats for each string. Even though all five values conceptually correspond to the same country, the report will not know that. Ideally, we can come up with some way to automatically group these values together, without having to use a complex CASE statement.

3. String functions

We clean up string values using string functions. There’s a number of string functions at our disposal. In order to figure out what function to use, it is important to identify what action should be taken on the string.

4. String functions

If we need to replace or remove characters in a string, we can use the REPLACE function.

5. Replacing or removing characters

In this example, one value has extra period characters. We want to remove these. The REPLACE function allows us to replace a substring with a new substring. You can remove the substring completely by replacing with empty quotes, as shown. After cleaning the string in this way, the results correctly group these values into one country.

6. String functions

To parse out a portion of the string, we can leverage the LEFT, RIGHT, or SUBSTRING functions.

7. Parsing strings

In this example, one of our values includes extra characters. We only care to keep the first two characters. You can extract the first N characters using the LEFT function. By setting N to 2, we can solve this problem.

8. String functions

To alter CASE of our strings, we can use the UPPER, LOWER, or INITCAP string functions.

9. Changing case

In this example, each value has a different case, one uppercase, and one lowercase. SQL will read these as separate values. We can use the UPPER or LOWER function to keep the case consistent.

10. String functions

Lastly, to remove extra spaces, we can use the TRIM function.

11. Trimming extra spaces

Here, we have extra spaces at the start of the second value. By using a TRIM function, we can remove all leading and trailing spaces. Note that TRIM does keep spaces found within the string.

12. Nesting functions

So there's a few ways you can alter string values. But if we want to fix all these issues in one query, we must include several string functions at once. We do this by nesting functions. Since each string function outputs a new string, you can use that output as the input of the next function. In our example, we need to include all 4 of these functions: REPLACE, TRIM, LEFT, and UPPER.

13. Take it step-by-step

Let's take this step by step. First, we want to replace the string. Next, we want to trim it, so we add trim to the outside of the replace function. We do the same thing to take the first two characters with left. And finally, we change the case with upper. The final query looks a bit messy, but taking it step-by-step makes it easier to set up.

14. Order of nesting matters!

Do note that the order of the nesting matters. In our example, if you used the LEFT function first, it would incorrectly keep extra characters, such as periods or spaces. This would cause our final output to incorrectly keep messy values, as shown here.

15. String function documentation

For a full list of postgreSQL string functions, click on this link.

16. Let's practice!

The techniques explained in this chapter are commonly used in business settings, as strings are often stored inconsistently. It's time to get some hands-on experience cleaning strings.

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.