Cleaning strings
1. Cleaning strings
Data types are not the only way we can have messy data. Another potential issue occurs when dealing with strings. Strings are not always input in the most consistent way, which makes reports look sloppy and can even cause issues with groupings.2. Messy strings
For example, let’s say we have a report with these five values. Notice the different formats for each string. Even though all five values conceptually correspond to the same country, the report will not know that. Ideally, we can come up with some way to automatically group these values together, without having to use a complex CASE statement.3. String functions
We clean up string values using string functions. There’s a number of string functions at our disposal. In order to figure out what function to use, it is important to identify what action should be taken on the string.4. String functions
If we need to replace or remove characters in a string, we can use the REPLACE function.5. Replacing or removing characters
In this example, one value has extra period characters. We want to remove these. The REPLACE function allows us to replace a substring with a new substring. You can remove the substring completely by replacing with empty quotes, as shown. After cleaning the string in this way, the results correctly group these values into one country.6. String functions
To parse out a portion of the string, we can leverage the LEFT, RIGHT, or SUBSTRING functions.7. Parsing strings
In this example, one of our values includes extra characters. We only care to keep the first two characters. You can extract the first N characters using the LEFT function. By setting N to 2, we can solve this problem.8. String functions
To alter CASE of our strings, we can use the UPPER, LOWER, or INITCAP string functions.9. Changing case
In this example, each value has a different case, one uppercase, and one lowercase. SQL will read these as separate values. We can use the UPPER or LOWER function to keep the case consistent.10. String functions
Lastly, to remove extra spaces, we can use the TRIM function.11. Trimming extra spaces
Here, we have extra spaces at the start of the second value. By using a TRIM function, we can remove all leading and trailing spaces. Note that TRIM does keep spaces found within the string.12. Nesting functions
So there's a few ways you can alter string values. But if we want to fix all these issues in one query, we must include several string functions at once. We do this by nesting functions. Since each string function outputs a new string, you can use that output as the input of the next function. In our example, we need to include all 4 of these functions: REPLACE, TRIM, LEFT, and UPPER.13. Take it step-by-step
Let's take this step by step. First, we want to replace the string. Next, we want to trim it, so we add trim to the outside of the replace function. We do the same thing to take the first two characters with left. And finally, we change the case with upper. The final query looks a bit messy, but taking it step-by-step makes it easier to set up.14. Order of nesting matters!
Do note that the order of the nesting matters. In our example, if you used the LEFT function first, it would incorrectly keep extra characters, such as periods or spaces. This would cause our final output to incorrectly keep messy values, as shown here.15. String function documentation
For a full list of postgreSQL string functions, click on this link.16. Let's practice!
The techniques explained in this chapter are commonly used in business settings, as strings are often stored inconsistently. It's time to get some hands-on experience cleaning strings.Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.