Cleaning messy strings
1. Cleaning messy strings
Welcome back. In this lesson, you will learn how to clean messy strings that don't have the format that we want.2. Removing additional spaces
Let's analyze the content of the carriers table mentioned in the previous lesson. Some words that are stored in the name column have extra leading and trailing spaces. Let's see how to remove these additional spaces.3. Removing additional spaces - TRIM
To remove leading and trailing spaces we can use the TRIM function, available since SQL Server 2017. TRIM removes any specified character from the start and end of a string. If we don't specify any character, TRIM will remove the space character. For example, if we execute this code, where this carrier name has leading and trailing spaces, we will get the following output, that is, without leading and trailing spaces.4. Removing additional spaces - RTRIM and LTRIM
In case you are working with older versions than SQL Server 2017, the TRIM function won't be available. In such a case, you can substitute it by using RTRIM and LTRIM. RTRIM removes all trailing spaces, whereas LTRIM removes all leading spaces. If we combine LTRIM with RTRIM, we will get the same output as we got using TRIM.5. Removing additional spaces
Finally, if we want every value from the name column to be without these extra spaces, we will have to include it within the previous functions.6. Unifying strings
Let's see another example of messy strings. If we analyze the content of the airports table, we can see that in the airport_state column, we have different values for the state of Florida. All of these values are valid, but it is not reasonable to have different values for the same state. Let's see how to unify these different strings.7. Unifying strings - REPLACE
If we decide to unify all these different strings to "Florida", we can use the REPLACE function. REPLACE replaces all occurrences of a specified string with another string. REPLACE performs comparisons based on the collation of the input. If we want to perform a comparison in a specific collation, we can specify it using COLLATE. Collation is outside the scope of this course, but we will suppose that the collation of our inputs is case insensitive.8. Unifying strings - REPLACE
The first step we can take is replacing the string "FL" with the string "Florida". If we execute this code, we can see that the string "Florida" correctly replaced the strings "fl" in lower case letters and "FL" in capital letters. But, what happened with the last value? The letters "Fl" in the string "Florida" were also replaced by the word "Florida" itself, so we got "Floridaorida" as a result. Let's fix it.9. Unifying strings - REPLACE
One easy solution is to concatenate two REPLACE functions. In the innermost REPLACE, we can do the same as in the previous slide, that is, replace the string "FL" with the string "Florida". Then, we can use the REPLACE function again and apply it to this result, replacing the string "Floridaorida" with the string "Florida".10. Unifying strings - REPLACE + CASE
Another solution can be applying the CASE statement. We can order to replace the values of the column airport_state when these values are different than "Florida". In other cases, we can leave the string as it is.11. Unifying strings - REPLACE + UPPER
Finally, let's see how to solve this problem if we had chosen to unify these different strings into the "FL" string. The first step we can take is replacing the string "Florida" with the string "FL", but this won't be enough because we still have the string "fl" in small letters.12. Unifying strings - REPLACE + UPPER
To solve it, we can use the UPPER function. UPPER converts a given string to uppercase. In the following code, we can see that the UPPER function wraps the result of the REPLACE function.13. Unifying strings - REPLACE + UPPER
This results in the following output.14. Let's practice!
Let's have fun cleaning messy strings!Create Your Free Account
or
By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.