1. Reformatting string and character data
Great job! Understanding how to manipulate and transform date and time data types will be something you use often in your data science work. Next we are going to learn how to manipulate and transform string and character data.
2. Topics
In this chapter we will begin by learning about functions and operators that allow us to reformat string and character data.
Next you'll explore functions that allow you to parse string and character data.
You will then learn how to calculate the length of a string or determine the position of a character within a string.
And finally you'll learn how to truncate and pad string data.
Let's get started.
3. The string concatenation operator
First we will look at one of the most common and frequently used techniques when working with string data. String concatenation allows you to merge two or more strings together to form a single combined string.
In this example, you see how we can combine two separate columns from the customer table, first_name and last_name, to create a new column called full_name.
This is one of many real world scenarios that will require you to concatenate strings.
4. String concatenation with functions
Additionally, PostgreSQL also has a built-in function for string concatenation. The CONCAT() function accepts one or more parameters and returns the concatenated string as the result. Each parameter can be a column from a database or a literal value separate by a comma. In this example, we see how we can perform the same concatenation operation using this function rather than the || operator from the previous slide and it will produce an identical result.
5. String concatenation with a non-string input
PostgreSQL also allows you to concatenate both string and non-string data. As we see in this example, we prepend the customer_id column to the first_name and last_name columns. Non-string data can be used in concatenation with both the || operator as well as the CONCAT() function.
6. Changing the case of string
There will also be times when you want to reformat string data to uppercase, lowercase or title case. This comes in handy when you want to standardize a field in your dataset for manipulation.
The UPPER function allows you to reformat a string so you change every character to its uppercase equivalent. UPPER accepts a string as a parameter and returns that string in all uppercase.
Transforming string data will be useful when normalizing and cleansing datasets.
7. Changing the case of string
The LOWER function is analogous to UPPER but converts the string to lowercase instead. Here you see an example with the title column from the film table.
8. Changing the case of string
Similarly, the INITCAP function will convert a string to title case.
9. Replacing characters in a string
The REPLACE function will find a substring in a string and replace it with a different substring. Look at the results of the following query. You'll notice that the phrase "A Astounding" that is present in the first few rows is grammatically incorrect.
10. Replacing characters in a string
So let's say we want to fix this and replace all occurrences of 'A Astounding' with the proper 'An Astounding' text. We can use the REPLACE function to accomplish this task. The function takes three parameters. The first is the source string that you want to manipulate, the second is the substring you want to find in the source string and the last parameter is the replacement string.
11. Manipulating string data with REVERSE
The REVERSE function does just what you think it does...it accepts a string as its only parameter and returns the same string in reverse order as you see when we use the function to reverse the title column of the film table.
12. Let's practice!
Now it's your turn! Let's get some practice using these functions.