Get Started

Splitting and concatenating text

1. Splitting and concatenating text

When working with text values, you often need to break strings apart into multiple pieces, extract part of a string to a new variable, or join, or concatenate, strings together. There are functions to help us with these operations.

2. Substring

First, how do we extract just part of a string? The left and right functions take as arguments a string, or the name of a column of strings, and the number of characters to keep. Left keeps characters starting at the left, while right keeps characters counting from the end. Here, the first two characters in the string abcde are a and b, while the last two characters are d and e. If the string contains fewer than the requested number of characters, only the available characters are returned.

3. Substring

To extract characters from the middle of a string, use the substring function. The function takes a string or column to operate on, and then the keyword FROM. Next comes the index of the character to start with, counting from 1. Then the keyword FOR followed by the number of characters to include in the substring. For example, if we take the substring of abcdef starting from position 2 and going for 3 characters, we get bcd. B was the second character in the string, and the function extracted 3 characters. You may also see an abbreviated version of substring with a shortened function name and comma-separated arguments. It works the same way. The left, right, and substring functions can be useful in situations such as extracting a snippet from a long unstructured text field, displaying just the first or last few digits of an account number, or limiting a zip code to only the first 5 digits.

4. Delimiters

The second string operation to know is how to split a string into parts based on a delimiter. A delimiter is a character, such as a comma, or a string that separates fields or chunks of text.

5. Splitting on a delimiter

The function split_part takes a string, the delimiter to split the string on, and the number position of the part of the split string to return, counting from one. For example, if we split the string a-comma-bc-comma-d with a comma as the delimiter, the string would be split into 3 parts: a, bc, and d. If we ask for the second part, we get bc. Note that the delimiter is not included in the returned value.

6. Splitting on a delimiter

The delimiter can be a single character or a string of multiple characters. For example, if we split the string "cats and dogs and fish" on "and" surrounded by spaces, the first group is cats. Note that the string was split on the delimiter exactly as it appears, not on the set of characters included in the delimiter. It is common to split strings on a delimiter value when multiple pieces of information have been stored together in a single column.

7. Concatenating text

The third string operation is concatenation. The concat function takes any number of arguments. It joins the text representation of all of the values together in a single string. You can concatenate both character types and non-character types. Values can also be concatenated with a double pipe, which looks like two vertical bars. This operator is the SQL standard for string concatenation. It works the same as the concat function except when null values are included. The concat function omits null values, while the double pipe will return null if any component is null. One example of when you might concatenate strings is to join a first name and last name stored in separate columns to get a person's full name.

8. Manipulate some strings!

Alright, time to practice manipulating strings with the functions you've learned.