Get startedGet started for free

Splitting column data

1. Splitting column data

Let's now consider the reverse operation of concatenation: splitting data into multiple columns.

2. Splitting columns

Imagine a process that extracts restaurant inspection data directly from a web page. In such a scenario, multiple pieces of information may be combined into a single column. Here the dataset includes a column named "violation" which combines a violation code and violation description into a single string value. The violation code begins the violation string. A space character separates the violation description from this value. How might we separate the violation code and description?

3. Finding substring starting position with STRPOS()

Solving this problem involves combining a few different PostgreSQL functions. The first function that will prove useful in this effort is STRPOS(). The STRPOS() function returns the INTEGER representing the first position containing search_string in source_string. As an example, the first space string in the violation shown here is in position 4. The STRPOS() function can provide this value.

4. Finding substring starting position with STRPOS()

STRPOS() returns 0 when source_string does not contain search_string. This example demonstrates that 0 is returned by STRPOS() when the string does not contain a question mark.

5. Finding substring starting position with STRPOS()

The first space in the violation column provides an ending boundary for the violation code. STRPOS() provides the position representing this boundary. We would like to extract all characters from the beginning of the string (located at position 1) up to the initial space character (located at position 4).

6. Extracting a substring using SUBSTRING()

The SUBSTRING() function requires a starting position and the number of characters to include in the substring.

7. Extracting a substring using SUBSTRING()

From the string, "Homerun", the substring starting at 1 with a length of 4 characters is "Home". To extract the violation_code from the example string shown here, we want all characters between the 1st position in the string and the position before the 1st space character. This is 1 position before the space character. By subtracting 1 from 4 (the value returned by STRPOS()), 3 characters are included in the substring. The function call for SUBSTRING() shown here extracts the string "09B" from the full violation string. Having extracted the violation code, the task of extracting the violation description remains.

8. Extracting a substring using SUBSTRING()

We need two values to extract the description. First, we need to know where the violation description begins. Second, we need to know the number of characters in the description. We have already learned how to locate the start of the violation description. It is located 1 position after the space separating the violation code from the violation description. Therefore, we add 1 to the return value of the STRPOS() function call. This expression evaluates to 5 which is the position where the violation description begins.

9. Calculating the length of a string with LENGTH()

The final argument to SUBSTRING() is the number of characters to extract. The violation description extends from the space after the violation code until the end of the string. To determine this number of characters, we need the length of the violation string. The violation descriptions vary in length so using a constant value is not possible here. The LENGTH() function gets us most of the way to the solution for this problem. The LENGTH() function accepts one string argument and returns the number of characters in the string.

10. Calculating the length of a string with LENGTH()

The length of the violation string includes the number of characters used in the violation code. Therefore, we must subtract the number of characters in the violation code and the space separating the violation code from the description. The value returned by the STRPOS() function call used previously provides this value. Notice that the difference between these two values equals the length of the description.

11. Calculating the length of a string with LENGTH()

Subtracting the STRPOS() expression from the LENGTH() expression provides the number of characters in the violation description.

12. Putting the pieces together

The full SUBSTRING() function call extracts the violation description as desired.

13. Splitting the violation column

The query shown here combines the components discussed previously. The violation_code and violation_description columns are defined using function calls to SUBSTRING() as described previously. The general solution replaces the string literal used in the previous SELECT statements with the violation column. This splits the violation column for every record.

14. Splitting the violation column

Recall the original scenario included a violation column combining the violation code and description.

15. Splitting the violation column

The result from applying this transformation is a violation_code column and a separate violation_description column with the corresponding violation description values.

16. Let's practice!

Splitting a string column into multiple columns can be a bit involved, so let's practice what we just learned to make sure that you are ready to add this technique to your toolbox.