1. Working with string columns
In this video, we are going to learn methods to reshape string columns.
2. Columns with strings
By a string column, we mean a column whose values are strings. In our example, the title column contains strings: each value is the title of a book.
3. String methods
Luckily, pandas Series and Indexes have a set of string processing methods. These methods allow us to work with each string easily.
They are accessed by using the str attribute.
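As a minimal sketch of how the str attribute is used, assume a small DataFrame named books with a title column. The book titles and the upper and len calls are only illustrative and are not taken from the slides.

```python
import pandas as pd

# Hypothetical data: only the column name "title" comes from the video
books = pd.DataFrame({"title": ["Dracula: A Mystery Story", "Emma: A Novel"]})

# The str attribute applies a string method to every value in the Series
upper_titles = books["title"].str.upper()
title_lengths = books["title"].str.len()

print(upper_titles)
print(title_lengths)
```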
4. Splitting into two columns
Let's say we want to split the title column.
5. Splitting into two columns
We can use the split method of the str attribute, passing in the element to split on, in our case, the colon. The method returns a list for each row. Each list contains the two sub-strings obtained from splitting the title by the colon.
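A short sketch of this split, assuming the same kind of hypothetical books DataFrame (the titles are made up for illustration):

```python
import pandas as pd

# Hypothetical data: only the column name "title" comes from the video
books = pd.DataFrame({"title": ["Dracula: A Mystery Story", "Emma: A Novel"]})

# Split each title on the colon; every row becomes a list of sub-strings
split_titles = books["title"].str.split(":")

print(split_titles)
```

Note that splitting on the colon alone keeps the space that follows it, so the second sub-string in this sample data starts with a space.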
We could also access only one of the resulting elements.
6. Splitting into two columns
In that case, we use the get method of the str attribute, passing in the index of the element we want. In our example, we ask for the element at index zero, so the get method returns the first split element of each row.
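Chaining the two calls could look like the following sketch. The DataFrame and its titles are hypothetical.

```python
import pandas as pd

# Hypothetical data: only the column name "title" comes from the video
books = pd.DataFrame({"title": ["Dracula: A Mystery Story", "Emma: A Novel"]})

# Split on the colon, then keep only the element at index 0 of each list
first_parts = books["title"].str.split(":").str.get(0)

print(first_parts)
```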
7. Splitting into two columns
We can also set the expand argument of split to True. This will return a new DataFrame with two columns, one for each split element.
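A sketch of the expand argument on the same hypothetical data:

```python
import pandas as pd

# Hypothetical data: only the column name "title" comes from the video
books = pd.DataFrame({"title": ["Dracula: A Mystery Story", "Emma: A Novel"]})

# expand=True returns a DataFrame instead of a Series of lists;
# the new columns are labeled 0 and 1
parts = books["title"].str.split(":", expand=True)

print(parts)
```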
8. Splitting into two columns
This allows us to assign the split elements to columns in the original DataFrame. In our example, we first split the title column by the colon, indicating that we want to expand it to two columns, and assign the result to two new columns, "main_title" and "subtitle".
This is useful because we can now drop the original title column. After that, we can transform the DataFrame by using the new columns as the index, getting a clean long DataFrame with a multi-level index.
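The whole sequence might be sketched as follows. The copies_sold column and its values are hypothetical, and set_index is an assumption: the transcript does not say which reshaping call the slide uses to move the new columns into the index.

```python
import pandas as pd

# Hypothetical data: "copies_sold" and all values are made up
books = pd.DataFrame({
    "title": ["Dracula: A Mystery Story", "Emma: A Novel"],
    "copies_sold": [100, 200],
})

# Split the title and assign both parts to new columns in one step
books[["main_title", "subtitle"]] = books["title"].str.split(":", expand=True)

# The original column is now redundant
books = books.drop(columns="title")

# Assumption: set_index stands in for the reshaping step on the slide
books = books.set_index(["main_title", "subtitle"])

print(books)
```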
9. Concatenate two columns
Let's work with the following dataset. We can see that the name and last name of the author are in different columns.
10. Concatenate two columns
We could concatenate those two columns into one. For that, we will use the cat method of the str attribute. In our example, we apply cat on the name_author column. We pass in the other column we want to concatenate and the separator element, for us, a space.
We get a Series of the new concatenated strings.
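A minimal sketch of this concatenation. The column name lastname_author and the author names are assumptions, since the transcript only names name_author.

```python
import pandas as pd

# Hypothetical data: "lastname_author" is an assumed column name
authors = pd.DataFrame({
    "name_author": ["Bram", "Jane"],
    "lastname_author": ["Stoker", "Austen"],
})

# Join the two columns row by row, separated by a space
full_names = authors["name_author"].str.cat(authors["lastname_author"], sep=" ")

print(full_names)
```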
11. Concatenate two columns
We could also assign the generated series to a new column in the original DataFrame. In our example, we create a column named author.
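Assigning the result back could look like this sketch, again with a hypothetical lastname_author column and made-up names:

```python
import pandas as pd

# Hypothetical data: "lastname_author" is an assumed column name
authors = pd.DataFrame({
    "name_author": ["Bram", "Jane"],
    "lastname_author": ["Stoker", "Austen"],
})

# Store the concatenated strings in a new column named "author"
authors["author"] = authors["name_author"].str.cat(
    authors["lastname_author"], sep=" "
)

print(authors)
```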
12. Concatenate two columns
This is helpful because we can then melt our DataFrame using this new column as the identifier variable instead of the two original columns.
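A sketch of the melt step, assuming two hypothetical value columns, novels and short_stories, with made-up counts:

```python
import pandas as pd

# Hypothetical data: every column except "name_author" is an assumption
authors = pd.DataFrame({
    "name_author": ["Bram", "Jane"],
    "lastname_author": ["Stoker", "Austen"],
    "novels": [3, 6],
    "short_stories": [5, 2],
})

# Build a single identifier column from the two name columns
authors["author"] = authors["name_author"].str.cat(
    authors["lastname_author"], sep=" "
)

# Melt with one identifier column instead of two
long_df = authors.melt(id_vars="author", value_vars=["novels", "short_stories"])

print(long_df)
```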
13. Concatenate index
The cat and split methods can also be used for indexes. The following DataFrame has an index named main_title.
14. Concatenate index
To concatenate the index with a column in the DataFrame, we call the cat method through the str attribute of the index. We assign the result back to the index, so the index now holds the new concatenated strings.
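A minimal sketch, assuming a hypothetical subtitle column and a space as the separator (the transcript does not state which column or separator the slide uses):

```python
import pandas as pd

# Hypothetical data: the "subtitle" column and all values are assumptions
books = pd.DataFrame(
    {"subtitle": ["A Mystery Story", "A Novel"]},
    index=pd.Index(["Dracula", "Emma"], name="main_title"),
)

# Concatenate each index label with the subtitle in the same row,
# then overwrite the index with the result
books.index = books.index.str.cat(books["subtitle"], sep=" ")

print(books)
```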
15. Split index
We can do the same to split the strings contained in the index. We access the split method through the str attribute of the index again. We set expand to True, which returns a multi-level index, so we now get a DataFrame with a multi-level index.
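Splitting the index could be sketched like this. The index labels and the copies_sold column are hypothetical.

```python
import pandas as pd

# Hypothetical data: all values and the "copies_sold" column are made up
books = pd.DataFrame(
    {"copies_sold": [100, 200]},
    index=pd.Index(["Dracula: A Mystery Story", "Emma: A Novel"], name="title"),
)

# On an Index, expand=True returns a MultiIndex with one level per split element
books.index = books.index.str.split(":", expand=True)

print(books)
```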
16. Concatenate Series
So far, we have only concatenated columns, but we can also apply the cat method to concatenate a column with a pre-defined list. In the example, we define a new list. It contains three elements, one for each row of our DataFrame.
We now apply the cat method, passing in the list and the separator element. As we can see in the output, we obtain a Series where each string in the main title has been concatenated with the corresponding element in the list.
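A sketch of concatenating with a list. The titles, the list contents, and the separator are all hypothetical; what matters is that the list has one element per row.

```python
import pandas as pd

# Hypothetical data: three rows, so the list below needs three elements
books = pd.DataFrame({"main_title": ["Dracula", "Emma", "Ulysses"]})
new_list = ["Bram Stoker", "Jane Austen", "James Joyce"]

# Each title is joined with the list element at the same position
with_authors = books["main_title"].str.cat(new_list, sep=" by ")

print(with_authors)
```

If the list length does not match the number of rows, pandas raises an error instead of concatenating.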
17. Let's practice!
Now, it's your turn to split and concatenate string columns.