Searching and extracting text

1. Searching and extracting text

Now that we've cleaned our dataset, let's add new features to make our app more useful.

2. Searching text

Let's start with keyword search. Imagine a user remembers having a great meal somewhere, but can only recall the word "Burger" from the name. How do we find all matching businesses?

3. Searching text

We start by adding a new column with an expression on the business column.

4. Searching text

We use the string .contains() expression, which is True if the business name contains Burger and False if not. We add this as a new column called is_burger

5. Searching text

which is True for Bang Bang Burger in the second row, but False for 7burgers in the first row. This is because the contains expression is case-sensitive: it only matches Burger with a capital B.

6. Searching text

We can make the match case-insensitive by chaining expressions. First, we make the business name lowercase

7. Searching text

and then match on lowercase burger. Now is_burger is True for both the first and second rows.

8. Filtering for text matches

If we want to filter the DataFrame to only keep rows that match our search text, we use the filter method.

9. Filtering for text matches

We pass it our expression to identify rows where the business column contains burger. We now see our full set of burger joints.

10. Extracting text

With search working, let's tackle food categorization. We want to extract keywords like "burger" or "coffee" from business names to help users filter by cuisine. For simplicity, we keep the business column in lowercase here. Our goal is to extract burger from the first two rows and coffee from the third and fourth rows.

11. Extracting text

We start by using with_columns to add a new column

12. Extracting text

and we create an expression on the business column

13. Extracting text

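We use the string .extract() expression and pass the word burger as the pattern. Be aware of the syntax here: inside the pattern string, burger must be wrapped in parentheses, as in "(burger)". The parentheses define a regex capture group, and .extract() returns the text captured by that group, so without them we get no match.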

14. Extracting text

We call this new column food, which is burger for the first two rows and null otherwise.

15. Extracting text - multiple terms

But we can also identify coffee houses from the names. To extract multiple terms, we separate the target terms with the pipe character (|) inside the parentheses of the .extract() pattern. In a regular expression the pipe means "or", so we extract either burger or coffee. By adding more terms in this way, we could quickly identify what many of the businesses serve.

16. Replacing text

Finally, let's clean up those location names. Notice abbreviations like "Rd." and "St.": we want to expand these for better readability in the app.

17. Replacing text

We can update the location column

18. Replacing text

using the string .replace() expression to replace Rd with Road.

19. Replacing multiple strings

We can do a single replacement with string .replace, but to replace multiple abbreviations in one go, we use the replace_many expression.

20. Replacing multiple strings

passing the list of original strings first

21. Replacing multiple strings

followed by their replacements

22. Replacing multiple strings

Running this gives us the cleaned-up output.

23. Let's practice!

We've added keyword search, food categorization, and cleaner locations to our app. Now it's your turn to practice these text operations!
