Searching and extracting text
1. Searching and extracting text
Now that we've cleaned our dataset, let's add new features to make our app more useful.
2. Searching text
Let's start with keyword search. Imagine a user remembers having a great meal somewhere, but can only recall the word "Burger" from the name. How do we find all matching businesses?
3. Searching text
We start by adding a new column with an expression on the business column.
4. Searching text
We use the string .contains() expression, which is True if the business name contains Burger and False if not. We add this as a new column called is_burger.
5. Searching text
The new column is True for Bang Bang Burger in the second row, but False for 7burgers in the first row. This is because the contains expression is case sensitive, meaning it only matches Burger with a capital B.
6. Searching text
We can make the match case-insensitive by chaining expressions. First, we make the business name lowercase.
7. Searching text
Then we match on lowercase burger. Now is_burger is True for both the first and second rows.
8. Filtering for text matches
If we want to filter the DataFrame to only keep rows that match our search text, we use the filter method.
9. Filtering for text matches
We pass our expression to filter to identify rows where the business column contains burger. We now see our full set of burger joints.
10. Extracting text
With search working, let's tackle food categorization. We want to extract keywords like "burger" or "coffee" from business names to help users filter by cuisine. For simplicity, we keep the business column in lowercase here. Our goal is to extract burger from the first two rows and coffee from the third and fourth rows.
11. Extracting text
We start by using with_columns to add a new column.
12. Extracting text
Then we create an expression on the business column.
13. Extracting text
We use the string .extract() expression and pass the word burger as the argument. Be aware of the syntax here: we must pass burger inside parentheses to get a match, because .extract() returns the contents of a regex capture group.
14. Extracting text
We call this new column food, which is burger for the first two rows and null otherwise.
15. Extracting text - multiple terms
But we can also identify coffee houses from the names. To extract multiple terms, we separate the target terms with the pipe character inside .extract(). The pipe means we want to extract either burger or coffee. By adding more terms in this way, we could quickly identify what many of the businesses serve.
16. Replacing text
Finally, let's clean up those location names. Notice abbreviations like "Rd." and "St."; we want to expand these for better readability in the app.
17. Replacing text
We can update the location column.
18. Replacing text
We use the string .replace expression to replace Rd with Road.
19. Replacing multiple strings
We can do a single replacement with string .replace, but to replace multiple abbreviations in one go, we use the replace_many expression.
20. Replacing multiple strings
We pass the list of original strings first.
21. Replacing multiple strings
Then we pass the list of their replacements.
22. Replacing multiple strings
Running this gives us the cleaned-up output.
23. Let's practice!
We've added keyword search, food categorization, and cleaner locations to our app. Now it's your turn to practice these text operations!