Text classification

1. Text classification

Let's find out what our customers are actually saying about us.

2. Challenges of classifying text

Remember, reviews are free-text, so customers can write about anything! Are they mentioning the beds? The service? The noise? Reading and categorizing each review manually would take forever and simply doesn't scale. In this video, we'll use Cortex's `classify_text()` function to automatically label each review by topic.

3. Defining categories

Text classification is one of the most common and useful business applications of LLMs. To do it well, we first need to decide which categories matter to our hotel. For example, we could have categories for overall experience, location, staff, food and beverages, and facilities.

4. Classifying text

To classify our reviews, we first import `classify_text` from `snowflake.cortex`. We pass a review to the `str_input` argument and our list of labels to the `categories` argument. The model then predicts which category best represents that piece of text. When we print the result, we see that the model assigned our review to the staff category.
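
As a rough sketch of the call and the shape of its output: in a Snowflake notebook you would run `from snowflake.cortex import classify_text` and call `classify_text(str_input=review, categories=categories)`. So that this example runs without a Snowflake connection, `fake_classify_text` below is a hypothetical keyword-matching stand-in, not the Cortex model. We assume, consistent with the string we inspect next, that the real call returns a JSON string with a `label` key. The review text and keywords are made up for illustration.

```python
import json

# Topic labels from the transcript.
CATEGORIES = ["overall experience", "location", "staff", "food and beverages", "facilities"]

# Hypothetical keyword hints for the stand-in below; the real model uses an LLM.
KEYWORDS = {
    "location": ["beach", "downtown", "station"],
    "staff": ["staff", "reception", "helpful"],
    "food and beverages": ["breakfast", "dinner", "coffee"],
    "facilities": ["pool", "gym", "wifi"],
}


def fake_classify_text(str_input, categories):
    """Hypothetical offline stand-in for snowflake.cortex.classify_text.

    Returns a JSON string shaped like '{"label": "..."}'. It does naive
    substring matching and falls back to the first category.
    """
    text = str_input.lower()
    for category in categories:
        if any(word in text for word in KEYWORDS.get(category, [])):
            return json.dumps({"label": category})
    return json.dumps({"label": categories[0]})


review = "The reception team was incredibly helpful."
result = fake_classify_text(str_input=review, categories=CATEGORIES)
print(result)  # {"label": "staff"}
```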

5. Converting outputs to a dictionary

When we check the type of the result, it turns out to be a string containing JSON. Since dictionaries are easier to work with programmatically, let's convert it. We import Python's built-in `json` module and parse the string into a dictionary with `json.loads()`. Much better!
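
A minimal sketch of the conversion, using a JSON string shaped like the staff result from the previous step (the exact string is an assumption for illustration):

```python
import json

# Assumed example of what the classification call returns: a JSON string.
raw = '{"label": "staff"}'
print(type(raw))  # <class 'str'>

# json.loads parses the JSON text into a regular Python dict.
parsed = json.loads(raw)
print(type(parsed))  # <class 'dict'>
print(parsed["label"])  # staff
```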

6. Scaling the workflow

The previous workflow works well for an individual review, but how do we scale it? Say we want to build a classification pipeline that extracts and categorizes all reviews for a given month. We create a Python variable, `month`, and assign it the target month as a number: in this case 5, representing May. We can then query our reviews table, extracting the month from the `date` column and injecting our `month` variable as a filter.
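
One way to sketch the month filter in Python, assuming the table is called `reviews` and the date column is `date` as described above; the helper name `build_month_query` is hypothetical, and `MONTH()` is Snowflake's function for extracting the month from a date. Checking that `month` is an integer from 1 to 12 before formatting it into the SQL keeps arbitrary text out of the query.

```python
def build_month_query(month: int, table: str = "reviews") -> str:
    """Build a query that keeps only reviews from the given month (1-12).

    Hypothetical helper; table and column names follow the transcript,
    so adjust them to your own schema.
    """
    if not isinstance(month, int) or not 1 <= month <= 12:
        raise ValueError("month must be an integer from 1 to 12")
    # Only a validated int is formatted into the SQL string.
    return f"SELECT * FROM {table} WHERE MONTH(date) = {month}"


month = 5  # May
query = build_month_query(month)
print(query)  # SELECT * FROM reviews WHERE MONTH(date) = 5

# In a Snowpark session (not created here), the result could be loaded with:
# reviews_df = session.sql(query).to_pandas()
```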

7. Applying classify_text()

We store the query result as a pandas DataFrame. Our goal is to add a new column containing the predicted category. We define a custom function, `classification`, that takes one argument, `text`. In the function body, we call `classify_text` as before, convert the result into a dictionary, and return its label.
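
The function can be sketched as follows. Here `fake_classify_text` is a hypothetical offline stand-in for `classify_text` (it only checks for the word "staff"), included so the snippet runs without Snowflake; in a notebook you would call the imported `classify_text` instead. We assume the returned JSON uses a `label` key.

```python
import json

# Topic labels from the transcript.
CATEGORIES = ["overall experience", "location", "staff", "food and beverages", "facilities"]


def fake_classify_text(str_input, categories):
    # Hypothetical offline stand-in for snowflake.cortex.classify_text:
    # picks "staff" if the word appears, otherwise the first category,
    # and returns a JSON string as the real call is assumed to.
    label = "staff" if "staff" in str_input.lower() else categories[0]
    return json.dumps({"label": label})


def classification(text):
    """Return the predicted topic label for a single review."""
    result = fake_classify_text(str_input=text, categories=CATEGORIES)  # JSON string
    return json.loads(result)["label"]  # string -> dict -> label


print(classification("The staff upgraded our room."))  # staff
```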

8. Applying classify_text()

Now we can create a new column of categories by calling the `.apply()` method on the `"description"` column and passing our custom function. Looking at the prediction for the first row, the review is classified as overall experience. This simple workflow breaks free-text reviews down into useful categories we can analyze.
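
A self-contained sketch of the `.apply()` step on a small, made-up DataFrame standing in for the month-filtered query result. `fake_classify_text` is again a hypothetical offline stand-in for `classify_text` (it only looks for the word "beach"), and the new column name `category` is an assumption.

```python
import json

import pandas as pd

CATEGORIES = ["overall experience", "location", "staff", "food and beverages", "facilities"]


def fake_classify_text(str_input, categories):
    # Hypothetical stand-in for snowflake.cortex.classify_text (no Snowflake needed).
    label = "location" if "beach" in str_input.lower() else categories[0]
    return json.dumps({"label": label})


def classification(text):
    # Same shape as the custom function: call, parse JSON, return the label.
    return json.loads(fake_classify_text(str_input=text, categories=CATEGORIES))["label"]


# Made-up reviews in a "description" column, as in the reviews table.
reviews_df = pd.DataFrame(
    {
        "description": [
            "A wonderful stay from start to finish.",
            "Two minutes from the beach, perfect location.",
        ]
    }
)

# Run the classifier once per row and store the labels in a new column.
reviews_df["category"] = reviews_df["description"].apply(classification)
print(reviews_df)
```

Because `.apply()` calls the function once per row, each review triggers a separate classification call; that is fine for a month of reviews but worth keeping in mind for much larger tables.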

9. Sentiment analysis

What if we don't want to categorize our reviews, and are only interested in their sentiment? We can use the `AI_SENTIMENT` function, which works out of the box in SQL cells. When we test it on the same review, it returns a positive sentiment.

10. Sentiment analysis

Extracting just the sentiment value requires a bit of unpacking. Think of it like opening nested boxes: we access the `categories` key, take its first item, and read its `sentiment` value. Now we have it. We tested `AI_SENTIMENT` on a single string, but it also works on a whole batch of rows at once. The function returns one of five sentiment values: positive, negative, neutral, mixed, and unknown.
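
The nested-boxes unpacking can be sketched in Python. This assumes the `AI_SENTIMENT` result reaches Python as a JSON string shaped like the example below (a `categories` list whose items carry a `sentiment` field); the example values and the helper name `extract_sentiment` are illustrative, so check the shape of your own output.

```python
import json

# Illustrative AI_SENTIMENT result for one review (assumed shape).
raw = '{"categories": [{"name": "overall", "sentiment": "positive"}]}'

# The five values named in the transcript.
VALID_SENTIMENTS = {"positive", "negative", "neutral", "mixed", "unknown"}


def extract_sentiment(raw_result):
    """Open the nested boxes: dict -> 'categories' list -> first item -> 'sentiment'."""
    data = json.loads(raw_result)
    sentiment = data["categories"][0]["sentiment"]
    if sentiment not in VALID_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {sentiment}")
    return sentiment


print(extract_sentiment(raw))  # positive

# The same helper works row by row on a batch of results.
batch = [raw, '{"categories": [{"name": "overall", "sentiment": "mixed"}]}']
print([extract_sentiment(r) for r in batch])  # ['positive', 'mixed']
```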

11. Let's practice!

We've covered a lot. Now it's time for you to practice classifying both the topics and the sentiment of our hotel reviews.
