Build a simple AI app in Snowflake

1. Build a simple AI app in Snowflake

Are you ready to get building? You have everything you need to build a simple AI app: you created your Snowflake free trial account, a database, a schema, and your first Snowflake notebook. In this video, we are going to build a simple AI app with Snowflake. This app won't be anything complex or sophisticated, and that is intentional. The point of this exercise is for you to immediately get hands-on with Snowflake and get high-level exposure to the Snowflake AI features, so you can see how easy it is to get started with AI on Snowflake.

Okay, here is the scenario. You are a data scientist at a ski gear company that has customers across different countries. Customers typically call the company to complain about damage to their ski gear. You are tasked with analyzing the customer support call transcripts to identify the top customer complaints. You have access to raw, unstructured call transcript data, and we are going to use natural language processing to process the transcripts and find the root cause of each complaint. To achieve that, here's what we will do: load the data into Snowflake, analyze the call transcripts using Snowflake Cortex LLM functions, and deliver the insights through an AI app.

What are Cortex LLM functions? We will dive deep into LLM functions later on. For now, let's understand at a high level what they are so you can build a mental model of them. Cortex LLM functions give you serverless access to common LLM tasks, such as summarization and translation, in Snowflake.

Follow along with me to build this simple app. Now is a good time to stop the video and make sure you're logged into your Snowflake account. Let's begin by loading the data. We will retrieve the raw unstructured customer support call transcripts from AWS S3 and import them into a Snowflake table.
This data set contains information such as the date of the customer support call, the country and language of the customer, the name and category of the product, the type of damage to the product, and the transcript of the conversation between the customer and the customer support representative.

To build this app, we will need the Snowflake notebook that you should have downloaded when you completed the reading prior to this video. If you haven't downloaded the notebook yet, please pause this video and come back after you download it from the reading. In Snowsight, navigate to the Projects tab in the left panel and click on Notebooks. On the top right, click on the drop-down list next to the Create Notebook button and select the Import from IPython Notebook option. Select the notebook you downloaded from the reading section prior to this video. Once the notebook is open, on the top right, click on the Packages drop-down list. You can see that the Streamlit and Python packages are already installed in this notebook. Type snowflake-ml-python to install the snowflake-ml-python package. Once that is done, you can run the notebook cells one after the other to build your first AI app.

Let us understand what each cell in the Snowflake notebook does. In the first cell, you can see that we are importing the pandas and Streamlit packages to be used later in the code. We also create a Snowpark session to process the call transcripts in the app. Also note that this is a Python cell. Snowflake notebooks support Python, SQL, and Markdown cells. This means that you can write SQL and Python and run them all together in the same notebook.

The raw unstructured transcript data is stored in CSV format in an AWS S3 bucket. In cell two, we create a new Snowflake database, virtual warehouse, and schema to store the call transcript data. Notice that cell two is a SQL cell, and we run SQL commands to create these Snowflake objects. You could also do this in Python, of course, if that is your preference.
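The transcript notes that the objects from cell two could also be created from Python. A minimal sketch of that idea follows, assuming a Snowpark session is available (in a Snowflake notebook, typically via `get_active_session()`). The helper names, the object names, and the warehouse size are placeholders, not the values used in the course notebook.

```python
def setup_statements(database: str, schema: str, warehouse: str) -> list[str]:
    """Build the CREATE statements for a database, warehouse, and schema.

    All names are supplied by the caller; nothing here is taken from the
    course notebook.
    """
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        # XSMALL is an assumed size for a small demo workload.
        f"CREATE WAREHOUSE IF NOT EXISTS {warehouse} WITH WAREHOUSE_SIZE = 'XSMALL'",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema}",
    ]


def run_setup(session, statements: list[str]) -> None:
    """Run each statement through a Snowpark session.

    `session` is expected to be a snowflake.snowpark.Session; in a Snowflake
    notebook it usually comes from
    snowflake.snowpark.context.get_active_session().
    """
    for statement in statements:
        session.sql(statement).collect()


# Hypothetical names, used only for illustration.
statements = setup_statements("DEMO_SUPPORT_DB", "DEMO_SUPPORT_SCHEMA", "DEMO_WH")
for statement in statements:
    print(statement)
```

Keeping the statement builder separate from the function that executes it makes it possible to inspect the SQL before anything is sent to Snowflake.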
Next, in cell three, we create three things: a CSV file format to read CSV files from S3, an external stage in Snowflake pointing to the S3 bucket where the data is stored, and the call transcripts table to load the raw transcript data into Snowflake. The call transcripts table has various fields to capture the customer complaints of the ski gear company. It has columns such as the date of the customer support call, the country and language of the customer, the name and category of the product, the type of damage to the product, and the transcript of the conversation between the customer and the customer support representative. Once we create the call transcripts table, the `COPY INTO` statement is used to bulk copy the data from the external stage into the Snowflake table.

Before we continue further in the notebook, let us verify that the call transcripts data is loaded into the Snowflake table. Navigate to the data tab or the cylinder icon on the left panel and click on databases. You will see the list of all databases. Click on Ski Gear Support DB, and then Ski Gear Support Schema. Under tables, look for call transcripts and click on it. You can see the table definition on the right side. If you click on the columns tab, you will see the different columns and their data types as well. Now, if you click on data preview, you will see a sample of a few rows of data.

Now, let's hop back to the notebook and understand what the rest of the notebook cells are running. Let's go to cell four, which contains a sample transcript copied from the dataset for your reference. In cell five, we use the Snowflake Cortex LLM function named complete to process the transcript and summarize it. First, we import the complete function and create a prompt instructing the large language model to summarize the call transcript into JSON format. You can modify this prompt as well. This prompt instructs the model to generate an output in JSON format with three key fields: product name, defect, and summary.
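As a rough illustration of the kind of prompt described above, the sketch below builds an instruction that asks for JSON with the three keys. The wording, the key spellings (`product_name`, `defect`, `summary`), and the helper name `build_prompt` are assumptions; the notebook's actual prompt may differ.

```python
JSON_KEYS = ("product_name", "defect", "summary")  # assumed key spellings


def build_prompt(transcript: str) -> str:
    """Combine summarization instructions with a call transcript."""
    keys = ", ".join(f'"{key}"' for key in JSON_KEYS)
    instructions = (
        "Summarize the customer support call transcript below. "
        f"Respond with a JSON object only, using exactly these keys: {keys}."
    )
    return f"{instructions}\n\nTranscript:\n{transcript.strip()}"


# Made-up transcript text, for illustration only.
example = build_prompt("Customer: My ski boots cracked after one trip.")
print(example)
```

Naming the keys explicitly and asking for "a JSON object only" is one common way to nudge a model toward output that can be parsed without cleanup.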
Generating the summary in JSON format is helpful because it allows us to use the output programmatically.

In cell six, we invoke the Llama 3.2 one-billion-parameter model and pass the prompt and the call transcript to the complete function. Given a prompt, the complete function generates a response using your choice of supported language model. This model does not follow the prompt very accurately, so can we try other foundation models and see if we get a clean JSON output? In cell seven, I will show you how you can change the language model to Mistral 7B instead of Llama 3.2 by simply changing the argument of the complete function. It is that simple to experiment with different LLMs using Snowflake Cortex. The Mistral model outputs only the JSON summary we prompted for. This is great.

Lastly, in cell eight, the code snippet wraps the complete function call into a summarize function in Python. This is used to build a simple UI for the summarization app with Streamlit. When you run this cell, it outputs a big text box and a clickable summarize button below it. Now, copy the sample call transcript from cell four, paste it into the Streamlit app text box, and click on the summarize button. You will see how the model summarizes the transcript into JSON format with three keys: product name, defect, and summary. That is it. With this, you built your first AI app, which summarizes the ski gear company's call transcripts to identify the product name, the defect if any, and the summary of each call in JSON format.

Two quick pointers before we wrap up. Firstly, I have a bonus cell in the notebook. In cell nine, you can see how to invoke the Cortex complete function using Snowflake SQL if you prefer to do it the SQL way. Secondly, there is an alternative way for you to run the entire Snowflake notebook instead of running each cell individually. You can click the play button that says run all at the top right.
This will run all of the operations from all of your notebook cells, starting at the top. In our case, we ran each cell individually so that I could walk through the code snippet in each cell and explain what it does. Also, cells eight and nine are interactive cells. You need to input the transcript into the text box for the model to summarize it, so you should run interactive notebook cells individually as well.

That's it. With this, we are done with module one. Let's quickly recap what we did. First, you loaded the call transcript data set from an AWS S3 bucket into a Snowflake table. Then you used the Snowflake Cortex complete function to prompt the foundation models with instructions to summarize the transcript in JSON format. Finally, you built a Streamlit UI for this AI application. This was a great start that shows how easy it can be to get value from your data using generative AI in Snowflake. There is so much more that we can learn and do. In the next module, we will dive deeper into Cortex LLM functions, using them to extract information from unstructured data. You will also stitch the functions together into a meaningful architecture to get a structured data output.
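To tie the recap together, here is a minimal sketch of a summarize wrapper that sends a prompt to a completion function and parses the JSON it returns. The completion function is passed in as a parameter so the sketch runs without a Snowflake connection; in the notebook it would presumably be the Cortex complete function (`Complete` from `snowflake.cortex`, called with a model name and a prompt). The helper names, the key spellings, the `mistral-7b` model identifier, and the stand-in model response are assumptions.

```python
import json
from typing import Callable

REQUIRED_KEYS = ("product_name", "defect", "summary")  # assumed key spellings


def parse_summary(raw: str) -> dict:
    """Extract and validate the JSON object in a model response.

    Some models wrap the JSON in extra text, so only the span from the first
    "{" to the last "}" is parsed.
    """
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model response")
    data = json.loads(raw[start : end + 1])
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


def summarize(prompt: str, complete_fn: Callable[[str, str], str],
              model: str = "mistral-7b") -> dict:
    """Call complete_fn(model, prompt) and return the parsed summary."""
    return parse_summary(complete_fn(model, prompt))


def fake_complete(model: str, prompt: str) -> str:
    """Stand-in for a real model call; returns a canned, chatty response."""
    return ('Sure! Here is the summary: {"product_name": "Ski Boots", '
            '"defect": "cracked shell", "summary": "Customer reports a crack."}')


result = summarize("Summarize this call as JSON.", fake_complete)
print(result["defect"])  # cracked shell
```

In a notebook, a wrapper like this could be connected to a Streamlit text box and button (`st.text_area` and `st.button`), which matches how cell eight is described in the video.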

2. Let's practice!
