Expanding the Scope of the Semantic Model

1. Expanding the Scope of the Semantic Model

Welcome back. Now that Cortex Analyst can answer questions in natural language about revenue and break it down by region, let's expand it to include breakdowns by different product lines. Jump back into your notebook to follow along. We now have a model that allows Cortex Analyst to answer our revenue questions by region. But what would be even better is if we could include breakdowns for different product lines. This can open up the model to another business unit that can take advantage of what we've built here. In the next cell, we'll define an extension to our semantic model, semantic model product. We will name the table and give it a description that references it as a dimension table. Under dimensions, we'll set product ID and product line with some sample values. Then we'll just append it onto our semantic model using replace so that it sticks in the right spot. Next, we need to make sure that semantic model includes the relationship between our revenue and the new product table. We'll join on product ID in a left outer many-to-one join. Product lines can be a bit ambiguous. This is a good opportunity to use search so that we can tie the product mentioned in the query to the column values in our table. For example, if someone asks how much revenue could be attributed to apparel, we can use Cortex Search behind the scenes to resolve apparel to clothing, and then use the term clothing as the product line in our generated SQL. We can do this by integrating Cortex Search into our semantic model which will help to make sure we reference the correct product categories in the SQL generated by Cortex Analyst. I've dropped a link to a reading on this after the video. We created the search service, product line search service on product dimension, and pass to the search service the warehouse, target lag, and a nested select distinct product line from the dimension table product. The search service now has a list of all distinct products defined. In this next cell, we build and update the semantic model with our product search service by adding the product dimension table and we set this table and its dimensions for this update. Then we'll update our master semantic model again. Now, let's take a look at what we have. Looks good. Our semantic file has been updated. Let's test this out by running our question again. That was better, but what if we could add custom instructions to our semantic model so that it always breaks down the answer by year? We can do this by adding the custom instructions always break down by year to our semantic model. This should now alter the semantic model so that any query response will be broken down using this instruction. Now, let's run the query again and see what we get back. Success. The year is now included in the response just like our custom instructions specify. One of the great things about semantic models is that as you develop them you can take standardized organizational queries that have been vetted already and add them to the semantic model. This is useful because it creates a repository of verified queries that can be offered to users as options when they're searching, and by being verified, the user knows that they can trust the output when using these queries. In this cell, we are taking the query from before and appending it to the semantic model. First, name the query and give it the question that it answers. Then we'll add a verified at time and the name of the person that verified it. Then we'll add this to the semantic model. Now that we have this new verified query appended, let's test and see what we have using the GetSQL function. After running GetSQL from Cortex Analyst and the question from before we see the annotated SQL we appended, followed by the identifying information, including the query name, question, the time it was verified at, and the verifier, me. This is great because we can improve the reliability of high traffic known queries, and we can expand this list out over time as our app is used. If we wanted, we could even surface this back to the user so they know the SQL generated has been verified already by a human. This is looking good, but our semantic model is getting bigger and bigger as we add more information to it. Using the model as a string was great for understanding how the semantic model affects the questions we can ask of our data, but it would be more scalable to point to a file path instead of having the string on hand. We'll dive into that next. Passing models as strings is great while we are building the semantic model because we can look at what our model looks like at each step of the development process. Pointing to the semantic model file path in stage only requires a single line code change. In our request body, we'll just change the semantic model argument to semantic model file, and we'll pass in the fully qualified file path to the semantic model file that's already sitting in our stage. Then we can quickly verify and rerun the SQL generation with Cortex Analyst pointing to the file on stage instead of the string semantic model from our notebook. Wonderful. Everything is where we expect it to be. Now that we've looked at how semantic models are built, we'll move on to look at the semantic model generator that Snowflake offers in the Snowflake UI. This model generator can be used to either build your model from scratch or edit and update your model once it's already built. It's very useful and I'm excited to share it with you. Get back into your Snowflake trial account. On the left side of your screen, look for the AI and ML icon under the projects tab that used earlier to get to your notebook. From there, go to the studio and move over to the Cortex Analyst tile and select try. Since you have already run the notebook, you have the database, schema, and warehouse already ready to go. Here, I can choose either to edit an already built semantic model or start a new one. For this example, we'll build a new semantic model from zero to hero. Select plus create new. Start by selecting the database, schema, and stage from earlier. Name the semantic model. I will name mine revenue time series generated because the earlier one was named revenue time series. Next, I'll pop in a description. I'll copy and use the same one that is given to us as a hint here and hit next. This part is optional, but it's a good idea to enter your own questions when developing your own non-demo cases. For this example, I'm going to copy the example questions again. I'm also going to add one more question. That is, what product lines have the highest cost of goods sold? Then I'll select next and we'll move on to selecting our data. Now, we'll select our data. Here, we see the database location where we chose to store the semantic model. This includes all of the tables we need. Last, we'll select all of the columns from these tables, and I can do so by just selecting the checkbox next to each table. Great. Now, we've selected our data, we can use the UI to get it set up. On the left, we have the setup for a semantic model, and I have the different logical tables, the revenue table, product dimension, and the region dimension, and I have places where I can add relationships and verified queries. We don't have these yet. First, I will make sure that my logical tables are looking good. Let's start with the region dimension. This looks pretty good. We have sales dimension and state, but it looks like there's an error here. The generator misidentified region ID as a measure in this dimension table. This is our unique identifier, so I'm going to move it over to dimensions. I also have the same issue for product dimension, so I'll move that from measure to dimension as well. Now, let's make sure the daily revenue looks good. I have five measures, revenue, cost of goods sold, forecasted revenue, product ID, and region ID. Both product ID and region ID need to be moved to dimensions. Looks pretty good. Now, let's add some relationships, so the semantic model can understand how these tables are joined together. Let's add one first for the product. We will do a left outer many-to-one join. From the left outer, many-to-one join. From the right outer, many-to-one join. From the right outer, many-to-one join. From the left outer, many-to-one join. From daily revenue to product dimension, joining on product ID. Great. Let's add another one called region. This will also be a left outer many-to-one join. The left table will be daily revenue again, but this time we'll join to region on region ID and select add. Now, we have the two relationships we just made showing the type of joins and the columns that they were joined on. We can now map the relationship between product and region. With all of that set up, we can play with it on the right side of the pane. The first thing I'm going to do is select the play button next to explain the data set. In addition to answering questions, Cortex Analysts can also describe the data that we have in front of us. It's pretty simple. It tells us the model contains information about revenue, cost of products, and regions. This tells a user that they can do financial analysis or ask questions like, what is the total revenue and cost of goods sold for electronic products in North America? The questions here are the onboarding questions I mentioned earlier, which can be specified in the semantic model. These are shown to the user when the app launches, so they know what sort of questions that they can ask their data. Next, I want to ask, what is the lowest revenue day in California? Now, Cortex Analysts is interpreting the question and generating the SQL. Cool. Here's the output and the generated SQL. The lowest revenue day in California was November 12, 2023. I want to add this as a verified query. Let's call it low revenue California. I can add this verified query to the onboarding questions if I want. Now you see it over here with the name and who it was verified by. If I want to directly edit the YAML, I can select the Edit YAML button and see it right here. From here, I can make edits as needed, just like we were doing in the notebook earlier. Once I'm done with my edits, I have one more thing to do, save the semantic model. Select Semantic Model Viewer to get back when you are done. This is a big one. Now that you have the backend ready for connection, we can start to think about connecting it to a frontend. This was a really important step, and having a well-developed semantic model file is what allows Cortex Analysts to do its magic and return accurate, valid SQL, which the LLM is going to use to generate its response. We can't really send this notebook out to people yet. Notebooks are great for experimentation, but we can't send notebooks to prod. To really make an impact, we want as many people as possible to use what we built. We need a frontend UI for users to interact with, and we're going to use Streamlit for that, to quickly turn the same code we already wrote into a simple Python-based UI. Let's do it next.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.