Get startedGet started for free

Understanding the Semantic Model

1. Understanding the Semantic Model

Welcome back. Now that everything is set up and ready to go in your notebook, we can get started. But before we move on, I wanna talk about semantic models in a little more detail. So let's discuss semantic model structure. A semantic model represents a collection of tables, each of which contains descriptions of specific aspects of the table. Each table described in the semantic model maps to a base table in Snowflake. The semantic model addresses issues related to language differences between business users and database definitions. And this provides semantic details like descriptive names and synonyms that allow Cortex analysts to answer questions about data the way you intend. Let's take a look at an example. We'll include information that sales and income are synonyms for revenue. Although sales and income are not the same in general accounting and finance terms, imagine in this particular case that they happen to be used synonymously by individuals in the business unit. This makes it important when we clarify this in our semantic model, so users can ask questions using the synonyms sales and income rather than being forced to use only accepted terminology. The semantic model should be organized by business domain or topic with limited scope. For example, one semantic model for sales analytics and another for marketing analytics. This leads to a well-scoped semantic model and well-scoped semantic models ensure high precision and accuracy of results. Now I want to talk about the tables that we'll create for a semantic model. The semantic model and the semantic model tables sit on top of the tables and views in the database. That was a mouthful. To build the semantic model, we build semantic tables that contain some of the information that we see in the views, but there's another layer to these tables. When we build our semantic tables, we define several elements. Each table in a semantic model has a name that is a unique descriptive name for that table. For example, sales data. These table names need to be unique. So some planning before implementation can save time later. Under this, we enter a description of the table and this is the table that the semantic model will be referencing. Next, we need to enter a fully qualified name of the table. This just means including the database schema and table name. We can also enter synonyms or other terms and phrases that are used to refer to the table. These need to be unique. Let's talk a bit about the base tables each semantic model has. In these base tables, you can describe the columns available for querying. Each is described with a name, synonyms, a description, a SQL expression, and the data type of the column. This is the first part of the YAML file. In this part of the YAML file, we define the dimension columns in our table. These include a list of categorical variables in the table. Now that we have the dimension set, we can set time dimensions, measures, and filters. Time dimensions is a list describing time values, including columns that contain data, such as sales date or a timestamp. Measures are the fields with numerical values in the table, such as revenue, impressions, or salaries. Note that we can also set a default aggregation function as a measure. If we had them, we could set filters here. These named filters represent a SQL expression and are common ways that the table is likely to be filtered. In addition to this, Cortex Analyst also supports SQL joins. To do this, you'll define the joins and the relationship definitions part of the file that comes after the list of the tables. I will show you how to do this later in the module. Next, you'll set up the first part of your notebook environment, which we will add the semantic model to later on. With that discussion of semantic tables out of the way, let's go back into our notebooks and load up our data into the tables we created earlier. We have three tables that we need to get into our stage, daily revenue, region, and product, all in CSV format. And we also need to get our semantic model, revenue time series, YAML, into our stage for upload. To do this, you'll be moving through the Snowsight interface and away from your notebook to set this up. First, head over to the left pane of the UI. There's a database icon next to the word data, and select that, and then select add data. From here, you get a wide variety of options to add data to your project. We will use the load files into a stage button, which you see at the top here. Drag the four files from your downloads folder or whichever folder that you saved them in into this pop-up. Now move down and select your database, schema, and stage, which are a Cortex Analyst demo as the database, revenue time series as the schema, and raw data as the stage. Once that's all in, select upload. Let's check that everything is uploaded. In the Snowsight UI, navigate to data and then databases. Here you see all the databases on your account that are associated with the role that you're using. Choose Cortex Analyst demo. From here, move to the schemas tab and select revenue time series. Next, move to the stages tab and look at the raw data stage. If prompted, select the enable directory table and Cortex Analyst warehouse to refresh the directory table. You should now see the four files listed in the stage, ready to go. Now we're ready to create Snowflake tables from these files in stage. Let's go back to our notebook. To start, we copy the data from our stage into the Snowflake tables that were just created prior to this. We set the use statements to define Cortex user role, the database, schema, and warehouse to use. Then execute copy into for the daily revenue, product, and region tables. Nice. Everything is in and we're ready to move on. A good analyst will dig through all of an organization's data to understand the semantic relationships and definitions that come from the underlying tables. They know which data to pull in for which question, and when the manager asks questions about 2025, they know the manager really meant fiscal year 2025. Now let's take a look at how this works. Before we mentioned that we wanted to use LLMs to write schemes, we wanted to use LLMs to write SQL on command. Cool. One approach to do this would be just to call the LLM directly. I could construct a prompt with a question along with some information about the tables and columns I have available. We'll also include in our system prompt that the LLM is a helpful assistant that writes SQL from natural language user questions, and that it should only respond with SQL. For this task of writing SQL code, let's use a highly capable foundation model, Cloud Sonic 3.5. It's hard to evaluate SQL by just looking at the code, so let's execute it to see if the result answers our question. That did a pretty good job. However, it's a bit clunky to pass all of that information about the database into the prompt each time I want to write SQL, and it seems a bit arbitrary on what information I should really include. While we need some way to pass information about our database into the context window, it'd be great if we had a structured way to do so. When we use a semantic model, we can do that in a standard comprehensive way that scales to hundreds of tables and complex joins across our database. When we create that semantic model, we are creating a structured method to capture information about the data and relationships that the LLM can reference when it's generating SQL. This makes the generated SQL much more accurate, and we'll look at this in more detail in the following videos. Nice work. Now that you know about semantic models, we can move on to how to call them for your Text to SQL workflow. Creating the semantic models is the heavy lift that allows your query engine to intelligently understand the underlying database, tables, and views. This step can take a lot of time depending on how complex or large your data is, but it's worth the effort. Once the Text to SQL query engine has this, it's able to generate valid code that can be used by the different workflow elements. We'll continue with this in the next video. I'll see you then.

2. Let's practice!

Create Your Free Account

or

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.