Loading unstructured data to a Snowflake stage

1. Loading unstructured data to a Snowflake stage

I hope you're feeling pretty confident about how a RAG can optimize your workflows in different use cases. I know what information I want to make accessible to the search, so the first step is uploading that data into a stage where my RAG can access it. Let's go do that now.

In this video, we'll go over external and internal stages and learn how to load data into them. You'll also be introduced to the Federal Open Market Committee (FOMC) dataset, which you'll use in the hands-on practice we'll do together in this video.

First, let's talk about Snowflake stages. Snowflake is pretty flexible: it lets you load your data from both internal and external stages. Internal stages store files inside the Snowflake environment, while external stages point to storage with one of the big three cloud service providers, or CSPs: Amazon S3 buckets, Google Cloud Storage buckets, and Microsoft Azure containers. A stage is where we'll load our unstructured data.

Now let's get your hands a little dirty by loading your own data. For this next section, make sure you're logged into your Snowflake environment so that you can follow along. Pause the video here if you need to.

To start, I've logged into my Snowflake trial account and selected the plus Create button at the top left. From there, I've scrolled down to Notebook and selected Import .ipynb File to open my file. In this window, I'll first give my notebook a name. Then I'll choose a compute warehouse. And last, I'll set the Python environment to run on a warehouse. For a more flexible environment with more packages, I could choose to run on a container, but I'll run on a warehouse here. You can see at the top that the owner is my username. This notebook will be stored in my personal database, a special database only for notebooks associated with my user. Now let's click Create.

In this example, I'll be showing you how to build a RAG with Cortex Search. Before we get started, I'll need to install some libraries.
These are listed at the top of the notebook. I'll add them by clicking the Packages dropdown and selecting each one. Now we're ready to roll.

The first thing we do in the notebook is create the database, tables, and warehouse. At this step, I'm going to choose an extra-small warehouse size and set up the suspend and resume settings. Please note, when we execute the CREATE DATABASE statement, we create a database that automatically includes a schema named PUBLIC. When we create the warehouse, it's initially suspended.

Next, we'll get the PDF data. The data we'll be using is a sample dataset from the Federal Open Market Committee, or FOMC. It's a sample of 12 ten-page documents of meeting notes. Remember, this sample data is only for demonstration. In your own use cases, you'd bring your own data and upload it to a Snowflake stage.

In the next step, we'll create our Snowflake stage. We do this by running a CREATE OR REPLACE STAGE statement, enabling the directory table, and setting the encryption type to SNOWFLAKE_SSE, which is server-side encryption only. Then we can load data into it.

For this example, we're going to use the Snowsight interface to upload our data. To do this, we'll select Data in the left-side navigation menu. We'll select Databases, then the Cortex Search tutorial database, select PUBLIC as our schema, select Stages, and select FOMC. Now we'll browse to our Downloads folder and open the folder with the PDF files in it. Select all the files and drag them into the UI. You'll see them in the pop-up window immediately. Another way to do this is to select Browse and choose the files from the dialog window. Now select Upload to upload your files.

Now let's go back to our notebook by choosing the Projects icon, selecting Notebooks, and opening our notebook. And let's restart our session if we need to.
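If you'd like to see the setup steps from this walkthrough in one place, here's a minimal sketch of what those statements might look like. The video never spells out the exact statements, so treat the names as assumptions: the database and warehouse names (cortex_search_tutorial_db and cortex_search_tutorial_wh) and the 120-second auto-suspend value are placeholders I picked for illustration. Only the stage name FOMC, the PUBLIC schema, the extra-small size, the initially suspended warehouse, the enabled directory, and SNOWFLAKE_SSE encryption come from the walkthrough. The sketch just builds the SQL text in Python, so it runs anywhere, and it ends with a LIST statement you could use to check what's in the stage.

```python
# Assumed, hypothetical names: only the stage name (FOMC) and the PUBLIC
# schema are mentioned in the walkthrough.
DATABASE = "cortex_search_tutorial_db"
WAREHOUSE = "cortex_search_tutorial_wh"
STAGE = f"{DATABASE}.public.fomc"


def setup_statements(database=DATABASE, warehouse=WAREHOUSE, stage=STAGE,
                     auto_suspend_seconds=120):
    """Return the SQL text for the database, warehouse, and stage setup."""
    return [
        # Creating a database also creates its default PUBLIC schema.
        f"CREATE DATABASE IF NOT EXISTS {database}",
        # Extra-small warehouse that starts suspended and resumes on demand.
        # The auto-suspend value is an assumption, not taken from the video.
        (
            f"CREATE OR REPLACE WAREHOUSE {warehouse} WITH "
            f"WAREHOUSE_SIZE = 'X-SMALL' "
            f"AUTO_SUSPEND = {auto_suspend_seconds} "
            f"AUTO_RESUME = TRUE "
            f"INITIALLY_SUSPENDED = TRUE"
        ),
        # Internal stage with a directory table and server-side encryption.
        (
            f"CREATE OR REPLACE STAGE {stage} "
            f"DIRECTORY = (ENABLE = TRUE) "
            f"ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')"
        ),
    ]


def list_stage_statement(stage=STAGE):
    """Return the SQL text that lists the files currently in the stage."""
    return f"LIST @{stage}"


if __name__ == "__main__":
    # Print the statements so they can be pasted into a SQL cell.
    # Inside a Snowflake notebook you could instead run each one with:
    #   from snowflake.snowpark.context import get_active_session
    #   session = get_active_session()
    #   session.sql(statement).collect()
    for statement in setup_statements() + [list_stage_statement()]:
        print(statement + ";")
```

Run the setup statements before you upload, and the LIST statement after. If the upload worked, LIST should return one row per PDF file.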
Okay, now just to be sure before we continue, let's check that the PDF files are uploaded to the stage. We have a bunch of files loaded into the stage now, but they're not ready to use yet. That will be next.

We got a lot done in this video. You learned how to load data into a stage. And now that you have the data staged, we can move on to the next step and parse the data. Nice work. I'll see you in the next video.

2. Let's practice!
