ML - Snowflake ML Overview - Part II

1. ML - Snowflake ML Overview - Part II

We’ve talked about how you can do ML in seconds with the Snowflake Cortex ML functions. Now let’s talk about how to do “ML in Minutes” with Snowflake. There are a bunch of things to talk about, so we’ll just cover each briefly.
First, I want to point out some terminology: Snowpark ML Modeling, the Snowflake Feature Store, and the Snowflake Model Registry all have APIs that are accessible from the unified “Snowpark ML” library. You can access the APIs for all three features by installing the Snowpark ML library in Python, and you can do this from your preferred notebook or IDE, including Snowflake Notebooks, which we’ll talk about in a second.
A big part of Snowpark ML is the Snowpark ML Modeling APIs, which are based on standard Python frameworks like scikit-learn and XGBoost. You can see this code example on the side. You can use the Snowpark ML Modeling APIs for preprocessing data, feature engineering, and training models inside Snowflake, which is great because it means you don’t need to move data in and out of Snowflake to do ML. Another cool thing is that Snowflake executes common scikit-learn feature engineering and preprocessing functions, as well as hyperparameter tuning during model training, in a distributed way.
Here’s an example. You can see that we’re splitting our data into training and test dataframes. Then we’re building a training pipeline, and here’s where it’s using SimpleImputer, Pipeline, and XGBClassifier, which we imported from snowflake.ml.modeling at the top. You don’t have to know what any of those are for this course, except for XGBClassifier, which we’ll cover in detail in a future video. Then it’s actually training the model. And finally it’s evaluating the model. Before Snowpark ML, doing ML in Snowflake pretty much required you to manually create stored procedures or UDFs, but now for scikit-learn, XGBoost, and LightGBM-style models, you’ve got Snowflake-native support.
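For reference, the four steps just described (split, build a pipeline, train, evaluate) might look roughly like the sketch below. This is not the exact code from the slide. The function name, the 80/20 split, the mean imputation strategy, and the PREDICTION column name are all assumptions made for illustration, and it assumes `df` is a Snowpark DataFrame with numeric feature columns and that the snowflake-ml-python package is installed.

```python
from typing import Any, List, Tuple


def train_and_evaluate(df: Any, feature_cols: List[str], label_col: str) -> Tuple[Any, float]:
    """Split, build a pipeline, train, and evaluate inside Snowflake.

    `df` is assumed to be a Snowpark DataFrame (hypothetical input).
    """
    # Imported here so this file still loads where snowflake-ml-python
    # is not installed; in a notebook these would sit at the top.
    from snowflake.ml.modeling.impute import SimpleImputer
    from snowflake.ml.modeling.metrics import accuracy_score
    from snowflake.ml.modeling.pipeline import Pipeline
    from snowflake.ml.modeling.xgboost import XGBClassifier

    # 1. Split the data into training and test DataFrames
    #    (the 80/20 ratio and the seed are assumptions).
    train_df, test_df = df.random_split(weights=[0.8, 0.2], seed=42)

    # 2. Build the training pipeline: fill missing values, then classify.
    pipeline = Pipeline(
        steps=[
            (
                "impute",
                SimpleImputer(
                    input_cols=feature_cols,
                    output_cols=feature_cols,
                    strategy="mean",
                ),
            ),
            (
                "classifier",
                XGBClassifier(
                    input_cols=feature_cols,
                    label_cols=[label_col],
                    output_cols=["PREDICTION"],
                ),
            ),
        ]
    )

    # 3. Train the model; the work runs in Snowflake, not on your laptop.
    pipeline.fit(train_df)

    # 4. Evaluate on the held-out test data.
    predictions = pipeline.predict(test_df)
    accuracy = accuracy_score(
        df=predictions,
        y_true_col_names=label_col,
        y_pred_col_names="PREDICTION",
    )
    return pipeline, accuracy
```

You would call it with something like `train_and_evaluate(session.table("MY_TABLE"), ["F1", "F2"], "LABEL")`, where the table and column names are placeholders.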
We’ll become very familiar with Snowpark ML Modeling in a moment, because the next video will be all about making and using an XGBoost Snowpark ML model.
Let’s talk about the Snowflake Model Registry. The Snowflake Model Registry helps with model management, which is the ability to easily track versioned model artifacts and metadata. If you’re training a bunch of models, it becomes really annoying to keep track of all the different versions, so whether you use the Snowflake Model Registry or something else, a model management tool is really helpful. One thing I really like about the Snowflake Model Registry is that it lets you manage and execute models in Snowflake, regardless of origin. So it can handle models built with the Snowpark ML Modeling APIs, but also models built externally with tools from cloud providers like SageMaker, Azure ML, and Vertex AI, and even LLMs from Hugging Face. In this example on the right, you can see that we’re creating a scikit-learn model, training it, and then using registry.log_model to store it in the Model Registry. Pretty cool.
Now let’s talk about the Snowflake Feature Store. This isn’t a technical definition, but you can think of a feature as a column of data you might want to use as a variable in a model you’re creating. When you’re making lots of machine learning models, it’s easy to end up pulling one feature from dataset A and another from dataset B, and then forgetting which you used and how they differ. The Snowflake Feature Store addresses this by letting you create, store, manage, and serve ML features for model training and inference. It’s very handy because it helps you maintain a single source of truth for all these features, and it keeps them updated automatically so that your downstream pipelines stay consistent.
Now let’s talk about Snowflake Notebooks. A Snowflake Notebook is a cell-based development interface for SQL, Python, and Markdown, built into Snowsight, Snowflake’s UI.
If you’ve used other notebooks before (Jupyter notebooks, etc.), this should feel familiar. And while this isn’t strictly an ML tool, we’ve included it here because it’s really helpful when doing ML work, and it works cleanly with Snowpark ML. Most data scientists, myself included, see notebooks as a must when doing ML development. Snowflake Notebooks make it easy to quickly explore data with your preferred language and visualize results using popular Python libraries. And securely sharing your work is easy with Notebooks because they’re governed by role-based access controls.
Okay, let’s talk about Streamlit in the ML context. We discussed Streamlit in our GenAI Overview, but I wanted to emphasize that it’s also really useful in ML workflows. With Streamlit in Snowflake, Python developers can turn data and ML models into interactive web apps without needing to do any front-end development. You build apps using Streamlit’s component-rich, open-source Python library. You can modify code and see changes go live with side-by-side editor and app preview screens in Snowflake. You can share Streamlit apps via URLs that leverage existing role-based access controls and run on Snowflake’s scalable, secure, and performant infrastructure. It’s really cool.
As I mentioned, we’re not going to talk about the top right of this chart here, so the last thing I want to call out is Snowpark Container Services. In this “seconds / minutes / hours” framing, Snowpark Container Services falls on the “hours” side. Snowpark Container Services lets you build custom ML models by letting you deploy, manage, and scale containerized workloads using Snowflake-managed infrastructure. These containers can include code in any programming language (e.g. C/C++, Node.js, Python, R, React, etc.) and can be executed using configurable hardware options, including CPUs and GPUs.
So if you need to do some custom AI/ML work, Snowpark Container Services gives you flexibility and saves you the headache of having to manage compute and clusters for containers. It also means you don’t have to leave Snowflake’s governed data ecosystem to build sophisticated AI/ML models and apps, which is pretty great.
So that’s it for our flight over the Snowflake ML landscape! I hope you observed the following during our journey:
One, to do ML work very quickly, you can use Snowflake Cortex ML functions (forecasting, anomaly detection, etc.).
Two, if your use case requires a little more care, you have several options. You can use the Snowpark ML Modeling APIs, which make it easy to use common Python ML frameworks from within Snowflake. You can use the Snowflake Model Registry to manage your models and metadata. You can use the Snowflake Feature Store so you have easy access to a continuously updated set of features for both model training and inference. And remember that you use one Python library, Snowpark ML, to access the Snowpark ML Modeling APIs, the Snowflake Model Registry, and the Snowflake Feature Store. You can do your ML development in Snowflake Notebooks. And you can use Streamlit in Snowflake to turn your data and ML models into interactive web apps.
And three, if your use case requires even more customization, you can build custom models using Snowpark Container Services.
Now let’s spend a moment actually using Snowpark ML to train a model and make a prediction!
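Before the exercise, here’s a rough sketch of the Model Registry step from the recap: log a trained model under a name and version, then run it for a prediction. It assumes you already have a Snowpark `session`, a trained model, and some sample input data. The function name, the DEMO_MODEL and V1 defaults, and the comment text are hypothetical placeholders, and the registry is assumed to use the session’s current database and schema.

```python
from typing import Any


def log_and_predict(
    session: Any,
    model: Any,
    sample_input: Any,
    model_name: str = "DEMO_MODEL",  # hypothetical model name
    version_name: str = "V1",        # hypothetical version label
) -> Any:
    """Log a trained model to the Snowflake Model Registry, then run it."""
    # Imported here so this file still loads where snowflake-ml-python
    # is not installed.
    from snowflake.ml.registry import Registry

    # Open the registry in the session's current database and schema.
    registry = Registry(session=session)

    # Store the model as a named, versioned artifact. The sample input
    # lets the registry infer the model's input and output signature.
    model_version = registry.log_model(
        model,
        model_name=model_name,
        version_name=version_name,
        sample_input_data=sample_input,
        comment="Logged from the overview example",
    )

    # Run the logged version inside Snowflake to get predictions.
    return model_version.run(sample_input, function_name="predict")
```

Logging a retrained model under a new `version_name` keeps the earlier version around, which is the version tracking the registry is meant to handle for you.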

2. Let's practice!
