
Overview of Lakehouse AI

1. Overview of Lakehouse AI

Alright, let's start! In this video, we will give an overview of Lakehouse AI, the Databricks approach to AI and machine learning use cases in the lakehouse.

2. Lakehouse AI

So what exactly is Lakehouse AI, and why would we use the lakehouse architecture for AI or ML use cases? With the Databricks platform, data scientists can perform advanced analytics in ways they couldn't before. First, they need access to reliable data, and that is accomplished through Delta Lake: they get clean tables they know are accurate, and they can still reach the raw files, which different applications often require. Second, Databricks clusters give them a highly scalable and flexible compute engine. Third, these clusters and the Databricks platform are built on open standards, so they can use whatever languages, libraries, or frameworks they want, unlocking many of their use cases. Finally, they no longer have to operate in separate silos and can collaborate with the other data personas they need to work with.

3. MLOps Lifecycle

We can generally think of data scientists' work as falling into the Machine Learning Operations process, or MLOps for short. Every step they go through, from data ingestion all the way to model deployment, falls into the MLOps lifecycle, and data scientists need a way to manage that work. This course will not go super in-depth on MLOps as a practice, but it will cover the Databricks point of view on how the platform can support MLOps processes. According to Databricks, there are three aspects of MLOps. This video will introduce the main concepts, and additional videos will dive deeper into each section.

4. MLOps in the Lakehouse

The beginning of the MLOps process falls into DataOps, which involves getting data ready for any kind of machine learning application. This starts by integrating data from different sources into Delta Lake, which can use the Auto Loader capability. Then, we want to transform those data tables into a more usable, clean format, which can often be done with Delta Live Tables. Finally, data scientists need to create tables of features to feed into the model they choose, which is where the platform's Feature Store comes in.
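To make this concrete, here is a minimal Python sketch of those three steps. The paths, table names, and columns are hypothetical, and in practice the Auto Loader and Delta Live Tables definitions run inside a DLT pipeline while the Feature Store code runs in a regular notebook; they are shown together only to mirror the steps above.

```python
import dlt
from pyspark.sql import functions as F

# Step 1: ingest raw files incrementally into Delta Lake with Auto Loader
# (the "cloudFiles" streaming source). The landing path is hypothetical.
@dlt.table(name="orders_bronze", comment="Raw order files ingested with Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/demo/landing/orders/")
    )

# Step 2: clean and transform the data with a Delta Live Tables expectation.
@dlt.table(name="orders_clean", comment="Orders with valid amounts only")
@dlt.expect_or_drop("positive_amount", "amount > 0")  # drop rows that fail the check
def orders_clean():
    return dlt.read_stream("orders_bronze").withColumn("order_date", F.to_date("order_ts"))

# Step 3 (separate notebook): publish per-customer features to the Feature Store.
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()
customer_features = (
    spark.table("orders_clean")  # hypothetical published table name
    .groupBy("customer_id")
    .agg(F.count("*").alias("order_count"), F.avg("amount").alias("avg_amount"))
)
fs.create_table(
    name="ml.customer_features",  # hypothetical feature table name
    primary_keys=["customer_id"],
    df=customer_features,
    description="Aggregated order features per customer",
)
```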

5. MLOps in the Lakehouse

After the data has been processed correctly, a data scientist can start developing a machine learning model. They will begin by developing and training a model, usually in Databricks Notebooks or a local IDE. Users who want a pre-built starting point can use Databricks AutoML to generate a baseline model. Throughout the process, data scientists must track metrics and parameters to understand their model's performance, using the MLflow framework. Once the model is developed, it must be registered in a central place so that other users can find and consume it, which is the role of the Databricks Model Registry.
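As a minimal sketch of that development loop, assuming a hypothetical feature table with a binary churn label and a hypothetical model name, the MLflow tracking and registration could look like this:

```python
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical labeled feature table.
df = spark.table("ml.customer_features_labeled").toPandas()
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns=["customer_id", "churn"]), df["churn"], random_state=42
)

# Track parameters, metrics, and the trained model itself with MLflow.
with mlflow.start_run(run_name="rf_baseline"):
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)
    mlflow.log_params(params)
    mlflow.log_metric("test_accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, artifact_path="model")
    run_id = mlflow.active_run().info.run_id

# Register the tracked model so other users can find and consume it.
mlflow.register_model(f"runs:/{run_id}/model", name="churn_classifier")

# Alternatively, let Databricks AutoML generate a baseline model and notebooks:
# from databricks import automl
# summary = automl.classify(dataset=spark.table("ml.customer_features_labeled"),
#                           target_col="churn", timeout_minutes=30)
```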

6. MLOps in the Lakehouse

At the end of the MLOps process, we have DevOps, which is all about getting the developed models into production. Throughout the development process, and especially once a model is ready for deployment, access must be governed securely, which can be done with Unity Catalog in the Databricks environment. As models are tuned over time, data scientists need to control which version of a model is in production and which is in a testing environment, which can be managed in the Model Registry. Finally, the model is ready for deployment, which can be done directly with Databricks Model Serving endpoints.
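A rough sketch of those last steps, with a hypothetical model name, endpoint, and workspace URL; note that a Unity Catalog-backed registry manages versions with aliases rather than the classic stage transitions shown here:

```python
import requests
from mlflow.tracking import MlflowClient

# Govern access with Unity Catalog, for example via SQL (hypothetical names):
# spark.sql("GRANT EXECUTE ON MODEL main.ml.churn_classifier TO `data-science-team`")

# Control which model version is in production using the Model Registry
# (classic workspace registry stages shown here).
client = MlflowClient()
client.transition_model_version_stage(
    name="churn_classifier", version="2", stage="Production"
)

# Query a Databricks Model Serving endpoint once the model is deployed.
workspace_url = "https://my-workspace.cloud.databricks.com"  # hypothetical
response = requests.post(
    f"{workspace_url}/serving-endpoints/churn-endpoint/invocations",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json={"dataframe_records": [{"order_count": 12, "avg_amount": 58.3}]},
)
print(response.json())
```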

7. Let's review!

As we have seen, Databricks enables an end-to-end MLOps experience. Now, let's review some of the concepts that we just discussed.