What is Cortex Fine-tuning?

1. What is Cortex Fine-tuning?

Welcome back. Let's talk about what we're going to do in this video. By the end of this video, you will be able to fine-tune a large language model to accomplish a new task. You can fine-tune an LLM to respond in new styles that an out-of-the-box LLM can't produce. You will also be able to distill the behavior of a large model into a small model, so that you get similar performance at a lower cost. Excited? I am.

Now, you might be thinking: I ask all sorts of questions to ChatGPT and other models every day, and I get the right answers back. If these models work just fine, why do we need to fine-tune them? Well, general-purpose language models such as ChatGPT are great, but they are just that: general-purpose. Imagine a high school student who has a broad education covering a wide range of subjects, such as mathematics, science, history, and languages. This student has a general understanding of many areas but is not an expert in any one field. Now, picture this student deciding to pursue a specialized career, such as becoming an electrician or a chef.

General-purpose LLMs are similar to this high school student. They have been trained on vast amounts of data and can provide information and answers across many subjects, but they are not specialists in any particular area. Fine-tuning these LLMs is like sending the high school student to a trade school. At the trade school, the student focuses intensely on a specific skill set, such as electrical work or culinary arts. Through this specialized training, the student becomes highly proficient in that field, much as fine-tuning lets an LLM become highly skilled and accurate in a specific domain or task. And just as the trade school graduate can perform complex tasks in their chosen field with greater precision and expertise, the fine-tuned LLM can handle complex tasks within its specialized area with much higher accuracy and effectiveness.
That makes sense, right? But occasionally these models don't give me straight answers, so I have to reword the question to get the right answer. Can't we simply experiment with different prompts to get the model to respond the right way? Why fine-tune? Well, you can certainly experiment with different prompts, and this approach can be effective in some cases. However, fine-tuning lets a model learn from new examples and potentially acquire knowledge that was not part of its original training set. Prompt engineering, in contrast, is limited to the model's original knowledge and capabilities and cannot introduce new knowledge. The time spent crafting and testing different prompts is also a factor. Having said all that, the choice between prompt engineering and fine-tuning depends on several factors: How specialized and domain-specific is the task? What are the accuracy requirements? How much customization is needed?

Once you decide that fine-tuning is the way forward for your use case and requirements, this video shows how you can fine-tune a general-purpose LLM with your own data using Snowflake. To run fine-tuning in Snowflake, we use Snowflake's Cortex fine-tune function. Cortex fine-tune is a fully managed service that lets us take general-purpose LLMs and customize them to our specific needs. As mentioned earlier, fine-tuning lets us train a model to complete new tasks, respond in particular formats, or use specific tones or styles. For example, you can teach the model to use a particular brand voice or to always generate JSON with a particular set of keys. Another reason to fine-tune an LLM before deploying it is to get more reliable output formatting that matches the expected or required format.
This is great if we want consistent email formatting, information output as bullets or tables for reports, or a specific format such as JSON. Fine-tuning helps the model generate more consistent responses, that is, responses more consistent with the data we fine-tune it on.

Consider an example where we need to extract the conditions and interventions from detailed medical notes. We carefully construct the prompt shown here, which asks the model to extract only the conditions and interventions from the medical notes. When we send this prompt to the 405-billion-parameter Llama 3.1 model, we get exactly the response we are looking for, containing only the information we need. However, deploying this extraction across an entire health system with a 405-billion-parameter model would be cost-prohibitive. Cortex offers LLMs in a full range of sizes, so we can try a 1-billion-parameter model here, such as Llama 3.2 1B. However, the 1-billion-parameter model won't even attempt the extraction. Stepping up, we can try Llama 3.2 3B. This slightly bigger model is more helpful, but it extracts far more information than we need and is too verbose. This is a perfect case for fine-tuning, because we want to teach the small model to respond in a particular format, just like the large model.

Another name for this process of teaching a small model to behave like a large model is distillation. Just as chemical distillation separates the compounds we want from a larger volume of liquid, we distill the capability we want from a large model into a smaller one, so that we get the benefits of the larger model at the cost of running a small one. Now, recall the earlier scenario where we needed to extract information from medical notes. The large model worked well on our medical notes, which is great for prototyping.
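The distillation workflow described above can be sketched in Python: run the large "teacher" model over your inputs, keep its answers as target completions, and collect prompt/completion pairs for fine-tuning the small "student" model. This is a minimal sketch under several assumptions: `fake_teacher` is a hypothetical stand-in for a call to the large model (no Snowflake API is shown), the JSON keys `condition` and `intervention`, the instruction text, and the sample notes are made up for illustration, and the `prompt`/`completion` field names are an assumed training-data layout, so check the Cortex fine-tuning documentation for the exact format it expects. The format check ties back to the point about reliable output: only teacher answers that are a JSON object with exactly the expected keys go into the training set.

```python
import json
from typing import Callable

# Hypothetical output schema for the medical-notes extraction example.
EXPECTED_KEYS = frozenset({"condition", "intervention"})

# Illustrative instruction; the real prompt would be the carefully built one from the video.
INSTRUCTION = (
    "Extract only the conditions and interventions from the medical note below. "
    "Respond with JSON using exactly the keys 'condition' and 'intervention'.\n\n"
)


def is_well_formatted(response: str) -> bool:
    """True if `response` is a JSON object with exactly the expected keys."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    return isinstance(parsed, dict) and set(parsed) == EXPECTED_KEYS


def build_distillation_set(
    notes: list[str], teacher_generate: Callable[[str], str]
) -> list[dict[str, str]]:
    """Label each note with the large (teacher) model's answer.

    Only well-formatted teacher answers are kept, so the small (student)
    model is fine-tuned exclusively on examples in the target format.
    """
    examples = []
    for note in notes:
        prompt = INSTRUCTION + note
        completion = teacher_generate(prompt)  # the large model's output is the target
        if is_well_formatted(completion):
            examples.append({"prompt": prompt, "completion": completion})
    return examples


def to_jsonl(examples: list[dict[str, str]]) -> str:
    """Serialize examples as JSON Lines: one prompt/completion pair per line."""
    return "\n".join(json.dumps(ex) for ex in examples)


# Hypothetical stand-in teacher with canned answers; a real one would call the large model.
def fake_teacher(prompt: str) -> str:
    if "asthma" in prompt:
        return '{"condition": "asthma", "intervention": "albuterol inhaler"}'
    return "The patient seems fine."  # off-format answer, filtered out below


notes = ["Pt with wheezing, asthma dx, started albuterol.", "Routine follow-up visit."]
training_set = build_distillation_set(notes, fake_teacher)
print(to_jsonl(training_set))
```

Filtering on format is a design choice: the student can only be as consistent as the data it is fine-tuned on, so off-format teacher answers are dropped rather than taught. In this sample run, the second note gets an off-format answer and is filtered out, leaving one training example.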
In a real-world setting, when you want to extract information from millions of medical notes in production, the cost of running the large model would blow up. Scenarios like this are a good case for using the fine-tune function to distill the large model's functionality into a small model. Large models are better suited to high-accuracy, low-volume tasks because they are more expensive to run. For high-volume tasks, fine-tuning a smaller model for the specific use case lets us achieve performance close to that of the larger model at much lower cost and latency. We do incur the fine-tuning expense up front, but the lower ongoing cost and latency make it worthwhile. Everyone is happy again! Phew!

Let's move on to how the Cortex fine-tune function works. Under the hood, Cortex fine-tune uses parameter-efficient fine-tuning, or PEFT. PEFT is a technique that improves the performance of a pre-trained model on a specific task by fine-tuning only a small subset of its parameters. It is a good choice when we need to adapt a model to a domain-specific task while keeping costs and resources low. Because far fewer parameters are tuned, you can use examples to adjust the model's behavior and improve its performance on domain-specific tasks without retraining the whole model. Sounds great, right?

Let's quickly consider hardware requirements when dealing with LLMs. Because of their enormous size, fine-tuning large models such as Llama 3.1 405B entails intensive hardware requirements, and training billions of parameters calls for innovative approaches to reduce the memory load. Because Cortex fine-tune uses parameter-efficient fine-tuning, the process is much cheaper and faster. Let's quickly touch on the models that Cortex fine-tune supports. There are a number of models from the Llama and Mistral families that range in size.
These can be fine-tuned. Note that the models available for fine-tuning will change over time, and some models may be available in some regions but not others, so check the Snowflake documentation for the latest availability.

Let's look back at what we covered in this video. We looked at how fine-tuning a pre-trained LLM lets it accomplish a new task or respond in new styles in ways an out-of-the-box LLM cannot. We also looked at how we can distill the behavior of a large model into a small model, producing responses close to the quality of the large model while getting the lower cost of a small model. We talked about how tuning a model to respond in a specific style or format, such as JSON, can be very useful in certain use cases. We covered how Snowflake Cortex runs fine-tuning under the hood using parameter-efficient fine-tuning, and why tuning only a small subset of parameters makes it much more efficient and reduces compute costs. In the next video, we will prepare our Snowflake environment to fine-tune a foundation model for a specific use case. See you in the next video!
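To put numbers on why parameter-efficient fine-tuning is so much cheaper, here is some back-of-the-envelope arithmetic in Python. It uses LoRA (low-rank adaptation), one widely used PEFT technique, as the example; the video does not say which PEFT method Cortex uses, so treat LoRA here as an illustrative assumption. LoRA freezes a weight matrix of shape d × k and trains two small matrices of shapes d × r and r × k instead, so the trainable count for that matrix drops from d·k to r·(d + k). The matrix size and rank below are hypothetical. The memory estimate uses a common rule of thumb of about 16 bytes per trained parameter for mixed-precision training with the Adam optimizer (16-bit weights and gradients, plus 32-bit master weights and two Adam moment estimates), ignoring activations; it is an approximation, not a measured Cortex figure.

```python
def full_finetune_params(d: int, k: int) -> int:
    """Trainable parameters when updating a full d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for a rank-r LoRA update B @ A, with B: d x r and A: r x k."""
    return r * (d + k)


def full_finetune_memory_gb(n_params: float, bytes_per_param: float = 16.0) -> float:
    """Rough memory (GB) to fully fine-tune, using a bytes-per-parameter rule of thumb.

    Covers weights, gradients, and optimizer states only; activations are ignored.
    """
    return n_params * bytes_per_param / 1e9


# One hypothetical 4096 x 4096 projection matrix with LoRA rank 8.
d, k, r = 4096, 4096, 8
full = full_finetune_params(d, k)  # 16,777,216
lora = lora_params(d, k, r)        # 65,536
print(f"full: {full:,}  lora: {lora:,}  trainable fraction: {lora / full:.4%}")

# Whole-model estimate for full fine-tuning of a 405-billion-parameter model.
print(f"~{full_finetune_memory_gb(405e9):,.0f} GB to fully fine-tune 405B parameters")
```

For this matrix, LoRA trains about 0.39% of the parameters a full update would, while full fine-tuning of all 405 billion parameters lands in the terabyte range under the rule of thumb. With PEFT the frozen base weights still have to be held in memory, but gradients and optimizer states are only needed for the small set of trained parameters, which is where most of the savings come from.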

