
Loading 8-bit models

Your company has been using a Llama model for their customer service chatbot for a while now. You've been tasked with figuring out how to reduce the model's GPU memory usage without significantly affecting performance. This will allow the team to switch to a cheaper compute cluster and save the company a lot of money.

You decide to test whether you can load the model with 8-bit quantization and maintain reasonable performance.

The model name is given in model_name. AutoModelForCausalLM and AutoTokenizer are already imported for you.

This exercise is part of the course

Fine-Tuning with Llama 3

Exercise instructions

  • Import the configuration class to enable loading of models with quantization.
  • Instantiate the quantization configuration class.
  • Configure the quantization parameters to load the model in 8-bit.
  • Pass the quantization configuration to AutoModelForCausalLM to load the quantized model.
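The steps above can be sketched as follows, assuming the Transformers library with bitsandbytes support is installed; the model ID assigned to model_name here is a placeholder, since the exercise provides the actual value:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical placeholder; the exercise supplies the real model_name
model_name = "meta-llama/Llama-3.2-1B"

# Instantiate the quantization configuration, enabling 8-bit loading
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# Pass the configuration so the model weights are loaded in 8-bit
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
)
```

Loading in 8-bit roughly halves GPU memory use compared to 16-bit weights, which is what makes the cheaper compute cluster feasible here.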
