
Loading 8-bit models

Your company has been using a Llama model for their customer service chatbot for a while now. You've been tasked with figuring out how to reduce the model's GPU memory usage without significantly affecting performance. This will allow the team to switch to a cheaper compute cluster and save the company a lot of money.

You decide to test whether you can load the model with 8-bit quantization while maintaining reasonable performance.

You are given the model in model_name. AutoModelForCausalLM and AutoTokenizer are already imported for you.

This exercise is part of the course

Fine-Tuning with Llama 3


Exercise instructions

  • Import the configuration class to enable loading of models with quantization.
  • Instantiate the quantization configuration class.
  • Configure the quantization parameters to load the model in 8-bit.
  • Pass the quantization configuration to AutoModelForCausalLM to load the quantized model (a sketch follows this list).
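
A minimal sketch of how these steps fit together is below. It assumes model_name already holds the checkpoint identifier provided by the exercise, and that the bitsandbytes and accelerate packages are installed alongside transformers.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Configure quantization so the model weights are loaded in 8-bit
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

# Pass the configuration so weights are quantized at load time;
# device_map="auto" places the quantized layers on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=quantization_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Storing weights as int8 roughly halves GPU memory compared with float16 loading, which is what makes the cheaper compute cluster feasible.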
