Comparing quantized model performance
Evaluating performance improvements isn't just about accuracy. Quantized models often offer faster inference, a key benefit in deployment scenarios. You'll measure how long both the original and quantized models take to process the test set.
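For context, a quantized model can be produced with PyTorch's dynamic quantization API. The snippet below is a minimal sketch, assuming the model's linear layers are the quantization target; the exercise provides model_quantized preloaded, and it may have been created differently.

import torch

# Minimal sketch: dynamically quantize the linear layers to 8-bit integers.
# This is one plausible way model_quantized could have been created.
model_quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)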
The measure_time() function has been predefined. It sets the model to evaluation mode, runs a forward pass on all batches in the dataloader, and returns the elapsed time.
Both model (the original model) and model_quantized (the quantized version) are preloaded along with test_loader.
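Although measure_time() is predefined in the exercise environment, a minimal sketch of what it might look like is shown below. The single-argument signature and the use of the preloaded test_loader are assumptions, since the exercise only describes the function's behavior.

import time
import torch

def measure_time(model):
    # Hypothetical re-creation of the predefined helper; the exact
    # signature is an assumption based on the exercise description.
    model.eval()  # evaluation mode: disables dropout, fixes batch norm stats
    start = time.perf_counter()
    with torch.no_grad():  # inference only, no gradient tracking
        for features, _ in test_loader:  # assumes (features, labels) batches
            model(features)  # forward pass on each batch
    return time.perf_counter() - start  # elapsed time in seconds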
This exercise is part of the course
Scalable AI Models with PyTorch Lightning
Exercise instructions
- Compute inference time for the original and quantized models.
- Print both times rounded to two decimal places.
Hands-on interactive exercise
Try this exercise by completing the sample code below.
# Measure inference time of the original model
original_time = measure_time(model)

# Measure inference time of the quantized model
quant_time = measure_time(model_quantized)

# Print both times, rounded to two decimal places
print(f"Original Model Time: {original_time:.2f}s")
print(f"Quantized Model Time: {quant_time:.2f}s")