Comparing quantized model performance
Evaluating a quantized model isn't just about accuracy. Quantized models often offer faster inference times—a key benefit in deployment scenarios. You'll measure how long it takes for both the original and quantized models to process the test set.
The measure_time() function has been predefined. It sets the model to evaluation mode, runs a forward pass on all batches in the dataloader, and returns the elapsed time.
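The predefined helper isn't shown in the exercise, but based on the description above, a minimal sketch might look like this. The signature (model as the only argument, with test_loader picked up from the surrounding scope) and the toy data are assumptions; the course's actual implementation may differ.

```python
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the preloaded test set (assumption: the real test_loader
# is provided by the exercise environment).
test_loader = DataLoader(
    TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))),
    batch_size=8,
)

def measure_time(model):
    """Time one full inference pass over test_loader, in seconds."""
    model.eval()                      # evaluation mode: disables dropout, etc.
    start = time.perf_counter()
    with torch.no_grad():             # skip gradient tracking during inference
        for features, _ in test_loader:
            model(features)           # forward pass on each batch
    return time.perf_counter() - start

elapsed = measure_time(nn.Linear(16, 2))
```

Using time.perf_counter() rather than time.time() gives a monotonic, high-resolution clock, which is the idiomatic choice for short timing measurements.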
Both model (the original model) and model_quantized (the quantized version) are preloaded along with test_loader.
This exercise is part of the course Scalable AI Models with PyTorch Lightning.
Exercise instructions
- Compute inference time for the original and quantized models.
- Print both times rounded to two decimal places.
Hands-on interactive exercise
Have a go at this exercise by completing this sample code.
# Measure inference time of the original model
original_time = measure_time(____)
# Measure inference time of the quantized model
quant_time = measure_time(____)
# Print results
print(f"Original Model Time: {____}s")
print(f"Quantized Model Time: {____}s")
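One possible way to fill in the blanks, assuming measure_time() takes the model as its only argument. The stubs below stand in for the preloaded objects so the snippet runs on its own; in the exercise, only the four filled-in lines are needed.

```python
import time

# Stub stand-ins (assumptions) so this runs standalone; in the exercise,
# model, model_quantized, and measure_time are all preloaded.
model, model_quantized = object(), object()

def measure_time(m):
    start = time.perf_counter()
    # ...the real helper runs a forward pass over test_loader here...
    return time.perf_counter() - start

# Filled-in exercise lines:
original_time = measure_time(model)
quant_time = measure_time(model_quantized)

print(f"Original Model Time: {round(original_time, 2)}s")
print(f"Quantized Model Time: {round(quant_time, 2)}s")
```

round(x, 2) matches the instruction to report two decimal places; an f-string format spec like {original_time:.2f} would print the same result.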