Comparing quantized model performance
Evaluating a model isn't just about accuracy: quantized models often deliver faster inference, a key benefit in deployment scenarios. You'll measure how long the original and quantized models each take to process the test set.
The measure_time() function has been predefined. It sets the model to evaluation mode, runs a forward pass on all batches in the dataloader, and returns the elapsed time.
Both model (the original model) and model_quantized (the quantized version) are preloaded along with test_loader.
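The helper itself isn't shown in the exercise. Below is a minimal sketch of what such a timing helper could look like, assuming it takes the model and a dataloader as arguments; the `dummy_model` and `dummy_loader` names are illustrative stand-ins so the sketch runs without PyTorch installed:

```python
import time

def measure_time(model, dataloader):
    """Time a full pass over the dataloader (sketch).

    In the real exercise this would also call model.eval() and wrap the
    loop in torch.no_grad(); here plain callables stand in for both.
    """
    start = time.perf_counter()
    for batch in dataloader:
        model(batch)  # forward pass on each batch
    return time.perf_counter() - start

# Hypothetical stand-ins, only to make the sketch self-contained:
dummy_model = lambda batch: [x * 2 for x in batch]
dummy_loader = [[1, 2, 3], [4, 5, 6]]

elapsed = measure_time(dummy_model, dummy_loader)
print(f"elapsed: {elapsed:.4f}s")
```

Note that the exercise's predefined `measure_time()` may accept only the model, with `test_loader` captured internally; the two-argument form above is just one plausible signature.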
This exercise is part of the course
Scalable AI Models with PyTorch Lightning
Exercise instructions
- Compute inference time for the original and quantized models.
- Print both times rounded to two decimal places.
Hands-on interactive exercise
Try this exercise by completing this sample code.
# Measure inference time of the original model
original_time = measure_time(____)
# Measure inference time of the quantized model
quant_time = measure_time(____)
# Print results
print(f"Original Model Time: {____}s")
print(f"Quantized Model Time: {____}s")
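One way the blanks could be filled is sketched below. Since `model`, `model_quantized`, and `measure_time()` are preloaded in the exercise but not available here, lightweight stand-ins are defined so the sketch runs on its own; the single-argument `measure_time(model)` signature is an assumption based on the skeleton's single blank:

```python
import time

# Stand-ins for the preloaded objects (hypothetical, for illustration only):
model = lambda: sum(i * i for i in range(200_000))           # "original"
model_quantized = lambda: sum(i * i for i in range(50_000))  # "quantized"

def measure_time(m):
    # Stand-in timer; the exercise's version runs the test set instead.
    start = time.perf_counter()
    m()
    return time.perf_counter() - start

# The fill-in pattern for the exercise:
original_time = measure_time(model)
quant_time = measure_time(model_quantized)

print(f"Original Model Time: {round(original_time, 2)}s")
print(f"Quantized Model Time: {round(quant_time, 2)}s")
```

An f-string format spec such as `{original_time:.2f}` would round to two decimal places just as well as `round()`.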