
Best practices

1. Best practices

Welcome back! Before we wrap up the course, let's explore best practices for taking a forecasting process into production.

2. Reproducible environment

While deploying to the cloud or remote servers is beyond this course's scope, several environment-related practices deserve attention. First, reproducibility — a core principle in data science. Developing pipelines inside containers ensures production runs in the exact same environment where you developed and tested your code. Imagine shipping a recipe to a restaurant: containers guarantee they have the same ingredients, tools, and cooking conditions you used when perfecting the dish.
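Even before reaching for containers, a lightweight step toward reproducibility is pinning the exact versions of every installed package so a container build (or a colleague) can recreate your environment. A minimal sketch using only the standard library (`freeze_environment` is a hypothetical helper name):

```python
from importlib import metadata

def freeze_environment():
    """Return sorted 'name==version' pins for every installed package."""
    return sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
    )

# Writing the pins to a lock file lets a container build reinstall
# the exact same set:
# Path("requirements.lock").write_text("\n".join(freeze_environment()))
```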

3. Deployment as code

Next, embrace deployment as code. Automate deployments with scripts and configuration files, avoiding hard-coded parameters and settings. For example, store API endpoints, refresh windows, and model thresholds in configuration files, not buried in code. This makes deployments consistent, portable, and maintainable.
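One minimal way to externalize those settings is a JSON config file merged over sensible defaults. The keys and values below are illustrative placeholders, not from the course:

```python
import json
from pathlib import Path

# Hypothetical settings a forecasting deployment might externalize.
DEFAULTS = {
    "api_endpoint": "https://example.com/forecast",  # placeholder URL
    "refresh_window_hours": 24,
    "drift_threshold": 0.15,
}

def load_config(path):
    """Merge a JSON config file over the defaults; missing keys fall back."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text()))
    return config
```

Changing a threshold then means editing a file, not redeploying code.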

4. Staging vs production

Always maintain a staging environment for testing new features, debugging, and pipeline changes. This enables validation without affecting the production pipeline. A typical setup includes development for building, staging for testing, and production for live operations.
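A common pattern for keeping those tiers separate is selecting settings by an environment variable. This is a sketch with made-up values; `APP_ENV` and the config keys are assumptions, not part of the course:

```python
import os

# Hypothetical per-tier settings.
CONFIGS = {
    "development": {"db_url": "sqlite:///dev.db", "alerting": False},
    "staging": {"db_url": "sqlite:///staging.db", "alerting": False},
    "production": {"db_url": "sqlite:///prod.db", "alerting": True},
}

def get_config(env=None):
    """Pick settings for the current tier; default to development."""
    env = env or os.environ.get("APP_ENV", "development")
    if env not in CONFIGS:
        raise ValueError(f"Unknown environment: {env}")
    return CONFIGS[env]
```

The same pipeline code then runs unchanged in every tier; only `APP_ENV` differs.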

5. Prototyping

When using orchestration tools like Airflow, prototype and test your code in notebooks or scripts first, then migrate to Airflow. Debugging errors through the Airflow GUI can be challenging and time-consuming. Get your logic working locally before adding orchestration complexity.
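In practice, that means writing your forecasting logic as a plain function you can test locally, then wrapping it in an Airflow task afterwards. A toy sketch (the naive forecast stands in for any model):

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value - a trivially testable stand-in model."""
    if not history:
        raise ValueError("history must not be empty")
    return [history[-1]] * horizon

# Once this passes local tests, wrap it in an Airflow task, for example:
# @task
# def forecast_task():
#     return naive_forecast(load_history(), horizon=7)
```

Debugging `naive_forecast` in a notebook takes seconds; debugging it through the Airflow GUI does not.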

6. Agnostic design

In this course, we focused on forecasting a single time series. But real-world scenarios often involve dozens, hundreds, or thousands of series. To scale effectively, write series-agnostic code - code that moves from a single time series to many with little or no change.
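Concretely, agnostic design means the forecasting logic never assumes a particular series. One sketch, using a mean forecast as a placeholder for any model:

```python
def forecast_series(series, horizon=3):
    """Mean forecast - a placeholder for any real model."""
    avg = sum(series) / len(series)
    return [avg] * horizon

def forecast_all(named_series, horizon=3):
    """Apply the same logic to one series or to thousands."""
    return {
        name: forecast_series(values, horizon)
        for name, values in named_series.items()
    }
```

Scaling from one series to a thousand is then a change in the input dictionary, not the code.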

7. Unified templates

For instance, instead of writing custom code for a specific use case such as electricity demand, create a template or module that can be reused across use cases, handling their differences through configuration changes rather than code rewrites.

8. Infrastructure planning

Planning and optimizing infrastructure becomes critical when scaling. Balance performance and cost when allocating compute resources. Tools like Kubernetes dynamically scale workloads for optimal efficiency - spinning up resources during peak processing and scaling down during quiet periods.

9. Forecast post-mortem

Once forecasts are in production, continuous improvement becomes essential. Build a post-mortem framework to analyze performance systematically. When a model drifts or fails, document what happened, why it occurred, and how you fixed it. Use dashboards and reports to compare results across time series and spot improvement areas. Document findings so lessons learned feed into future iterations. Create a knowledge base and record what works best for different series types, common drift patterns, and potential issues to expect.
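A post-mortem entry can be as simple as a structured record per forecast run, with a drift flag derived from an error metric. A minimal sketch, where the field names and threshold are illustrative assumptions:

```python
from datetime import date

def mean_absolute_error(actual, predicted):
    """Average absolute gap between actuals and forecasts."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def post_mortem_entry(series_name, actual, predicted, drift_threshold=1.0):
    """Build a hypothetical knowledge-base record and flag drift."""
    mae = mean_absolute_error(actual, predicted)
    return {
        "series": series_name,
        "date": date.today().isoformat(),
        "mae": mae,
        "drifted": mae > drift_threshold,
    }
```

Appending these records to a store over time gives the dashboards and knowledge base something concrete to compare across series.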

10. Let's practice!

Remember, production deployment isn't the finish line - it's the starting point for continuous refinement and scaling. Time for your final practice!
