Monitoring and observability

1. Monitoring and observability

After deploying our application, we need to ensure everything is working as expected.

2. LLM lifecycle: Monitoring and observability

Monitoring and observability are crucial practices for ensuring the smooth operation of systems, including LLM applications.

3. Monitoring and observability

They are often discussed together, but serve distinct roles. Monitoring continuously watches a system's behavior for performance changes. Observability reveals a system's internal state to external observers, using data from all components to understand their interactions. This enables us to answer unforeseen questions prompted by unexpected events, like a sudden surge in traffic or a database outage. To enable observability, we can utilize three primary data sources: logs, metrics, and traces. Logs provide detailed chronological event records, metrics offer quantitative system performance measurements, and traces show the flow of requests across system components. Each helps in understanding and troubleshooting system behavior. Now let's shift focus to actively monitoring our application. We need to consider what aspects to monitor, categorizing them into input, functional, and output monitoring.
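As an illustration, the three data sources can come together in a single structured record per request. This is a minimal sketch, not code from the course: the `log_llm_request` helper and its field names are hypothetical, standing in for whatever logging pipeline an application actually uses.

```python
import json
import time
import uuid

def log_llm_request(prompt: str, response: str, latency_ms: float) -> str:
    """Emit one structured log record for a single LLM request."""
    record = {
        "trace_id": str(uuid.uuid4()),   # trace: id to follow the request across components
        "timestamp": time.time(),        # log: when the event happened
        "event": "llm_request_completed",
        "metrics": {                     # metrics: quantitative measurements
            "latency_ms": latency_ms,
            "prompt_chars": len(prompt),
            "response_chars": len(response),
        },
    }
    return json.dumps(record)

line = log_llm_request("Summarize this article.", "The article argues...", 412.5)
print(line)
```

Emitting records like this as JSON means a log aggregator can index them as logs, extract the numeric fields as metrics, and join on `trace_id` to reconstruct traces.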

4. Input monitoring

Input monitoring involves tracking changes, errors, or malicious content in application inputs, particularly relevant in LLM applications with human-generated text input. Detecting malicious input by comparing user inputs with known adversarial prompts is essential, a topic we'll explore later in this chapter. Data drift is the change in input data distribution over time, impacting application performance. It can result from environmental changes, user behavior, or data source alterations. Addressing data drift requires monitoring the data distribution and periodically updating the model.
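One very simple way to watch for data drift is to compare a summary statistic of recent inputs against a baseline window. The sketch below uses the shift in mean prompt length with a hypothetical alerting threshold; a production system would compare full distributions with a statistical test rather than a single mean.

```python
from statistics import mean

def drift_score(baseline: list[float], recent: list[float]) -> float:
    """Relative shift in mean input length between a baseline and a recent window."""
    base = mean(baseline)
    return abs(mean(recent) - base) / base

# Baseline prompts averaged ~20 tokens; recent traffic is much longer.
baseline_lengths = [18, 22, 20, 19, 21]
recent_lengths = [55, 60, 48, 52, 58]

score = drift_score(baseline_lengths, recent_lengths)
print(f"drift score: {score:.2f}")
if score > 0.5:  # hypothetical threshold; tune to your traffic
    print("possible data drift: input length distribution has shifted")
```

The same pattern applies to other input features, such as language mix or topic distribution: establish a baseline, recompute on a rolling window, and alert when the gap grows.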

5. Functional monitoring

Functional monitoring covers an application's overall health, performance, and stability. This encompasses tracking metrics like response time, request volume, downtime, and error rates. When dealing with chains and agents, it's important to acknowledge their unpredictable executions and their potential to involve multiple calls to LLMs. Therefore, it's beneficial to monitor these calls. For LLMs, it's vital to monitor system resources such as memory and GPU usage. One specific aspect worth noting is tracking costs, which we will explore in the next video.
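A lightweight way to track request volume, errors, and response time around LLM calls is a decorator that records statistics per call site. This is a sketch under assumed names (`monitored`, `fake_llm_call`); real applications would typically export these counters to a metrics backend instead of an in-process dict.

```python
import time
from collections import defaultdict

# Per-endpoint counters: request volume, error count, cumulative latency.
stats = defaultdict(lambda: {"calls": 0, "errors": 0, "total_ms": 0.0})

def monitored(name):
    """Decorator recording call count, errors, and elapsed time for a function."""
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            stats[name]["calls"] += 1
            try:
                return fn(*args, **kwargs)
            except Exception:
                stats[name]["errors"] += 1
                raise
            finally:
                stats[name]["total_ms"] += (time.perf_counter() - start) * 1000
        return inner
    return wrap

@monitored("llm_call")
def fake_llm_call(prompt):
    # Stand-in for a real model call; chains and agents may trigger several of these.
    return f"echo: {prompt}"

fake_llm_call("hello")
fake_llm_call("world")
print(stats["llm_call"])
```

Because chains and agents can fan out into multiple model calls per user request, wrapping each call this way makes the hidden volume visible.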

6. Output monitoring

The last form of monitoring is output monitoring. This involves assessing the responses an application generates to ensure they match the expected content. This assessment relies on the primary and secondary metrics defined during testing, with unsupervised metrics like bias, toxicity, and helpfulness being particularly useful. Also important is the concept of model drift, not to be confused with data drift. Model drift occurs when model performance degrades because the relationship between inputs and outputs changes due to external factors. Implementing feedback loops, like refining the application using the latest data, can mitigate this issue. Remember that large language models make errors, some of which may have negative consequences for our organization. Towards the end of this chapter, we'll cover security and governance, including the idea of censoring, which involves actively intervening rather than just monitoring.
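Output checks can start as simple flags attached to each response. The sketch below uses a tiny keyword blocklist purely as a placeholder for a real toxicity or bias classifier; the names `BLOCKLIST` and `output_flags` are hypothetical.

```python
# Stand-in for a real toxicity/bias classifier, not a serious filter.
BLOCKLIST = {"insult", "slur"}

def output_flags(response: str) -> dict:
    """Return simple quality flags for a generated response."""
    words = set(response.lower().split())
    return {
        "possibly_toxic": bool(words & BLOCKLIST),
        "empty_response": len(response.strip()) == 0,
    }

print(output_flags("Here is a helpful summary."))
print(output_flags("that was an insult"))
```

Aggregating these flags over time gives the trend lines needed to spot model drift: if the rate of flagged responses climbs while inputs look stable, the model's behavior, not the data, has shifted.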

7. Alert handling

Once we've set up monitoring, the next step is alerting. Alerting ensures that we're promptly notified when issues arise. It's essential to anticipate and prepare for potential problems, threats, and failures by establishing clear procedures. Depending on the maturity of our organization and application, we may have service-level agreements in place to define response times and responsibilities in case of issues.
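An alert is ultimately just a monitored value crossing an agreed threshold. Here is a minimal sketch assuming an error-rate SLA of 5%; the function name and threshold are illustrative, and a real system would route the message to a paging or notification service.

```python
def check_error_rate(errors: int, requests: int, threshold: float = 0.05):
    """Return an alert message when the error rate exceeds an SLA threshold."""
    if requests == 0:
        return None  # no traffic, nothing to alert on
    rate = errors / requests
    if rate > threshold:
        return f"ALERT: error rate {rate:.1%} exceeds {threshold:.0%} SLA threshold"
    return None

print(check_error_rate(2, 100))   # below threshold
print(check_error_rate(12, 100))  # above threshold
```

Tying thresholds like this to the response times and responsibilities defined in a service-level agreement keeps alerting actionable rather than noisy.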

8. Let's practice!

Now that we understand the basic concepts of monitoring and observability, let's put this knowledge into practice.