Automated monitoring with alerts

1. Automated monitoring with alerts

Welcome back! In the last video, we learned how to collect and view metrics.

2. Alerts

But constantly watching dashboards is like sitting in front of security cameras all day: it's not practical.

3. Alerts

What you need is a smart alarm system that watches for you. That's exactly what Azure Monitor alerts do.

4. What are Alerts?

Think of alerts like a home security system. You don't watch for intruders all day; sensors trigger alarms when conditions are met. Azure alerts work the same way. An alert rule defines what to watch, when to notify, and what to do. Alert rules are evaluated continuously, so if your storage account goes down at 3 AM, you're notified within minutes, not when you check your dashboard the next morning.

5. Alert rules explained

Let's look at a storage account example. You create an alert rule monitoring availability. The signal is the metric you watch.

6. Alert rules explained

The condition defines when to alert: "when average availability drops below 99%." Evaluation frequency is how often Azure checks the condition, in this example every 5 minutes.
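To make the three parts concrete, here is a minimal sketch in plain Python. It is not the Azure SDK; the `AlertRule` class and its field names are hypothetical and only model the signal, condition, and evaluation frequency described above.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class AlertRule:
    """Illustrative model of an alert rule (hypothetical, not an Azure API)."""
    signal: str              # the metric being watched
    threshold: float         # alert when the average drops below this value
    evaluation_minutes: int  # how often the condition is checked

    def condition_met(self, samples):
        # Condition: the average of the collected samples is below the threshold.
        return mean(samples) < self.threshold


rule = AlertRule(signal="Availability", threshold=99.0, evaluation_minutes=5)

healthy = rule.condition_met([100.0, 100.0, 99.8])   # average is above 99
degraded = rule.condition_met([100.0, 95.0, 94.0])   # average is below 99
print(healthy, degraded)
```

Only the second set of samples meets the condition, because its average falls below the 99% threshold.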

7. Alert rules explained

Alerts are stateful, like a smoke alarm. When availability drops, you're notified. When it recovers, you get a "resolved" notification automatically.
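The stateful behavior can be sketched as a tiny state machine. This is an illustration in plain Python under the assumption of a single "fired" flag; the `StatefulAlert` class is hypothetical and does not reflect how Azure implements alert state internally.

```python
class StatefulAlert:
    """Illustrative stateful alert: notifies only when the state changes."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.fired = False  # remembers whether the alert is currently active

    def evaluate(self, availability):
        breached = availability < self.threshold
        if breached and not self.fired:
            self.fired = True
            return "fired"      # first breach: send the alert
        if not breached and self.fired:
            self.fired = False
            return "resolved"   # recovery: send the "resolved" notification
        return None             # no state change, so no notification


alert = StatefulAlert(threshold=99.0)
readings = [100.0, 95.0, 94.0, 99.9, 100.0]
events = [alert.evaluate(value) for value in readings]
print(events)
```

Five evaluations produce only two notifications: one when availability first drops and one when it recovers.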

8. Action groups: Your response team

Action groups are your emergency response team. When a fire alarm sounds, it doesn't just make noise; it calls the fire department, activates sprinklers, and lights exits. Similarly, when an alert fires, the action group executes responses. You can send emails, trigger automation, or create tickets. Here's the power: action groups are reusable. Create one called "Operations Team" and use it for availability alerts, transaction alerts, and error alerts: one response team for multiple emergencies.
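Here is a small sketch of that reuse in plain Python. The `ActionGroup` class and the `send_email` and `create_ticket` functions are hypothetical stand-ins that return strings instead of contacting anyone; they only show one group being shared by several alert rules.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ActionGroup:
    """Illustrative action group: a named, reusable list of responses."""
    name: str
    actions: List[Callable[[str], str]] = field(default_factory=list)

    def run(self, alert_name):
        # Execute every response in the group for the alert that fired.
        return [action(alert_name) for action in self.actions]


def send_email(alert_name):
    return f"email: {alert_name}"


def create_ticket(alert_name):
    return f"ticket: {alert_name}"


# One group, defined once.
ops = ActionGroup("Operations Team", [send_email, create_ticket])

# Three alert rules all point at the same group.
rules = {"availability": ops, "transactions": ops, "errors": ops}
log = [message for name, group in rules.items() for message in group.run(name)]
print(log)
```

Changing the responses in `ops` would change the behavior of all three rules at once, which is the practical benefit of reuse.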

9. Put into practice

Here's the complete picture. You manage a storage account with two concerns: availability and cost. You create two alert rules using the same action group.

10. Put into practice

At 2 AM, a network issue drops availability to 95%. Within 5 minutes, your on-call engineer gets an email.

11. Put into practice

The issue resolves by 2:30 AM, and Azure sends a "resolved" notification.

12. Put into practice

Meanwhile, if someone misconfigures an app and causes transaction spikes, you get a separate alert about cost, catching the issue before month-end billing surprises.

13. Lookback period

Think of a security guard doing rounds. The lookback period is how far back they check: "any activity in the last 15 minutes?"

14. Evaluation frequency

Evaluation frequency is how often they check: "every 5 minutes." With a 15-minute lookback and a 5-minute evaluation frequency, the windows overlap. If a transaction spike occurs at 2:03 PM, the 2:05 PM check catches it, rather than a check at 2:15 PM. This overlap gives faster detection, and because alerts are stateful, it does so without alert spam.
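The timing can be checked with a short sketch in plain Python. It assumes evaluations are aligned to 2:00 PM and uses an arbitrary date; the helper names `evaluation_windows` and `first_detection` are hypothetical, and real notification delivery adds some latency on top.

```python
from datetime import datetime, timedelta


def evaluation_windows(start, end, lookback_min, frequency_min):
    """Return (window_start, evaluation_time) pairs between start and end."""
    windows = []
    t = start
    while t <= end:
        windows.append((t - timedelta(minutes=lookback_min), t))
        t += timedelta(minutes=frequency_min)
    return windows


def first_detection(event, windows):
    """Return the first evaluation time whose lookback window contains the event."""
    for window_start, window_end in windows:
        if window_start < event <= window_end:
            return window_end
    return None


day = datetime(2024, 1, 1)  # arbitrary date, for illustration only
start = day.replace(hour=14, minute=0)
end = day.replace(hour=14, minute=30)
spike = day.replace(hour=14, minute=3)

overlapping = evaluation_windows(start, end, lookback_min=15, frequency_min=5)
non_overlapping = evaluation_windows(start, end, lookback_min=15, frequency_min=15)

print(first_detection(spike, overlapping).strftime("%H:%M"))
print(first_detection(spike, non_overlapping).strftime("%H:%M"))
```

With a check every 5 minutes, the spike is first seen at 14:05; with a check only every 15 minutes, it is first seen at 14:15.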

15. Best practices

Here are five practices for success. First, start with critical metrics such as availability and errors, then expand. Don't create 50 alerts on day one. Second, set realistic thresholds by learning your baseline. Third, use severity levels wisely: reserve 3 AM pages for critical outages, not informational events.

16. Best practices

Fourth, create action groups by team responsibility. Finally, test your alerts. Generate test conditions and verify emails arrive. Don't discover your notifications go to a dead mailbox during an actual crisis.

17. Let's practice!

Let's jump in and create alerts!
