ALZ Monitoring Alerts Process
This page is a guide for ALZ Engineers on reviewing and acting on the Azure monitoring alerts posted in the Azure Landing Zone Teams channel. Alerts are only generated for the Spokes we support end-to-end: Prod Hub and Prod Shared Services.
ALZ Monitoring Rota
As an ALZ Engineer, use the current OOH rota (provided by Service Level Management) as a guide to who monitors the Azure Monitoring Alerts channel during the daytime. The rota needs to be updated:
- When providing holiday or sickness cover
- When we have a new starter or leaver
- At the start of a new year
Below is a snapshot of the weekly rota:
Week 1 | Week 2 | Week 3 | Week 4 |
---|---|---|---|
Alan | Ravi | Noor | Kokulan |
08/01/2024 | 15/01/2024 | 22/01/2024 | 29/01/2024 |
05/02/2024 | 12/02/2024 | 19/02/2024 | 26/02/2024 |
04/03/2024 | 11/03/2024 | 18/03/2024 | 25/03/2024 |
01/04/2024 | 08/04/2024 | 15/04/2024 | 22/04/2024 |
29/04/2024 | 06/05/2024 | 13/05/2024 | 20/05/2024 |
27/05/2024 | 03/06/2024 | 10/06/2024 | 17/06/2024 |
24/06/2024 | 01/07/2024 | 08/07/2024 | 15/07/2024 |
22/07/2024 | ———- | 29/07/2024 | ———- |
———- | 05/08/2024 | ———- | ———- |
12/08/2024 | ———- | ———- | 19/08/2024 |
———- | 26/08/2024 | 02/09/2024 | 09/09/2024 |
16/09/2024 | 23/09/2024 | 30/09/2024 | 07/10/2024 |
14/10/2024 | 21/10/2024 | 28/10/2024 | 04/11/2024 |
11/11/2024 | ———- | 18/11/2024 | ———- |
———- | 25/11/2024 | ———- | 02/12/2024 |
———- | 09/12/2024 | ———- | ———- |
16/12/2024 | ———- | 23/12/2024 | 30/12/2024 |
06/01/2025 | 13/01/2025 | 20/01/2025 | 27/01/2025 |
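
The rota lookup above can be sketched in Python. This is a minimal sketch, not an existing ALZ tool: it assumes each date in the table is the Monday on which that engineer's week starts, and the names `ROTA_ROWS`, `build_rota` and `engineer_on_duty` are hypothetical. Only a few rows of the snapshot are copied in, and an empty string stands for a cell with no date.

```python
from datetime import date, datetime, timedelta

# Column order taken from the rota snapshot (Week 1 to Week 4).
ENGINEERS = ["Alan", "Ravi", "Noor", "Kokulan"]

# A few rows copied from the snapshot; "" marks a cell with no date.
ROTA_ROWS = [
    ["08/01/2024", "15/01/2024", "22/01/2024", "29/01/2024"],
    ["22/07/2024", "", "29/07/2024", ""],
    ["", "05/08/2024", "", ""],
    ["12/08/2024", "", "", "19/08/2024"],
]


def build_rota(rows):
    """Map each week-start date to the engineer in that column."""
    rota = {}
    for row in rows:
        for engineer, cell in zip(ENGINEERS, row):
            if cell:
                week_start = datetime.strptime(cell, "%d/%m/%Y").date()
                rota[week_start] = engineer
    return rota


def engineer_on_duty(day, rota):
    """Return the engineer for the week containing `day`, or None if not listed.

    Assumption: weeks run Monday to Sunday and the table lists the Monday.
    """
    week_start = day - timedelta(days=day.weekday())
    return rota.get(week_start)


if __name__ == "__main__":
    rota = build_rota(ROTA_ROWS)
    print(engineer_on_duty(date(2024, 7, 31), rota))
```

A table lookup is used rather than a fixed four-week cycle because the snapshot contains weeks where the order changes, so a simple modulo calculation would give the wrong name for those weeks.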
Alert Notifications
We have configured Azure Monitor to generate alerts within our Monitoring Alerts MS Teams channel.
Teams Channel:
Example Alert:
Reviewing Alert Notifications
ALZ Engineers are responsible for reviewing the alerts in the Monitoring Alerts channel each day. When reviewing an alert, the engineer needs to determine whether it is an Incident:
- Refer to the guide to our Incident Types.
- If we think it is an Incident, follow the Incident Management Process.
If the alert is not severe enough to be an Incident, we should at least follow the Moderate Incident type as a guide to a less involved process.
All alerts must be acknowledged by an ALZ Engineer, whether or not they are an Incident, to show that the alert has been reviewed and then resolved. For now, this can be done by replying to the alert message created in the Teams channel (there will be a slicker process for this in the future).
ALZ Incident Management
ALZ Incident Management Approach