Azure Updates: VM technical issues; Sustainability; Cost Management

August 10, 2021

Azure CTO Mark Russinovich discussed ways to understand the root cause of technical issues with Azure VMs. To help customers understand the "why" behind technical problems, Microsoft recently shipped a new resource health experience with enhanced notifications when VM availability is impacted. He gave an example of how the Root Cause Analysis (RCA) process works: a group of VMs goes offline due to a networking problem, and Azure internal monitoring notices the VMs are unreachable and begins redeploying them to a new rack. The RCA engine then correlates data from the host machine, the rack switch, and internal monitoring to determine the root cause of the failure.
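To make the correlation step concrete, here is a minimal Python sketch. It is not Microsoft's RCA engine. The `Signal` type, the field names, the five-minute window, and the "earliest signal wins" heuristic are all assumptions made for illustration; the only part taken from the example is the idea of combining host, rack switch, and monitoring data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """One hypothetical telemetry record (illustrative, not an Azure type)."""
    source: str       # "host", "rack_switch", or "monitoring"
    resource: str     # the rack the signal refers to
    timestamp: float  # seconds on a shared clock
    detail: str


def correlate(downtime_start, rack, signals, window=300.0):
    """Return signals for `rack` within `window` seconds of the downtime, oldest first."""
    related = [
        s for s in signals
        if s.resource == rack and abs(s.timestamp - downtime_start) <= window
    ]
    return sorted(related, key=lambda s: s.timestamp)


def attribute_root_cause(related):
    """Naive assumption: the earliest correlated signal is the probable cause."""
    return related[0] if related else None


# Made-up sample data mirroring the scenario described above.
signals = [
    Signal("monitoring", "rack-7", 1000.0, "VMs unreachable"),
    Signal("rack_switch", "rack-7", 990.0, "uplink port flapping"),
    Signal("host", "rack-7", 1005.0, "redeploy to new rack started"),
    Signal("rack_switch", "rack-2", 995.0, "unrelated firmware update"),
]

related = correlate(1000.0, "rack-7", signals)
cause = attribute_root_cause(related)
print(cause.source, "-", cause.detail)
```

In this toy data the switch signal precedes the monitoring alert, so the sketch attributes the outage to the rack switch and ignores the unrelated event on another rack. A production system would presumably weigh far more evidence than timestamp order.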

Russinovich delved more deeply into detecting downtime, correlation analysis, root cause attribution, and RCA publishing. He wrote:

Identifying and communicating to our customers and partners the root cause of any issues impacting them, is just the beginning. Our customers may need to take these RCAs and share them with their customers and coworkers. We want to build on the work here to make it easier to identify and track resource RCAs, as well as easily share them out. In order to accomplish that, we are working on backend changes to generate unique per-resource and per-downtime tracking IDs that we can expose to you, so that you can easily match downtimes to their RCAs. We are also working on new features to make it easier to email RCAs out, and eventually subscribe to RCAs for your VMs. This will make it possible to sign up for RCAs directly in your inbox after an unavailability event with no additional action needed on your part.

Kristen Hicks, product marketing manager for M&O Azure, shared ways to advance financial services strategy using Azure capabilities around sustainability.

About MSCN Reporter
