Can Microsoft's explanation for the Office 365 and Windows Live outage reassure cloud customers?
The explanation of why Microsoft's Office 365 and Windows Live services suffered a three-and-a-half-hour outage on 8 September might not be too reassuring for anyone tempted to move to the cloud. According to a blog post by Arthur de Haan, vice president for Windows Live test and service engineering, it's all the fault of a pesky corrupted file in Microsoft's DNS service.
The file corruption was "a result of two rare conditions occurring at the same time". I won't go into the technical detail here, but it does seem extraordinary that such a high-profile example of cloud computing should be derailed by such a small thing.
According to de Haan, Microsoft has "identified two streams of work" to improve monitoring, problem identification and recovery, and is "further hardening the DNS service to improve its overall redundancy and fail-over capability". All well and good, but if I were one of Microsoft's customers, I might feel it should have done a bit more work on Office 365 and Windows Live services, such as Hotmail and SkyDrive, before it started promoting them more widely.
From a customer perspective, I'm not sure de Haan's explanation will help assuage concerns about the stability of cloud computing services, especially when the file corruption resulted in a global outage.
If I were a cloud services provider, I might be torn between seeing the Microsoft outage positively, as an opportunity to promote rival services, and negatively, as an all-too-public demonstration of the potential pitfalls of using cloud-based services. And I'd probably start getting used to saying "Yes, it's a cloud computing service, but it's not like Office 365 at all..."
This was first published in September 2011.