Never confuse a single defeat with a final defeat.
– F. Scott Fitzgerald
So last week was, depending on who you ask, a terrible week for "The Cloud," a wake-up call for businesses that want to use cloud services, or nothing all that surprising.
In case you have only just escaped from alien abductors and missed the news, Amazon's AWS had a service interruption starting early on April 21st. (Although technically the problem was in EBS, the Elastic Block Store technology that can be used to provide storage to EC2 instances, rather than in AWS as a whole.)
It was, basically, a combination of an unfortunate event at Amazon (the kind that can happen in even the best-managed data center), followed by an as-yet-unexplained cascade effect that degraded performance in other availability zones, and compounded by some allegedly risky choices on the part of Amazon's customers.
It is, in short, a pretty standard business technology availability event, writ large simply because Amazon is so visible as a leading light in the push to The Cloud.
There's plenty of great analysis of what happened already out there, and there's no point in repeating it here. For more info, I definitely recommend the blog of Don MacAskill at SmugMug, one of the 'survivors' of the event.
Technology specifics aside, I do think there are a couple of interesting underlying elements to this story that go far beyond the temporary loss of EBS availability, and apply directly to security practices for businesses thinking of using cloud services.
The first is the obvious one: don't forget that "availability" is still something you have to worry about, even in the cloud (and even if you're working with the biggest cloud provider on the block).
More interestingly, though, a lot of comments have been made regarding the equal culpability of Amazon's customers. Putting it bluntly, the commentary goes, if the customers had been more careful about the way they used the services Amazon offered, they wouldn't have had the problem. Without hurling any rocks here, I think the bigger picture is this: just because it's a cloud doesn't mean your troubles are over. Due diligence, disaster planning, business continuity plans, risk analysis: all the things you have to do for your own infrastructure apply just as much (perhaps more than ever) when using a cloud service. Worse, the approaches you used when the infrastructure was your own may no longer apply when you're using someone else's cloud services. In fact, it may require some radical re-thinking of how you approach the management and availability of your systems.
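To make that a little more concrete, here is a minimal sketch of what "doing the availability work yourself" can look like on AWS. It uses boto3, the current AWS SDK for Python (which post-dates this incident), and a placeholder AMI ID; the point is simply that nothing spreads your workload across Availability Zones, let alone regions, unless you explicitly plan for it and ask for it.

```python
# Hypothetical sketch: explicitly spreading a workload across Availability Zones.
# Uses boto3 (the AWS SDK for Python); the AMI ID and instance type are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Discover the zones currently available in the chosen region.
zones = [
    az["ZoneName"]
    for az in ec2.describe_availability_zones()["AvailabilityZones"]
    if az["State"] == "available"
]

# Launch one instance per zone, so a failure confined to a single zone
# doesn't take out every copy of the workload. (The April event showed that
# even this isn't always enough, since the cascade crossed zones, which is
# why multi-region planning belongs in the risk analysis too.)
for zone in zones:
    ec2.run_instances(
        ImageId="ami-12345678",        # placeholder AMI ID
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
```

None of this is exotic, but it has to be someone's job, and in the cloud that someone is still you.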
And guess what? That applies to security too. Cloud changes the way organizations interact with the IT resources and processes they need to run their business, and it clearly requires them to think carefully about those changes.
Taking existing processes and grafting them onto the new world of cloud may not work here. (I think there's a good biblical quote about new wine in old wineskins that fits well.) The same is true for security. Things that worked just fine when you had physical control of systems, and full visibility into everything happening on them, may no longer apply.
There's also a point to be made here about the accumulation of large amounts of services and data into centralized cloud providers. The number of organizations impacted by Amazon's problem shows how pervasive the use of its services is (clearly a good thing for Amazon). However, it changes the risk to organizations in a way that is harder to predict. No, I'm not about to launch into a rant about 'Cloud' being somehow 'insecure.' Rather, I think the lesson here might be: what effect would a targeted, sophisticated attack on a cloud provider have on the business processes of so many organizations?
The question that needs to be considered is this: as I heap all those eggs into fewer (but arguably better managed) baskets, am I gaining more risk than I am losing, or vice versa? Collateral breaches may be something we have to start thinking seriously about before too long, and businesses might want to start asking exactly how valuable the data sitting in the same set of services as theirs really is.
The good news is that these little problems (and in reality, it wasn't the end of the world, despite claims to the contrary) offer opportunities to learn and to refine our practices before major problems arise. Because if there's nothing else we learn from the problems of April 21st, arise they will.
As Malcolm Forbes once said, "Failure is success if we learn from it."
OK, next time I promise to be back to BitLocker and, as advertised, some advice on the best (and worst) places to deploy it.