It seems that a lightning strike has disrupted some of the key cloud services at Amazon’s only EU data center, which operates from Ireland, and Amazon has warned that some of those affected may face delays before they get back online. ZDNet posted more information on this a few hours ago.
Below is a screenshot of Amazon’s EU service status that I took as I write this.
As you can see, RDS and EC2 are experiencing issues.
One of the key issues with this kind of outage is that European users do not have alternate Amazon AWS regions outside the EU to fall back on, as they have to comply with the EU data protection directives. Amazon’s EU region has only one data center.
An incident such as this should be a wake-up call for users running mission-critical services on the cloud in the European region. It is perhaps high time to better strategize, plan and implement their disaster recovery and business continuity. Imagine losing a large part of your critical data, or not being able to service transactions for days!
A few things I can think of right now to mitigate a disaster such as this:
- Real-time monitoring and reporting of your services in the cloud is critical, especially from a service outside the cloud through agents (e.g. AppFirst, Monitis).
- Plan to have your critical data backed up or synced with another cloud service provider such as CloudSigma, which offers true IaaS from the heart of the EU region with excellent performance.
- Run data replication for your select mission-critical (MC) database servers between two clouds.
- You may keep a low-capacity setup that replicates your original production setup deployed in another cloud like CloudSigma and, when disaster strikes, scale up the servers for DR (CloudSigma offers vertical scaling) until your main production servers are back at work.
- CloudSigma also has an option where you can create a server, make all software installations and then stop it. In this mode you only pay for the disk/data store, and you can start the server automatically through their APIs based on your monitoring system’s status and then redirect all your network flows to the backup server.
- Rackspace CloudFiles or Nirvanix SDN storage nodes may be a good option to explore for backing up data, provided they have dedicated services in the EU region so that compliance needs are taken care of.
- Rackspace Cloud Servers as a backup service option.
- Explore hybrid cloud options for an enterprise scenario, which means you only scale out to the public cloud when needed and don’t put all your eggs in one basket.
- Explore multi-cloud application-level deployment and cloud management software like Kaavo. For instance, you can deploy identical application instances on Amazon AWS as well as on Rackspace and then load balance across them through Kaavo. The only caveat here is that Kaavo operates from the US, as I understand it, so EU access and data compliance may be a dependency in such a case.
- Implement the relevant people and process elements, combined with the technology framework, to mitigate problems such as this. For instance, have support staff in different time zones (e.g. people in the EU ending their workday just as people in the US are starting theirs).
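To make the monitoring and standby ideas above a little more concrete, here is a minimal sketch in Python of an external watchdog that fails over after several consecutive failed health checks. It is only an illustration under stated assumptions: `FailoverMonitor`, `http_check`, `start_standby` and `redirect_traffic` are hypothetical names of my own, the two callbacks stand in for whatever your backup provider’s API and your DNS or load balancer actually require, and the threshold of 3 is an arbitrary choice.

```python
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Callable


def http_check(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection errors, timeouts, HTTP errors and malformed URLs
        # all count as "service not healthy".
        return False


@dataclass
class FailoverMonitor:
    check: Callable[[], bool]             # probes the primary site
    start_standby: Callable[[], None]     # hypothetical: boots the stopped backup server
    redirect_traffic: Callable[[], None]  # hypothetical: DNS or load balancer switch
    threshold: int = 3                    # consecutive failures before failing over
    failures: int = 0
    failed_over: bool = False

    def tick(self) -> bool:
        """Run one probe; return True only if this probe triggered the failover."""
        if self.failed_over:
            return False  # already running on the backup, nothing more to do
        if self.check():
            self.failures = 0  # a single success resets the counter
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.start_standby()
            self.redirect_traffic()
            self.failed_over = True
            return True
        return False
```

You would call `tick()` on a schedule (a cron job or a simple loop with a sleep) from a machine that lives outside the cloud being monitored, for example with `check=lambda: http_check("https://example.com/health")`, where the URL is a placeholder. Requiring several consecutive failures avoids failing over on a single dropped request; failing back to the primary is deliberately left as a manual step in this sketch.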
A number of such things can be planned and implemented. My few cents before I go to bed in the next 15 minutes at 12:30 am :-)
To summarize, this incident could act as a wake-up call for European companies using Amazon AWS or another public cloud to plan and implement predictable disaster recovery and business continuity in their business. Risk management on the cloud should be a serious CEO- and CIO-level plan, reviewed by management with actionable plans, implementation and audits on an ongoing basis.
Note: CloudSigma is a public IaaS cloud service in the European region worth looking at; I cite it here purely based on my experience of using it. Rackspace’s UK cloud service is another great option to explore.