Today we experienced a service disruption. This is a rare event, and we apologize for any inconvenience. Just like you, we depend on this system and need it to be up 24/7.
Here is what happened. We host all of our servers with Amazon Web Services (AWS). We have found that they offer the highest availability of any hosting provider we have used, and when there is an issue they fix it promptly. If the infrastructure is good enough for Amazon.com, one of the most heavily trafficked sites in the world, it is good enough for us. This morning AWS experienced an outage that also brought down our servers. As soon as we saw the issue we moved to our backup servers; no data was lost or damaged. We are still operating on our backup servers, and we will move back to our faster production servers as soon as we finish our testing.
Here is the breakdown of what AWS experienced (based on their status page: http://status.aws.amazon.com/):
1:41 AM PDT We are currently investigating latency and error rates with EBS volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region.
2:18 AM PDT We can confirm connectivity errors impacting EC2 instances and increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region. Increased error rates are affecting EBS CreateVolume API calls. We continue to work towards resolution.
2:49 AM PDT We are continuing to see connectivity errors impacting EC2 instances, increased latencies impacting EBS volumes in multiple availability zones in the US-EAST-1 region, and increased error rates affecting EBS CreateVolume API calls. We are also experiencing delayed launches for EBS backed EC2 instances in affected availability zones in the US-EAST-1 region. We continue to work towards resolution.
3:20 AM PDT Delayed EC2 instance launches and EBS API error rates are recovering. We're continuing to work towards full resolution.
4:09 AM PDT EBS volume latency and API errors have recovered in one of the two impacted Availability Zones in US-EAST-1. We are continuing to work to resolve the issues in the second impacted Availability Zone. The errors, which started at 12:55 AM PDT, began recovering at 2:55 AM PDT.
5:02 AM PDT Latency has recovered for a portion of the impacted EBS volumes. We are continuing to work to resolve the remaining issues with EBS volume latency and error rates in a single Availability Zone.
6:09 AM PDT EBS API errors and volume latencies in the affected availability zone remain. We are continuing to work towards resolution.
6:59 AM PDT There has been a moderate increase in error rates for CreateVolume. This may impact the launch of new EBS-backed EC2 instances in multiple availability zones in the US-EAST-1 region. Launches of instance store AMIs are currently unaffected. We are continuing to work on resolving this issue.
We hope this sheds light on what happened. If you have any specific questions, please let us know.
-Mike