Amazon Web Services Outage Cripples Popular Sites

Posted on October 22nd, 2012 at 15:02 by jaysudowski

Today, popular web sites were crippled by an outage at Amazon Web Services.  Sites impacted include:

  • Reddit
  • Netflix
  • GitHub
  • Minecraft

Yes, as a consumer of some of these services, the outage is annoying.  But what lessons can we learn if we look at this outage from the perspective of the site or organization that is suffering it?

  1. Outages are inevitable.  No one wants to admit it, but even the most robustly engineered solution, with redundancy from the physical infrastructure up to the application layer, will experience an outage.  Many people think of outages as something to be avoided at all costs, even when they have very, very small budgets.  The question you should be asking yourself, your co-workers and your customers is: how can we keep working in a degraded mode while an outage is ongoing?  Even small organizations can keep degraded operations running during an outage by following some simple methods:
    • Split the services you offer across multiple logical systems.  For instance, host each application on its own logical system: email on one server, web site on another, database on a third.
    • Distribute your apps across multiple sites and data centers.  Host your DNS with a managed DNS provider, host your email at one site, host your database at another site.
    • Speaking of managed DNS providers, many of them will automatically detect an outage on your web site and redirect its traffic to a secondary IP.  Depending on your business requirements, it may be acceptable to have a web site hosted at an alternative site that offers limited functionality compared to your main site, but still keeps you connected with your customers.
    • Partner with hosting providers, managed service providers and Infrastructure-as-a-Service providers that are customer focused and offer a high level of support and communication during an outage.
    • Disaster recovery used to be something only huge companies could afford.  Microsoft Hyper-V 2012 and VMware are both making DR accessible to even very small businesses, because they can now replicate individual virtual machines to disaster recovery sites without requiring expensive SANs or dedicated, private bandwidth connections between sites for replication.
  2. There is no such thing as ‘the cloud’.  It means everything and nothing.  I have modified the text below to replace ‘cloud’ with ‘highly available, virtualized cluster’.
  3. Organizations that are using a public highly available, virtualized cluster today should consider a private highly available, virtualized cluster or a hybrid highly available, virtualized cluster model going forward.  There is nothing wrong with the AWS model, or with the highly available, virtualized cluster hosting model in general.  But AWS is a huge public highly available, virtualized cluster, and as such, when it has issues, it has major, major issues.  Let me be precise in my definition: a private highly available, virtualized cluster is a virtualized, highly available computing environment that is exclusively dedicated to the needs and requirements of a single organization.  In such an environment, you can buy the raw capacity required for your business operations, partition those resources as you need, and re-partition them on the fly.  The downside to a private highly available, virtualized cluster is that you lose the flexibility that a huge, public highly available, virtualized cluster has to offer.  For example, if you need thousands of virtual instances for very short periods of time, a private highly available, virtualized cluster may be far more costly than using public highly available, virtualized cluster resources.  However, if you are using persistent public highly available, virtualized cluster virtual instances that stay on 24×7, moving to a private highly available, virtualized cluster can often give you increased reliability, stability and performance, along with moderate amounts of flexibility and scalability.
  4. Become informed.  If you are a small to medium-sized business whose entire business revolves around the internet, it is critical that you have informed, clueful technologists on your team who can give you competent advice, outage impact modeling and outage mitigation strategies.  Remember, outages are inevitable.
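
To make the managed DNS failover idea from point 1 a little more concrete, here is a minimal Python sketch of the decision such a service makes: probe each site in priority order and answer with the first one that responds.  This is an illustration only, not how any particular DNS provider implements it.  The names `Endpoint`, `tcp_check` and `choose_endpoint` are hypothetical, and the IP addresses in the usage note are reserved documentation addresses, not real hosts.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Endpoint:
    """A site that can serve your traffic (hypothetical type for this sketch)."""
    name: str
    ip: str


def tcp_check(endpoint: Endpoint, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within the timeout.

    Assumption: "healthy" simply means the port accepts connections. A real
    health check would likely also inspect an HTTP response.
    """
    try:
        with socket.create_connection((endpoint.ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_endpoint(
    endpoints: Sequence[Endpoint],
    is_healthy: Callable[[Endpoint], bool] = tcp_check,
) -> Endpoint:
    """Pick the first healthy endpoint, in priority order.

    If nothing is healthy, fall back to the last endpoint in the list, on the
    assumption that it is the limited-functionality site of last resort.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return endpoints[-1]
```

Called as `choose_endpoint([Endpoint("primary", "192.0.2.10"), Endpoint("secondary", "198.51.100.20")])`, this returns the primary while it accepts connections and the secondary once it stops.  Passing a different `is_healthy` function lets you swap in whatever health check fits your business requirements.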

We happen to have extensive experience helping folks set up private highly available, virtualized clusters, whether for truly private usage or to offer public highly available, virtualized cluster hosting services to their end users.  We’d love to talk to you about implementing a private highly available, virtualized cluster solution, whether it’s OnApp, VMware, Hyper-V, OpenNebula, openQRM or Proxmox.  Please live chat or call us at 303-414-6910 today.