Absolute SLA of Cloud Application – Exclusive Effect

May 17, 2013

High availability is one of the most important non-functional requirements (NFRs) in application deployment. We are used to measuring the availability of services in nines (three nines, five nines). With cloud services, it is imperative to be aware of the SLA of every service the application depends on, and to publish the SLA of the deployed application accordingly.

Before I make the point, here is a quick recap of what an availability SLA means in terms of downtime, and of the Azure SLAs. A Service Level Agreement (SLA) is expressed as the percentage of time the system is committed to be available.

The following table shows each availability percentage and the corresponding maximum downtime permitted within the SLA.

Availability %                Downtime per year    Downtime per month    Downtime per week
90% (“one nine”)              36.5 days            72 hours              16.8 hours
95%                           18.25 days           36 hours              8.4 hours
97%                           10.96 days           21.6 hours            5.04 hours
98%                           7.30 days            14.4 hours            3.36 hours
99% (“two nines”)             3.65 days            7.20 hours            1.68 hours
99.5%                         1.83 days            3.60 hours            50.4 minutes
99.8%                         17.52 hours          86.23 minutes         20.16 minutes
99.9% (“three nines”)         8.76 hours           43.2 minutes          10.1 minutes
99.95%                        4.38 hours           21.56 minutes         5.04 minutes
99.99% (“four nines”)         52.56 minutes        4.32 minutes          1.01 minutes
99.999% (“five nines”)        5.26 minutes         25.9 seconds          6.05 seconds
99.9999% (“six nines”)        31.5 seconds         2.59 seconds          0.605 seconds
99.99999% (“seven nines”)     3.15 seconds         0.259 seconds         0.0605 seconds
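The figures in the table follow from a single multiplication: the length of the period times the fraction of time the service may be unavailable. Here is a minimal Python sketch of that arithmetic. The function name and the period lengths (365-day year, 30-day month, 7-day week) are my assumptions for illustration, so a few results may differ slightly from the rounded values in the table.

```python
# Assumed period lengths in hours: 365-day year, 30-day month, 7-day week.
PERIOD_HOURS = {"year": 365 * 24, "month": 30 * 24, "week": 7 * 24}


def max_downtime_hours(availability_percent, period):
    """Return the maximum downtime (in hours) an SLA allows in a period."""
    unavailable_fraction = 1 - availability_percent / 100
    return PERIOD_HOURS[period] * unavailable_fraction


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99):
        minutes = max_downtime_hours(pct, "month") * 60
        print(f"{pct}% allows {minutes:.2f} minutes of downtime per month")
```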

Among the Azure services, the SLA varies by component. The following table shows the SLA that Microsoft offers for each component.

Azure Service       SLA
Cloud Services      99.95%
Storage             99.9%
SQL Database        99.9%
SQL Reporting       99.9%
Service Bus         99.9%
Access Control      99.9%
Caching             99.9%
CDN                 99.9%

Now, let us consider a simple application built on a few cloud services. I use Windows Azure here only because I am familiar with the platform; the principle applies to almost any hosting environment.

Web Role – 2 instances – Availability 99.95% – Maximum downtime per month 21.56 minutes
SQL Database – Availability 99.9% – Maximum downtime per month 43.2 minutes
Storage – Availability 99.9% – Maximum downtime per month 43.2 minutes

All of the above components are critical: each one must be available for the application to be usable.

Now, what is the SLA of this application?

We might be under the impression that the application’s SLA is 99.9%, since that is the lowest SLA among the critical components, meaning the application has a maximum downtime of 43.2 minutes in a month. Think again!! Each SLA covers its own component independently, so the services can go down at different times. Let us take the worst-case scenario, in the spirit of the design-for-failure principle :-) In that scenario the three outages do not overlap at all.

This possibility is within every service provider’s SLA, but for the application the possible downtime in a month is 21.56 + 43.2 + 43.2 ≈ 108 minutes = 1.8 hours, which is an availability of approximately 99.75%. The application comes down from the perceived three nines (99.9%) to two nines (believe it!!)
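The worst-case arithmetic can be checked directly from the SLA percentages. The sketch below is only an illustration: the function names and the 30-day month are my assumptions. With a 30-day month, 99.9% allows 43.2 minutes of downtime and 99.95% allows about 21.6 minutes, in line with the downtime table. The sketch also shows the commonly used alternative that treats failures as statistically independent and multiplies the availabilities.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: 30-day month

# Component SLAs from the example application (percent availability).
COMPONENT_SLAS = {"Web Role": 99.95, "SQL Database": 99.9, "Storage": 99.9}


def worst_case_availability(slas):
    """Outages never overlap, so the unavailable fractions simply add up."""
    total_unavailable = sum(1 - pct / 100 for pct in slas)
    return max(0.0, 1 - total_unavailable) * 100


def independent_availability(slas):
    """Failures are assumed independent, so the availabilities multiply."""
    result = 1.0
    for pct in slas:
        result *= pct / 100
    return result * 100


worst = worst_case_availability(COMPONENT_SLAS.values())
indep = independent_availability(COMPONENT_SLAS.values())
downtime = (1 - worst / 100) * MINUTES_PER_MONTH

if __name__ == "__main__":
    print(f"Worst case:  {worst:.2f}% ({downtime:.0f} minutes down per month)")
    print(f"Independent: {indep:.4f}%")
```

Both calculations land well below the 99.9% of the weakest single component, which is the point of this post.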

Please be aware of the absolute availability of the dependent services and of the worst-case failure scenarios before committing to an SLA for your application. Note that this applies to every deployment, whether it is hosted on premises, on Amazon, on Azure, or in any other data center.

Please remember that SLAs are not a promise; they are just a goal. There may be penalties, such as refunds, if your service provider fails to meet its SLA, and ideally you should pass similar benefits on to your application’s users.

It is important to think about the absolute SLA of your application’s availability, and also to:

1.      Monitor the application for the delivered SLA
2.      Provide the SLA breach benefits to the customer
3.      Handle failures gracefully (maybe in another blog post)
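For the monitoring point, one possible way to compute the delivered availability is from a log of outage intervals. The sketch below is hypothetical: the function name, the sample outage times, and the 99.9% target are all made up for illustration, and a real monitoring setup would feed in recorded incidents instead. Overlapping outages are counted only once, because the application is simply down during that time.

```python
from datetime import datetime


def delivered_availability(outages, period_start, period_end):
    """Percent of the period the application was up, given (start, end) outages."""
    total_seconds = (period_end - period_start).total_seconds()
    down_seconds = 0.0
    counted_until = period_start
    for start, end in sorted(outages):
        start = max(start, counted_until)  # skip time already counted
        end = min(end, period_end)         # ignore time past the period
        if end > start:
            down_seconds += (end - start).total_seconds()
            counted_until = end
    return (1 - down_seconds / total_seconds) * 100


# Hypothetical outage log for May 2013; the first two entries overlap.
sample_outages = [
    (datetime(2013, 5, 3, 10, 0), datetime(2013, 5, 3, 10, 20)),
    (datetime(2013, 5, 3, 10, 10), datetime(2013, 5, 3, 10, 50)),
    (datetime(2013, 5, 20, 2, 0), datetime(2013, 5, 20, 2, 43)),
]
delivered = delivered_availability(
    sample_outages, datetime(2013, 5, 1), datetime(2013, 6, 1)
)
TARGET_SLA = 99.9  # assumed published SLA for the application

if __name__ == "__main__":
    print(f"Delivered availability: {delivered:.3f}%")
    print("SLA met" if delivered >= TARGET_SLA else "SLA breached")
```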

Tags – Absolute High Availability, How to calculate high availability, Cloud Service SLA, Application SLA, Calculate Application SLA