Availability Service Level Calculation
Steven Gonsalvez
Posted on July 6, 2022
Calculating the composite availability SLA for your stack
This guide will help you calculate the composite availability of your own stack.
The guide has two parts, depending on what you are after:
- The actuarial science behind the calculation (which is just the probability of "something" being available or unavailable)
- An SLA calculation guide, down to the maximum downtime allowed.
The Actuarial science
Calculating a service level is purely an assessment of risk, i.e. the probability of failure, treated as a mathematical problem.
Suggestion: skip this part if you are only interested in the availability percentages, and go straight to "Calculating your downtime or availability percentages".
Consider the sample space for the following setup.
SLA summary for Azure services taken independently
- Azure DNS: 100% availability (so it is removed from consideration, as it does not skew the calculation)
- Azure Front Door: 99.99% availability or 0.0001 probability of going down
- Azure App Service: 99.95% availability or 0.0005 probability of going down
Note: Although App Service is published with an SLA of 99.95%, the general availability (GA) of zone redundancy should increase that to 99.99%, but this has not been documented yet. For this calculation we use the published 99.95%.
Sample spaces for the probability:
Mutually exclusive events
- App Service Region 1 (AR1) is down but Azure Front Door (FD) is up
- App Service Region 2 (AR2) is down but FD is up
Independent events
- AR1 and AR2 are both down
- Azure Front Door (FD) is down
- FD is down, or AR1 and AR2 are both down
For the mutually exclusive events, either AR1 or AR2 is down, but never both simultaneously.
Therefore the probability of both mutually exclusive events occurring together is 0: $P(AR1 \cap AR2) = 0$
For mutually exclusive events, the probability of either occurring is the sum: $P(AR1 \cup AR2) = P(AR1) + P(AR2)$
Calculating with the values:
- Probability of AR1 being down: 0.0005
- Probability of AR2 being down: 0.0005
Probability of either being down: $P(AR1 \cup AR2) = 0.0005 + 0.0005 = 0.001$
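The two ways of combining the regional events can be compared numerically. A minimal Python sketch, assuming the 0.0005 per-region probability from the App Service SLA; the function names are illustrative and not from any library.

```python
# Probability that a single App Service region is down (from the 99.95% SLA).
P_AR1_DOWN = 0.0005
P_AR2_DOWN = 0.0005


def union_mutually_exclusive(p_a: float, p_b: float) -> float:
    """P(A or B) when A and B can never happen together: P(A) + P(B)."""
    return p_a + p_b


def union_independent(p_a: float, p_b: float) -> float:
    """P(A or B) for independent events: P(A) + P(B) - P(A)P(B)."""
    return p_a + p_b - p_a * p_b


either_down = union_mutually_exclusive(P_AR1_DOWN, P_AR2_DOWN)
either_down_indep = union_independent(P_AR1_DOWN, P_AR2_DOWN)

print(f"mutually exclusive: {either_down:.8f}")        # 0.00100000
print(f"independent:        {either_down_indep:.8f}")  # 0.00099975
```

The difference, 0.00000025, is exactly the joint-failure term $P(AR1)P(AR2)$. Treating the events as mutually exclusive slightly overstates the unavailability, so it is the more conservative of the two.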
Calculating the probability of only operating on a single region
Two independent events
- Azure Front Door being available: 1 - 0.0001 = 0.9999
- Neither AR1 nor AR2 being down, i.e. not being reduced to a single region: 1 - 0.001 = 0.999
Overall probability of only being operational on a single region: $0.9999 \times 0.999 = 0.9989001$
In percentage: 99.89001%.
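Because Front Door and the App Service layer are treated as independent, this figure is just a product of two availabilities. A minimal sketch using the values above; the constant names are illustrative.

```python
P_FD_DOWN = 0.0001             # Azure Front Door, 99.99% SLA
P_EITHER_REGION_DOWN = 0.001   # AR1 or AR2 down (mutually exclusive sum)

fd_up = 1 - P_FD_DOWN
regions_up = 1 - P_EITHER_REGION_DOWN  # neither region down

# Independent events: multiply the probabilities.
combined = fd_up * regions_up
print(f"{combined * 100:.5f}%")  # 99.89001%
```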
Overall availability/unavailability
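Overall unavailability covers the scenario where FD is down, or AR1 and AR2 are both down:
- AR1 and AR2 both down, as independent events: $P(AR1 \cap AR2) = 0.0005 \times 0.0005 = 0.00000025$
- FD down at the same time as AR1 and AR2 are both down, as independent events: $P(FD \cap AR1 \cap AR2) = 0.0001 \times 0.00000025 = 0.000000000025$
- FD down, or AR1 and AR2 both down, treated as mutually exclusive (the joint term above is negligible), but either can occur: $P(FD \cup (AR1 \cap AR2)) \approx 0.0001 + 0.00000025 = 0.00010025$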
Overall probability of availability = 1 - 0.00010025 = 0.99989975
In percentage: availability = 99.989975%
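Because the sample space is tiny, the result can be cross-checked by brute force: enumerate every up/down state of FD, AR1 and AR2, weight each state by its probability (assuming the three components fail independently), and add up the states in which the stack is still serving. A minimal Python sketch; the dictionary keys and function names are illustrative.

```python
from itertools import product

# Per-component probability of being down, taken from the published SLAs.
P_DOWN = {"FD": 0.0001, "AR1": 0.0005, "AR2": 0.0005}


def is_available(state: dict[str, bool]) -> bool:
    """The stack is up if Front Door is up and at least one region is up."""
    return state["FD"] and (state["AR1"] or state["AR2"])


def availability(p_down: dict[str, float]) -> float:
    names = list(p_down)
    total = 0.0
    # Walk every up/down combination (2**3 = 8 outcomes).
    for ups in product([True, False], repeat=len(names)):
        state = dict(zip(names, ups))
        prob = 1.0
        for name, up in state.items():
            # Independence: the joint probability is the product of marginals.
            prob *= (1 - p_down[name]) if up else p_down[name]
        if is_available(state):
            total += prob
    return total


a = availability(P_DOWN)
print(f"{a * 100:.6f}%")  # 99.989975%
```

The enumeration gives 0.999899750025, which agrees with the hand calculation; the only difference is the negligible 0.000000000025 joint-failure term.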
Calculating your downtime or availability percentages
The simplified calculation below just uses the probability rules described above to calculate the compound availability of the stack.
Note: A few examples are given below to demonstrate the calculation.
Stack for a stateless web application
SLA calculation for the following setup:
SLA summary for Azure services taken independently
- Akamai: 99.999%
- Azure DNS: 100% availability (so it is removed from consideration, as it does not skew the calculation)
- Azure Front Door: 99.99% availability or 0.0001 probability of going down
- Azure App Service: 99.95% availability or 0.0005 probability of going down
Azure App Service down in both regions simultaneously, as independent events: $0.0005 \times 0.0005 = 0.00000025$
So availability: 99.999975%
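Either Akamai OR Azure Front Door OR Azure App Service across both regions being down takes the stack down, so its availability is the product of the three: $0.99999 \times 0.9999 \times 0.99999975 \approx 0.99988975$
The overall SLA of the stack is 99.9889%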
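The same rules generalise into two helpers: components in series multiply their availabilities, and redundant copies in parallel multiply their unavailabilities. A minimal Python sketch, assuming all components fail independently; `serial` and `parallel` are hypothetical helper names, not an Azure API.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """All components must be up: multiply availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one redundant copy must be up: 1 - product of unavailabilities."""
    return 1 - prod(1 - a for a in availabilities)


AKAMAI = 0.99999
FRONT_DOOR = 0.9999
APP_SERVICE = 0.9995  # per region

app_both_regions = parallel(APP_SERVICE, APP_SERVICE)
stack = serial(AKAMAI, FRONT_DOOR, app_both_regions)

print(f"App Service, two regions: {app_both_regions * 100:.6f}%")  # 99.999975%
print(f"Stack: {stack * 100:.6f}%")                                # 99.988975%
```

Rounded down to four decimal places, 99.988975% is the 99.9889% above.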
Stack for a stateless web application through a private link with regional Redis cache
SLA calculation for the following setup:
SLA summary for Azure services taken independently
- Akamai: 99.999% (this could well be 100%; something to validate contractually)
- Azure DNS: 100% availability (so it is removed from consideration, as it does not skew the calculation)
- Azure Front Door: 99.99% availability or 0.0001 probability of going down
- Azure App Service: 99.95% availability or 0.0005 probability of going down
- Azure Private Link: 99.99% availability or 0.0001 probability of going down
- Azure Redis (individual region, Standard tier): 99.9% availability or 0.001 probability of going down
Since Redis is used as a cache (read/write-through), it should not "really" affect the SLA; technically, however, we include it as part of this calculation for demonstration.
Composite availability of App Service and Redis within a region (inclusive of private link): $0.9995 \times 0.9999 \times 0.999 \approx 0.9984$ (99.84%)
Unavailability of a region: 0.16% (100 - 99.84)
Unavailability of two regions of App Service, private link and Redis: $0.0016 \times 0.0016 = 0.00000256$
Compound availability of App Service and Redis over two regions: $1 - 0.00000256 = 0.99999744$, i.e. 99.999744%
Compound availability of the stack: Akamai × Front Door × (App Service + private link + Redis, across both regions)
$0.99999 \times 0.9999 \times 0.99999744 \approx 0.99988744$
The overall SLA of the stack is 99.9887%
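This stack nests the series and parallel rules: each region is a series chain of App Service, private link and Redis; the two regions are in parallel; and the result is in series with Akamai and Front Door. A minimal sketch under the same independence assumption; the helper and constant names are hypothetical.

```python
from math import prod


def serial(*availabilities: float) -> float:
    """Every component in the chain must be up."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one of the redundant paths must be up."""
    return 1 - prod(1 - a for a in availabilities)


AKAMAI = 0.99999
FRONT_DOOR = 0.9999
APP_SERVICE = 0.9995
PRIVATE_LINK = 0.9999
REDIS_STANDARD = 0.999

region = serial(APP_SERVICE, PRIVATE_LINK, REDIS_STANDARD)  # one region
both_regions = parallel(region, region)                     # two regions
stack = serial(AKAMAI, FRONT_DOOR, both_regions)            # whole stack

print(f"Single region: {region * 100:.4f}%")
print(f"Two regions:   {both_regions * 100:.6f}%")
print(f"Stack:         {stack * 100:.4f}%")
```

Unlike the hand calculation, the sketch does not round the regional figure to 99.84% before combining the two regions, yet the stack still comes out at 99.9887% to four decimal places.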
Follow the approach in the examples above to calculate the composite availability of the stack you deploy, using SLAs appropriate to its configuration (e.g. Premium and Standard instance types have different SLAs).
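Downtime calculation
- For a 24-hour period, the maximum allowed downtime (error budget) for an availability of 99.9887% is about 9.76 seconds: $\frac{100 - 99.9887}{100} \times 24 \times 3600 \approx 9.76$ s
- For a month (30 days), the maximum allowed downtime is about 293 seconds, i.e. ~5 minutes: $\frac{100 - 99.9887}{100} \times 30 \times 24 \times 3600 \approx 293$ s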
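A small helper makes it easy to try other availability targets or periods. A minimal sketch, assuming a 30-day month (calendar months range from 28 to 31 days); the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 3600


def error_budget_seconds(availability_pct: float, days: float) -> float:
    """Maximum downtime allowed over `days` days for a given availability %."""
    return (100 - availability_pct) / 100 * days * SECONDS_PER_DAY


daily = error_budget_seconds(99.9887, days=1)
monthly = error_budget_seconds(99.9887, days=30)  # assumes a 30-day month

print(f"Per day:   {daily:.2f} s")           # 9.76 s
print(f"Per month: {monthly / 60:.2f} min")  # 4.88 min
```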