Availability
sonu sharma
Posted on August 28, 2024
Availability is the proportion of time a system is up and serving traffic, usually expressed as a percentage. It is commonly described in tiers of "nines": 2 nines (99%), 3 nines (99.9%), 4 nines (99.99%), 5 nines (99.999%), and 6 nines (99.9999%).
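To make the tiers concrete, here is a small sketch that converts an availability percentage into the maximum downtime it allows per year. It assumes a 365-day year (leap years ignored); under that assumption three nines permits about 525.6 minutes, or roughly 8.76 hours, of downtime annually. The function name `allowed_downtime_minutes` is just an illustrative helper.

```python
# Convert an availability percentage into the downtime it allows per year.
# Assumption: a 365-day year (leap years ignored) for illustration.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes per year a system may be down and still meet the target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    tiers = [(2, 99.0), (3, 99.9), (4, 99.99), (5, 99.999), (6, 99.9999)]
    for nines, pct in tiers:
        minutes = allowed_downtime_minutes(pct)
        print(f"{nines} nines ({pct}%): {minutes:.2f} min/year")
```

Each extra nine cuts the allowed downtime by a factor of ten, which is why the higher tiers get expensive quickly.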
Ways to improve Availability
Redundancy: Having backup components that can take over when primary components fail.
Techniques to Add Redundancy
- Server Redundancy: Running multiple instances of the same server distributes traffic across them and ensures that if one fails, the others can keep serving requests.
- Database Redundancy: Maintaining a replica database that can take over when the primary database fails.
- Geographic Redundancy: Distributing resources across multiple geographic locations to mitigate regional failures.
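A rough way to see why redundancy helps: if each of n replicas is up a fraction a of the time and any single replica can serve traffic, the system is down only when all replicas are down at once, so its availability is 1 - (1 - a)^n. The sketch below assumes failures are independent and failover is instantaneous, which real deployments only approximate (replicas sharing a rack or region often fail together). The helper name `parallel_availability` is hypothetical.

```python
def parallel_availability(component_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant copies where any one can serve.

    Assumes independent failures and instant failover (an idealization).
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The system is down only when every replica is down at the same time.
    all_down = (1 - component_availability) ** replicas
    return 1 - all_down


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "replica(s):", parallel_availability(0.99, n))
```

Under these assumptions, two servers at 99% each reach about 99.99% together, moving the system from two nines to four.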
Load Balancing: Distributing incoming traffic across multiple servers so that no single server becomes a bottleneck, thus improving performance and availability.
Techniques for Load Balancing
- Hardware Load Balancing: Physical devices that distribute traffic based on preconfigured rules.
- Software Load Balancing: Software that manages traffic distribution, such as HAProxy or Nginx, or cloud-based solutions such as AWS Elastic Load Balancer.
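The core idea of a software load balancer can be sketched in a few lines. This is a minimal, in-process illustration of round-robin selection that skips servers marked unhealthy; it is not how HAProxy, Nginx, or AWS configure balancing, and the backend names (`app-1`, etc.) and the `RoundRobinBalancer` class are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Round-robin over backends, skipping any currently marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)  # endless rotation over all backends

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def next_backend(self):
        # Inspect each backend at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


if __name__ == "__main__":
    lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
    lb.mark_down("app-2")
    print([lb.next_backend() for _ in range(4)])
```

With `app-2` marked down, requests alternate between `app-1` and `app-3`; in practice the up/down state would come from health checks rather than manual calls.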
Data Replication: Copying data to multiple locations, either synchronously (in real time) or asynchronously, so that data remains available even if one location fails.
Techniques of Data Replication
- Synchronous Replication: Data is replicated in real time to ensure consistency across locations.
- Asynchronous Replication: Data is replicated with a delay, which can be more efficient (writes do not wait for replicas) but may result in slight data inconsistencies.
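The difference between the two modes comes down to when a write is acknowledged. The toy sketch below keeps everything in one process (real systems ship changes over the network) and uses hypothetical `Primary` and `Replica` classes: in synchronous mode a write reaches every replica before `write` returns, while in asynchronous mode it is queued and replicas lag until `flush` runs.

```python
from collections import deque


class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Primary:
    """Toy primary that replicates writes synchronously or asynchronously."""

    def __init__(self, replicas, synchronous=True):
        self.data = {}
        self.replicas = replicas
        self.synchronous = synchronous
        self._pending = deque()  # writes not yet shipped (async mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.synchronous:
            # Synchronous: the write completes only after every replica has it.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: acknowledge immediately and ship the change later.
            self._pending.append((key, value))

    def flush(self):
        """Ship queued writes to replicas (a background job in a real system)."""
        while self._pending:
            key, value = self._pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value
```

Between an asynchronous `write` and the next `flush`, a read from a replica returns stale data; that window is the "slight data inconsistency" described above, and if the primary fails inside it, the queued writes can be lost.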
Failover Mechanism: A failover mechanism automatically switches to a redundant system when a failure is detected.
Failover Techniques
- Active-Passive failover mechanism: A primary active component is backed by a passive standby component that takes over upon failure.
- Active-Active failover mechanism: All components are active and share the load. If one fails, remaining components continue to handle the load seamlessly.
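Active-passive failover can be sketched as a wrapper that sends calls to the active node and promotes the standby when the active node fails. In this minimal sketch, `primary` and `standby` are hypothetical callables standing in for real services, and any exception from the active node counts as a failure; production failover usually relies on health checks and leader election rather than a single failed call.

```python
class ActivePassivePair:
    """Route calls to the active node; promote the standby when it fails."""

    def __init__(self, primary, standby):
        self.active = primary
        self.passive = standby

    def call(self, request):
        try:
            return self.active(request)
        except Exception:
            # Failure detected: swap roles so the standby becomes active,
            # then retry this request once on the newly active node.
            self.active, self.passive = self.passive, self.active
            return self.active(request)


if __name__ == "__main__":
    def primary(req):
        raise ConnectionError("primary down")

    def standby(req):
        return f"standby handled {req}"

    pair = ActivePassivePair(primary, standby)
    print(pair.call("req-1"))
```

After a failover the failed node sits in the passive slot; the sketch does not fail back automatically, which avoids flapping between a flaky primary and the standby.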
Monitoring & Alerts: Continuous health monitoring involves checking the status of system components to detect failures early and trigger alerts for immediate action.
Techniques for Monitoring & Alerts
- Heartbeat Signals: Regular signals sent between components to check their status.
- Health Checks: Automated scripts or tools that perform regular checks on components.
- Alerting systems: Tools like PagerDuty or OpsGenie that notify administrators of any issues.
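Heartbeats and alerting fit together naturally: each component reports in periodically, and a monitor flags any component that has been silent longer than a timeout. This sketch assumes a single monitor process; the `HeartbeatMonitor` class and the `alert` callback are hypothetical, and in practice the callback might forward to a tool such as PagerDuty or OpsGenie. The clock is injectable so the logic can be checked without sleeping.

```python
import time


class HeartbeatMonitor:
    """Flag components whose last heartbeat is older than `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock  # injectable for testing
        self.last_seen = {}

    def heartbeat(self, component):
        """Record that `component` is alive right now."""
        self.last_seen[component] = self.clock()

    def failed_components(self):
        now = self.clock()
        return sorted(
            name
            for name, seen in self.last_seen.items()
            if now - seen > self.timeout
        )

    def check(self, alert):
        """Call the (hypothetical) `alert` hook once per failed component."""
        failed = self.failed_components()
        for name in failed:
            alert(name)
        return failed
```

Choosing the timeout is a trade-off: too short and brief network hiccups cause false alarms, too long and real failures go unnoticed.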
Best practices for Availability
- Build for failure: Assume that components can go down at any moment and build the required fallback mechanisms.
- Implement health checks: Regularly verify that each component is alive and able to serve requests.
- Use multiple availability zones: Distribute the system across multiple data centers to prevent localized failures.
- Practice chaos engineering: Test reliability by intentionally introducing failures.
- Implement circuit breakers: Prevent cascading failures by quickly cutting off calls to problematic services.
- Use caching wisely: Caching can reduce load on databases.
- Plan for capacity: Ensure your system can handle both expected and unexpected loads.
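Of the practices above, circuit breakers benefit most from a concrete sketch. The version below is a minimal, single-threaded illustration of the common closed / open / half-open pattern, not any particular library's API: after `failure_threshold` consecutive failures the breaker "opens" and fails fast, and once `reset_timeout` seconds pass it lets one trial call through. The class name, parameters, and defaults are all assumptions chosen for illustration.

```python
import time


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a cooldown.

    States: "closed" (calls pass through), "open" (calls fail fast),
    "half-open" (one trial call allowed after the cooldown).
    """

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable for testing
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.state = "half-open"  # cooldown elapsed: allow one trial call
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.state = "closed"
        return result

    def _record_failure(self):
        self.failures += 1
        # A failed trial call, or too many failures, (re)opens the circuit.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
```

Failing fast while the circuit is open keeps callers from piling up requests (and threads, connections, retries) on a service that is already struggling, which is what stops one failure from cascading.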