Demystifying Load Balancers

modernsystemdesign

Nrapesh Khamesra

Posted on May 18, 2023


This is the third article in the Demystifying System Design interview series, where I'll go into detail about what a load balancer is and how it works.

Millions of requests can arrive per second in a typical data center. To serve them, thousands (or even hundreds of thousands) of servers work together to share the load of incoming requests.

The load balancer (LB) plays a crucial role in distributing clients' requests among the available pool of servers to avoid overloading or crashing any single server. Sitting as the first point of contact within a data center after the firewall, a load balancer isn't strictly necessary for a service receiving only a few hundred or a few thousand requests per second. However, load balancers become essential for managing growing client traffic by providing the following capabilities:

  • Scalability: Load balancers facilitate seamless upscaling or downscaling of the application/service capacity by adding or removing servers, which remains transparent to end-users.
  • Availability: Even if some servers go down or encounter faults, load balancers can hide these failures, ensuring the system remains available.
  • Performance: Load balancers can direct requests to servers with lower loads, improving response times and optimizing resource utilization, thus enhancing performance.

Where to place load balancers?

Load balancers (LBs) typically operate between clients and servers, with requests passing through the load-balancing layer to the servers and responses flowing back to the clients. However, load balancers are also useful in other scenarios, such as between the three main groups of servers: web, application, and database servers. In these cases, LBs can be strategically placed between the server instances of these three tiers to distribute the traffic load more efficiently. For example:

  • LBs can be placed between end-users of the application and web servers/application gateways.

  • LBs can be placed between web servers and application servers responsible for running the business/application logic.

  • LBs can be placed between application servers and database servers.

What do load balancers do?

Load balancers (LBs) not only facilitate scalability, availability, and high performance, but they also offer a range of additional services, including:

  • Health checking: LBs use a heartbeat protocol to periodically probe end-servers and monitor their health and reliability, which improves the user experience (a minimal sketch follows this section).
  • TLS termination: LBs reduce the workload on end-servers by handling TLS termination with the client.
  • Predictive analytics: LBs can predict traffic patterns through analytics performed on the traffic passing through them, or by using traffic statistics collected over time.
  • Reduced human intervention: LB automation minimizes the need for system administration efforts in handling failures.
  • Service discovery: LBs forward clients' requests to appropriate hosting servers by querying the service registry, improving efficiency.
  • Security: LBs can enhance security by mitigating attacks like denial-of-service (DoS) at different layers of the OSI model (layers 3, 4, and 7).

Overall, load balancers provide flexibility, reliability, redundancy, and efficiency to the system design.
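To make the health-checking idea concrete, here is a minimal sketch of a heartbeat-style checker, assuming a hypothetical server pool and a `/health` HTTP endpoint on each back-end (both are illustrative, not tied to any particular LB product):

```python
import time
import urllib.request

# Hypothetical pool of back-end servers, each assumed to expose /health.
servers = ["http://10.0.0.1:8080", "http://10.0.0.2:8080", "http://10.0.0.3:8080"]
healthy = set()

def heartbeat_pass(timeout=1.0):
    """Probe every server once and update the healthy set."""
    for server in servers:
        try:
            with urllib.request.urlopen(server + "/health", timeout=timeout) as resp:
                if resp.status == 200:
                    healthy.add(server)
                else:
                    healthy.discard(server)
        except OSError:
            healthy.discard(server)  # refused, timed out, or HTTP error

if __name__ == "__main__":
    # A real LB would run this in the background on a fixed interval.
    while True:
        heartbeat_pass()
        print("healthy:", sorted(healthy))
        time.sleep(5)
```

Only servers in the healthy set would receive forwarded requests; the others are retried on later passes and re-admitted once they respond again.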

Global server load balancing

Global server load balancing (GSLB) intelligently distributes globally arriving traffic load to different data centers based on factors such as users' geographic locations, the number of hosting servers in different locations, and the health of data centers. For example, in the event of a power or network failure in a data center, GSLB can reroute all traffic to another data center. A GSLB service can be installed on-premises or obtained through Load Balancing as a Service (LBaaS).
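As a rough illustration of a GSLB decision, the sketch below prefers the data center in the user's region and fails over to any healthy one; the regions, endpoints, and health flags are all hypothetical:

```python
# Hypothetical data centers, keyed by region, with health flags a GSLB
# would normally derive from its monitoring.
data_centers = {
    "us-east":  {"endpoint": "dc1.example.com", "healthy": True},
    "eu-west":  {"endpoint": "dc2.example.com", "healthy": True},
    "ap-south": {"endpoint": "dc3.example.com", "healthy": False},
}

def route(user_region):
    """Prefer the user's home data center; fail over to any healthy one."""
    home = data_centers.get(user_region)
    if home and home["healthy"]:
        return home["endpoint"]
    for dc in data_centers.values():
        if dc["healthy"]:
            return dc["endpoint"]
    raise RuntimeError("no healthy data center available")

print(route("ap-south"))  # ap-south is down, so traffic is rerouted elsewhere
```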

What is local load balancing?

Local load balancers sit inside a data center and act as a reverse proxy, distributing incoming requests among the available servers. Clients connect to a single virtual IP address (VIP), which the load balancer answers on behalf of the pool of servers behind it.

Load balancing algorithms

Load balancers use various algorithms to distribute client requests among the available servers. Below are some well-known algorithms; a short sketch of a few of them follows the list:

  • Round-robin scheduling: Requests are forwarded to the servers in a pool in a repeating sequential manner.

  • Weighted round-robin: Servers with higher capacity are assigned higher weights, and LBs forward client requests based on the weight of the server.

  • Least connections: Newly arriving requests are assigned to the server with the fewest existing connections. LBs maintain the state of the number and mapping of existing connections.

  • Least response time: The server with the least response time is assigned to serve the clients.

  • IP hash: Clients' requests are assigned to servers based on hashing their IP addresses.

  • URL hash: Clients' requests for specific services are assigned to a certain cluster or set of servers based on hashing the URL.

There are also other algorithms like randomized or weighted least connections algorithms.
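The static algorithms above are simple enough to sketch in a few lines; the server addresses and weights below are made up purely for illustration:

```python
import hashlib
import itertools

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Round-robin: cycle through the pool in a repeating sequential manner.
_rr = itertools.cycle(servers)
def round_robin():
    return next(_rr)

# Weighted round-robin: a server with weight 3 appears 3 times in the cycle.
weights = {"10.0.0.1": 3, "10.0.0.2": 1, "10.0.0.3": 1}
_wrr = itertools.cycle([s for s, w in weights.items() for _ in range(w)])
def weighted_round_robin():
    return next(_wrr)

# IP hash: the same client IP always lands on the same server.
def ip_hash(client_ip):
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

print([round_robin() for _ in range(4)])  # ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
print(ip_hash("203.0.113.7"))             # stable for this client across calls
```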

Load balancing algorithms can be divided into two categories: static and dynamic.
Static algorithms are designed based on existing knowledge about the servers' configuration and do not consider the changing state of the servers. They are relatively simple and can be implemented in a single router or commodity machine where all requests arrive.
Dynamic algorithms, on the other hand, consider the current or recent state of the servers. They require state maintenance, which involves communicating with the servers and exchanging information among different load balancing servers. This adds complexity to the algorithm but leads to improved forwarding decisions. Dynamic algorithms can be modular, as no single entity makes the decision. They also monitor the health of the servers and forward requests only to active servers.
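A dynamic algorithm, by contrast, consults live state. The least-connections sketch below keeps a per-server count of in-flight requests that the LB updates as requests start and finish (all names are illustrative):

```python
# Live, per-server counts of in-flight requests (the dynamic state).
connections = {"10.0.0.1": 0, "10.0.0.2": 0, "10.0.0.3": 0}

def pick_server():
    """Least connections: choose the server with the fewest in-flight requests."""
    return min(connections, key=connections.get)

def handle_request(forward):
    server = pick_server()
    connections[server] += 1        # request assigned to this server
    try:
        return forward(server)      # proxy the request to the chosen back-end
    finally:
        connections[server] -= 1    # request finished; update the live state

# Toy usage: the lambda stands in for real request forwarding.
print(handle_request(lambda s: f"served by {s}"))
```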

Stateful versus stateless load balancers

Load balancers can maintain session information between clients and hosting servers through two methods: stateful and stateless.
Stateful load balancing involves the LB maintaining a state of the sessions established between clients and hosting servers. This state information is incorporated into the LB's algorithm to perform load balancing. Stateful LBs keep a data structure that maps incoming clients to hosting servers, which increases complexity and limits scalability. All the load balancers share their state information with each other to make forwarding decisions.
On the other hand, stateless load balancing maintains no session state and is faster and lightweight compared to stateful load balancing. Stateless LBs use consistent hashing to make forwarding decisions, but they may not be as resilient as stateful LBs if infrastructure changes, such as the addition of a new application server. In such cases, a local state may still be required along with consistent hashing to route requests to the correct application server.
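Consistent hashing can be pictured as a ring: each server is hashed to many points on the ring, and a request walks clockwise from its own hash to the next server point, so adding or removing a server only remaps the keys between it and its neighbor. A minimal sketch, with illustrative server names and virtual-node count:

```python
import bisect
import hashlib

def _hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, servers, vnodes=100):
        # Several virtual nodes per server even out the load on the ring.
        self.ring = sorted((_hash(f"{s}#{i}"), s)
                           for s in servers for i in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def lookup(self, client_id):
        """Walk clockwise from the client's hash to the next server point."""
        idx = bisect.bisect(self.points, _hash(client_id)) % len(self.points)
        return self.ring[idx][1]

ring = HashRing(["app1", "app2", "app3"])
print(ring.lookup("client-42"))  # the same server on every call for this client
```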

Types of load balancers

Load balancing can be performed at different layers of the open systems interconnection (OSI) model, namely the network/transport layer and the application layer, depending on the requirements.
Layer 4 load balancers operate at the transport layer, balancing protocols such as TCP and UDP. They establish and maintain connections/sessions with clients and ensure that packets from the same connection are forwarded to the same back-end server. Some layer 4 LBs also support TLS termination, although it is typically performed at layer 7.
Layer 7 load balancers operate at the application layer and make forwarding decisions based on application-specific data, such as HTTP headers, URLs, cookies, and user IDs. These LBs perform TLS termination and can handle additional responsibilities like rate limiting, HTTP routing, and header rewriting.
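To make the layer-7 distinction concrete, the sketch below routes on the URL path and a session cookie, application-layer fields that a layer-4 LB never inspects (the pools and routing rules are hypothetical):

```python
import hashlib

# Hypothetical back-end pools behind a layer-7 routing decision.
pools = {
    "static":  ["cdn1", "cdn2"],
    "api":     ["api1", "api2"],
    "default": ["web1", "web2"],
}

def l7_route(path, cookies):
    """Route on application-layer data: URL path plus a session cookie."""
    if path.startswith("/static/"):
        pool = pools["static"]
    elif path.startswith("/api/"):
        pool = pools["api"]
    else:
        pool = pools["default"]
    # Sticky sessions: hash the cookie so one user stays on one server.
    session = cookies.get("session_id", "")
    digest = int(hashlib.md5(session.encode()).hexdigest(), 16)
    return pool[digest % len(pool)]

print(l7_route("/api/v1/users", {"session_id": "abc123"}))
```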

Load balancer deployment

Load balancing is typically performed at multiple layers of the OSI model to balance the load across different paths and enable horizontal scalability. In a typical data center, load balancing is performed using a three-tier approach:

Tier-0 and Tier-1 LBs: The first tier of load balancing is handled by DNS, which is considered the Tier-0 load balancer. The Tier-1 load balancers use equal-cost multipath (ECMP) routers to divide incoming traffic based on IP or other algorithms like round-robin or weighted round-robin. The Tier-1 LBs then balance the load across different paths to higher tiers of load balancers.
ECMP routers are crucial for the horizontal scalability of higher-tier LBs.

Tier-2 LBs: The second tier of LBs includes Layer 4 load balancers. These LBs ensure that all incoming packets for any connection are forwarded to the same Tier-3 LBs. To achieve this goal, techniques like consistent hashing can be utilized. However, in case of any changes to the infrastructure, consistent hashing may not be enough. Therefore, local or global state must be maintained.
Tier-2 load balancers act as the glue between Tier-1 and Tier-3 LBs. Neglecting Tier-2 LBs can lead to erroneous forwarding decisions in case of failures or dynamic scaling of LBs.

Tier-3 LBs: The third tier of LBs provides services at layer 7. These LBs are in direct contact with the back-end servers and perform health monitoring of servers at the HTTP level. This tier enables scalability by evenly distributing requests among the set of healthy back-end servers and provides high availability by monitoring the health of servers directly. This tier also reduces the burden on end-servers by handling low-level details like TCP congestion control, path MTU (maximum transmission unit) discovery, differences in application protocols between clients and back-end servers, and so on.
The idea behind the third tier is to leave the computation and data serving to the application servers and effectively utilize load balancing commodity machines for trivial tasks. In some cases, Layer 7 LBs are at the same level as the service hosts.

Implementation of load balancers

Load balancers can be implemented in various ways, depending on the application requirements, organization, and the number of incoming requests. The most common implementation types are:

Hardware load balancers: These were first introduced in the 1990s as standalone devices. Although they can handle a large number of concurrent users and perform well, they are quite expensive and require additional human resources for configuration. Moreover, availability can be an issue because they need additional hardware for failover in case of failures. They also come with high maintenance/operational costs, compatibility issues, and vendor lock-in.

Software load balancers: These are becoming increasingly popular because of their flexibility, cost-effectiveness, and programmability. They are implemented on commodity hardware and can scale well as requirements grow. Availability is not an issue because shadow load balancers can be implemented on commodity hardware at a small additional cost. They can also provide predictive analysis to help prepare for future traffic patterns.

Cloud load balancers: With the rise of cloud computing, Load Balancers as a Service (LBaaS) has been introduced, where cloud owners provide load balancing services. Users pay according to their usage or service-level agreement (SLA) with the cloud provider. Cloud-based LBs can perform global traffic management between different zones, although they may not replace a local on-premise load balancing facility. The primary benefits of cloud load balancers include ease of use, management, metered cost, flexibility, and auditing and monitoring services to improve business decisions.
