Load balancers are infrastructure components that distribute incoming network traffic across multiple backend servers. They increase capacity and add redundancy, keeping your service available when one of its servers fails.
Load balancers act as the public gateway to your application. Because they’re specialized for this single role, they can be highly optimized for traffic throughput. Load balancers can typically be configured with several types of routing algorithm to suit your application’s requirements.
In this article, we’ll explore what load balancers are, how they work, and some of the complications they can cause. We will also explain the differences between the most common load balancing algorithms.
What load balancers do
Load balancers provide a reverse proxy in front of your application’s servers. All clients connect to this single proxy instead of to individual backend instances. The load balancer is responsible for selecting a server to handle each request; this happens invisibly to the external client.
Both hardware- and software-based load balancer implementations are available. On the software side, most web servers, such as Apache and NGINX, are capable of fulfilling this role. Hardware load balancers are deployed as standalone infrastructure components, typically provisioned through your hosting provider.
Load balancers typically monitor the health of the instances in their server pool. Backends that become unhealthy stop receiving new traffic, reducing service degradation and downtime. Similarly, load balancers generally let you add new backend instances at any time, so you can expand your service with extra capacity during busy periods.
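As an illustration, here’s a minimal sketch of the kind of health-checking loop a load balancer runs internally. The backend addresses, the /health path, and the timings are assumptions made for this example, not any particular product’s behavior:

```python
import time
import urllib.request

# Hypothetical backend pool; unhealthy servers are removed from rotation.
BACKENDS = ["http://10.0.0.1:8080", "http://10.0.0.2:8080"]
healthy = set(BACKENDS)

def health_check_loop(interval=10):
    """Probe each backend periodically and track which ones respond."""
    while True:
        for server in BACKENDS:
            try:
                # Any successful response within 2 seconds counts as healthy.
                with urllib.request.urlopen(server + "/health", timeout=2):
                    healthy.add(server)
            except OSError:
                # HTTP errors, refused connections, and timeouts all derive
                # from OSError; stop routing new traffic to this backend.
                healthy.discard(server)
        time.sleep(interval)
```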
The overall purpose of a load balancer is to maximize throughput and make the most efficient use of the available resources. Scaling horizontally across physical servers is usually more effective than vertically scaling a single node with extra CPU or memory: horizontal scaling gives you more redundancy as well as more capacity, while the overhead incurred by the load balancing layer is generally nominal.
Load balancing algorithms
While the goal of load balancing is always to distribute traffic across multiple servers, there are several ways to achieve this. Before looking at specific strategies, it’s important to identify the two main types of algorithms you can choose from:
- Static balancing – These methods work from hard-coded configuration values, making them completely predictable in their operation. A static algorithm doesn’t take the state of the backend servers into account, so it can keep sending new requests to an already overwhelmed instance.
- Dynamic balancing – Dynamic algorithms adjust themselves in real time based on traffic flow and the availability of the servers in your pool. These strategies can automatically avoid instances that are already handling too many requests. Because the load balancer must track the completion status of each request, dynamic balancing can add a small amount of overhead.
Static balancing systems are usually easier to configure, test, and verify. Dynamic balancing is more powerful, and it’s usually the preferred option for production applications. Within each of these classes, there are several specific routing strategies to choose from:
- Round robin – Round robin is a static method that routes requests to each server in turn. If you have three servers A, B, and C, the first incoming request goes to A, the second to B, and the third to C. The load balancer wraps back around to A for the fourth request. (Both round robin variants are sketched in code after this list.)
- Weighted round robin – A variation of round robin where admins assign each server in the pool a relative priority. A server with a higher weight is used more often, receiving a proportionally larger share of the traffic. This lets you use a round robin strategy with a pool of servers that have unequal specifications.
- Random – Many load balancers include a truly random choice as an alternative static strategy.
- Hashed – This static strategy hashes the client’s IP address to determine which server handles the request. It ensures every connection from a given client is served by the same instance.
- Least connections – A popular dynamic algorithm that forwards each incoming request to the server with the fewest open connections. In many applications, this is the most effective way to raise overall performance; it’s sketched after the static examples below.
- Least bandwidth – This dynamic method sends new traffic to the server with the most available bandwidth. It’s ideal when individual requests can consume large amounts of bandwidth even while the total request count stays low.
- Custom load metrics – Many load balancers can make traffic distribution decisions based on specific metrics exposed by your backend servers. Using a mechanism such as SNMP, the load balancer can query CPU usage, memory consumption, and other critical indicators.
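To make the static strategies concrete, here’s a minimal sketch of round robin, weighted round robin, and IP hashing in Python. The server names and weights are invented for illustration:

```python
import hashlib
import itertools

SERVERS = ["A", "B", "C"]  # hypothetical pool

# Round robin: hand out servers in a fixed, repeating order.
_rotation = itertools.cycle(SERVERS)

def round_robin():
    return next(_rotation)

# Weighted round robin: repeat each server according to its weight, so
# "A" (weight 3) receives three times as much traffic as "C" (weight 1).
WEIGHTS = {"A": 3, "B": 2, "C": 1}
_weighted_rotation = itertools.cycle(
    [server for server, weight in WEIGHTS.items() for _ in range(weight)]
)

def weighted_round_robin():
    return next(_weighted_rotation)

# Hashed: the same client address always maps to the same server.
def ip_hash(client_ip):
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```

Calling ip_hash("203.0.113.7") repeatedly always returns the same server, which is what makes the hashed strategy useful for basic session affinity.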
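Least connections, being dynamic, requires the balancer to track requests in flight. A rough sketch follows, again with invented server names; a real implementation would need this bookkeeping to be thread-safe:

```python
# Open-connection counts per backend, updated as requests start and finish.
active = {"A": 0, "B": 0, "C": 0}

def least_connections():
    """Route to the backend with the fewest in-flight requests."""
    server = min(active, key=active.get)
    active[server] += 1   # the new request is now in flight
    return server

def request_completed(server):
    """Decrement the count once the response has been delivered."""
    active[server] -= 1
```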
Load balancer complications
Load balancers can create several complications for your application. One of the most common is the challenge of achieving sticky sessions. It’s common for systems to store session state on the server, and that state needs to remain accessible across a client’s successive connections.
You can mitigate this by using a hashed balancing algorithm or a similar client-aware option. This ensures that connections from the same IP address always end up on a particular server. Most load balancers also offer explicit sticky sessions, where the load balancer looks for a designated header or cookie in each HTTP request. That value can be used to keep forwarding a client’s requests to the server that handled its initial connection.
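As a rough sketch of how cookie-based stickiness could work; the cookie name lb_server, the pool, and the hashing fallback are all assumptions made for illustration:

```python
import hashlib

SERVERS = ["A", "B", "C"]  # hypothetical backend pool

def route(cookies, client_ip):
    """Pick a backend, honoring a sticky-session cookie when present."""
    # A returning client carries the affinity cookie set on a previous
    # response, so we keep sending it to the same backend.
    sticky = cookies.get("lb_server")
    if sticky in SERVERS:
        return sticky
    # First contact: fall back to hashing the client address. The balancer
    # would then set lb_server=<choice> on its response so the client's
    # subsequent requests keep landing on the same instance.
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]
```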
Load balancers can also create complexity around SSL. Many organizations configure SSL to terminate at the load balancer, with connections between the load balancer and your backend servers made over plain HTTP. This typically results in a simpler setup with reduced maintenance demands, since certificates only need to be managed in one place.
Using plain HTTP for the internal leg isn’t always acceptable for security-critical workloads, though. Load balancers capable of SSL passthrough can deliver traffic to your backend servers without decrypting it first. However, this limits the routing functionality available to you: since the load balancer can’t decrypt incoming requests, you won’t be able to match on attributes like headers and cookies.
Layer 4 and Layer 7 load balancers
Load balancing is often discussed in the context of Layer 4 (L4) and Layer 7 (L7) networking. These terms describe the point at which a load balancer forwards traffic during the lifecycle of a network request.
Layer 4 load balancers operate at the network’s transport layer. They make routing decisions based on the characteristics of the request’s transport, such as the TCP or UDP port used; the content of the request is ignored.
Layer 7 load balancers operate up at the application layer. These load balancers can access the rich data within a request and use it to drive workload-specific routing rules. This is where load balancing that keys on a session ID in an HTTP header or cookie takes place.
Layer 7 load balancing is powerful, but it’s relatively resource-intensive. It must parse and inspect the content of each request before passing it to a backend. The packet-based nature of Layer 4 load balancing offers less control, but with a correspondingly smaller impact on throughput. Layer 4 load balancers also don’t decrypt traffic, so a compromise of the load balancer at this stage won’t expose request data.
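The difference comes down to what information is visible when the routing decision is made. A schematic sketch, where the field names and pool names are illustrative rather than any product’s API:

```python
def l4_route(packet):
    # Layer 4 sees only transport metadata; request content is invisible.
    if packet["dst_port"] == 443:
        return "tls-pool"
    return "http-pool"

def l7_route(request):
    # Layer 7 sees the full, decrypted HTTP request, so routing rules can
    # inspect the path, headers, and cookies.
    if request["path"].startswith("/api/"):
        return "api-pool"
    if "session_id" in request["cookies"]:
        return "backend-for-" + request["cookies"]["session_id"]
    return "web-pool"
```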
Load balancers let you distribute incoming traffic across your servers. They’re a critical component of highly available network architectures, letting you run large numbers of backend instances transparently. This increases your service’s capacity and prevents a total outage when one server goes offline.
Most load balancer implementations offer a choice of several different algorithms, including both static and dynamic options. Many applications are well served by simple choices such as “least connections” or “round robin,” but the more complex options are useful in specialized situations.
It’s good practice to run every production application behind a load balancer. It gives you the flexibility to scale on demand and react to unhealthy servers. Load balancing is usually straightforward to set up within your hosting stack or your cloud provider’s networking infrastructure.