Container-native load balancing

1. Container-native load balancing

Combining the Application Load Balancer with the Ingress object can lead to something called the double-hop dilemma: traffic makes an unnecessary second hop between the VMs that run containers in a GKE cluster. Let’s explore a scenario where traffic is distributed without container-native load balancing. A regular Application Load Balancer distributes traffic to all nodes of an instance group, regardless of whether the Pods meant to handle that traffic run on those nodes. By default, the load balancer can route traffic to any node within the instance group.

When the client sends traffic, the load balancer chooses a random node in the cluster and forwards the traffic to it. In this example, there are three possible nodes to choose from, and Node 1 is chosen. Next, to keep Pod use as even as possible, Node 1 uses kube-proxy to select a Pod at random to handle the incoming traffic. The selected Pod might be on this node or on another node in the cluster. Let’s say that Node 1 chooses Pod 5, which isn’t on this node, so Node 1 forwards the traffic to Pod 5 on Node 3. That is the second hop. Pod 5’s response then travels back through Node 1, which forwards it to the load balancer, which returns it to the client, so the double hop is paid in both directions.

This method has two levels of load balancing: one by the load balancer and the other by kube-proxy. The result is multiple network hops, and the response traffic follows the same path. As the name double-hop dilemma suggests, this is not an optimal way to balance load: it keeps Pod use even, but at the cost of increased latency and extra network traffic.
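To make the two levels of load balancing concrete, here is a small Python simulation of the default path just described. It is a sketch under assumptions: the three-node, six-Pod layout and the names `POD_TO_NODE`, `cluster_path`, and `double_hop_fraction` are hypothetical, and both the load balancer’s node choice and kube-proxy’s Pod choice are modeled as uniform random picks, which simplifies what the real components do.

```python
import random

# Hypothetical layout (assumption): which node each Pod runs on.
POD_TO_NODE = {
    "pod-1": "node-1", "pod-2": "node-1",
    "pod-3": "node-2", "pod-4": "node-2",
    "pod-5": "node-3", "pod-6": "node-3",
}
NODES = sorted(set(POD_TO_NODE.values()))
PODS = sorted(POD_TO_NODE)


def cluster_path(entry_node: str, pod: str) -> list[str]:
    """Request path when the load balancer targets nodes and kube-proxy
    may pick a Pod on any node in the cluster."""
    path = ["load-balancer", entry_node]
    if POD_TO_NODE[pod] != entry_node:
        # The second hop: the entry node forwards to the node hosting the Pod.
        path.append(POD_TO_NODE[pod])
    path.append(pod)
    return path


def double_hop_fraction(trials: int, seed: int = 0) -> float:
    """Share of simulated requests that needed the extra node-to-node hop."""
    rng = random.Random(seed)
    extra = 0
    for _ in range(trials):
        entry_node = rng.choice(NODES)  # load balancer: any node
        pod = rng.choice(PODS)          # kube-proxy: any Pod in the cluster
        if len(cluster_path(entry_node, pod)) == 4:
            extra += 1
    return extra / trials


# The scenario from the text: Node 1 receives the request, kube-proxy picks Pod 5.
print(cluster_path("node-1", "pod-5"))
# prints ['load-balancer', 'node-1', 'node-3', 'pod-5']
print(round(double_hop_fraction(10_000), 2))
```

With Pods spread evenly over three nodes, roughly two out of three requests take the extra hop in this model, because the randomly chosen Pod lives on the entry node only one time in three.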
With traditional Kubernetes networking, you need to decide which matters more: the lowest possible latency or the most even load balancing across the cluster. If low latency matters most, the LoadBalancer Service can be configured to force kube-proxy to choose a Pod local to the node that received the client traffic. To do this, set the externalTrafficPolicy field to “Local” in the Service manifest. This eliminates the double hop to another node, because kube-proxy always chooses a Pod on the receiving node. And because packets are no longer forwarded from node to node, the source client IP address is preserved and directly visible to the destination Pod. The trade-off is a risk of imbalanced load across the cluster.

Alternatively, if the most even load balancing matters more, container-native load balancing might be the better choice. With this option, the Google Cloud Application Load Balancer is still used, but it directs traffic to the Pods directly instead of to the nodes. This method requires GKE clusters to operate in VPC-native mode, and it relies on a data model called network endpoint groups. A network endpoint group represents IP-to-port pairs, which means a Pod can be just another endpoint in the group, equal in standing to a Compute Engine VM instance. Every connection is made directly between the load balancer and the intended Pod. For example, traffic intended for Pod 3 is routed directly from the load balancer to the IP address of Pod 3 through a network endpoint group.

So, which is the best choice? It depends on the application. Both configurations can be profiled, and the one that gives the best overall application performance can be selected.
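Both options are configured on the Service, so profiling them starts with two manifests. The Python sketch below only builds those manifests as dictionaries and prints them as JSON, which `kubectl apply -f` accepts as well as YAML. The `externalTrafficPolicy` field and the GKE `cloud.google.com/neg` annotation are real; the Service names, the `app: web` selector, and the port numbers are placeholder assumptions. Depending on cluster version and setup, GKE may add the NEG annotation to Ingress backends for you, so treat the explicit annotation as illustrative.

```python
import json

# Placeholder values (assumptions): adjust the app label and ports to your workload.
APP = "web"
PORTS = [{"port": 80, "targetPort": 8080, "protocol": "TCP"}]


def local_policy_service() -> dict:
    """LoadBalancer Service that keeps traffic on the receiving node."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{APP}-lb"},
        "spec": {
            "type": "LoadBalancer",
            # Only Pods on the node that received the packet are eligible,
            # which avoids the second hop and preserves the client IP.
            "externalTrafficPolicy": "Local",
            "selector": {"app": APP},
            "ports": PORTS,
        },
    }


def neg_backed_service() -> dict:
    """Service meant to sit behind an Ingress using network endpoint groups."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": f"{APP}-neg",
            # Asks GKE to create NEGs so the Application Load Balancer
            # targets Pod IP:port pairs instead of nodes.
            "annotations": {"cloud.google.com/neg": json.dumps({"ingress": True})},
        },
        "spec": {
            "type": "ClusterIP",
            "selector": {"app": APP},
            "ports": PORTS,
        },
    }


if __name__ == "__main__":
    # A v1 List lets both objects go through a single `kubectl apply -f`.
    manifest = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [local_policy_service(), neg_backed_service()],
    }
    print(json.dumps(manifest, indent=2))
```

The NEG-backed Service can stay `ClusterIP` because the load balancer reaches Pod endpoints directly rather than going through node ports.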
However, the “Local” external traffic policy may cause other issues, because it constrains the mechanisms that balance Pod traffic internally. It may also cause issues externally, because the load balancer forwards traffic via nodes with no awareness of the state of the Pods themselves.

Container-native load balancing with network endpoint groups, on the other hand, has many benefits. Let’s explore a few. First, Pods can be specified directly as endpoints for Google Cloud load balancers, so traffic goes straight to the intended Pod and extra network hops are eliminated. Second, load balancer features such as traffic shaping and advanced algorithms are supported; because the load balancer has a direct connection to each Pod, it can distribute traffic accurately. Third, container-native load balancing gives direct visibility into the Pods and more accurate health checks. The source IP is preserved, and the round-trip time it takes traffic to travel from the client to the load balancer can be measured, which helps when troubleshooting. This visibility can be extended with Google Cloud Observability tools. Fourth, there are fewer network hops in the path, which optimizes the data path and improves latency and throughput for better overall network performance. Finally, Google Cloud networking services such as Google Cloud Armor, Cloud CDN, and Identity-Aware Proxy are supported.
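The trade-offs above (extra hops for the default path, possible imbalance for “Local”, neither for container-native) can be compared in one toy model. This Python sketch is illustrative only: the deliberately uneven two-node layout and the names `route` and `simulate` are hypothetical, every choice is a uniform random pick standing in for the real balancing modes, and under “Local” the load balancer is assumed to spread requests evenly across nodes that have a ready Pod, not across Pods. Health checks and response paths are ignored.

```python
import random
from collections import Counter

# Hypothetical, deliberately uneven layout (assumption):
# node-1 runs one Pod, node-2 runs three.
POD_TO_NODE = {"pod-a": "node-1", "pod-b": "node-2", "pod-c": "node-2", "pod-d": "node-2"}
PODS = sorted(POD_TO_NODE)
NODES = sorted(set(POD_TO_NODE.values()))


def route(mode: str, rng: random.Random) -> tuple[str, int]:
    """Return (chosen Pod, number of nodes the request passes through)."""
    if mode == "cluster":
        node = rng.choice(NODES)  # load balancer picks any node
        pod = rng.choice(PODS)    # kube-proxy picks any Pod in the cluster
        return pod, 1 if POD_TO_NODE[pod] == node else 2
    if mode == "local":
        node = rng.choice(NODES)  # load balancer picks a node with a ready Pod
        pod = rng.choice([p for p in PODS if POD_TO_NODE[p] == node])
        return pod, 1
    if mode == "neg":
        pod = rng.choice(PODS)    # load balancer targets the Pod's IP:port directly
        return pod, 1
    raise ValueError(f"unknown mode: {mode}")


def simulate(mode: str, trials: int = 20_000, seed: int = 1) -> tuple[float, dict[str, float]]:
    """Average nodes per request and each Pod's share of the requests."""
    rng = random.Random(seed)
    hops = 0
    per_pod: Counter[str] = Counter()
    for _ in range(trials):
        pod, h = route(mode, rng)
        hops += h
        per_pod[pod] += 1
    return hops / trials, {p: per_pod[p] / trials for p in PODS}


for mode in ("cluster", "local", "neg"):
    avg_nodes, share = simulate(mode)
    print(mode, round(avg_nodes, 2), {p: round(s, 2) for p, s in share.items()})
```

In this model the default path averages about 1.5 nodes per request, while the Local and NEG paths stay at one. Only the NEG path also keeps the four Pods evenly loaded; under Local, `pod-a` absorbs about half of all requests because it is alone on its node.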

2. Let's practice!
