1. Overview of Application Load Balancing
Now let's talk about Application Load Balancing, which acts at Layer 7 of the OSI model. This is the application layer, which deals with the actual content of each message, allowing for routing decisions based on the URL. The Application Load Balancer distributes HTTP and HTTPS traffic to backends hosted on a variety of Google Cloud platforms, such as Compute Engine, Google Kubernetes Engine, Cloud Storage, and Cloud Run, as well as external backends connected over the internet or by using hybrid connectivity.

Application Load Balancers are available in two deployment modes: external and internal. You will learn about internal Application Load Balancers later in this module. External Application Load Balancers are implemented using Google Front Ends (GFEs) or managed proxies, and can be deployed in one of three modes: global, regional, or classic. Global external Application Load Balancers and classic Application Load Balancers use GFEs that are distributed globally and operate together by using Google's global network and control plane. GFEs offer multi-region load balancing in the Premium tier, directing traffic to the closest healthy backend that has capacity and terminating HTTP(S) traffic as close as possible to your users. Global external Application Load Balancers and regional external Application Load Balancers use the open source Envoy proxy software to enable advanced traffic management capabilities.

Let me walk through the architecture of an Application Load Balancer by using this diagram. An external forwarding rule specifies an external IP address, port, and target HTTP(S) proxy. Clients use the IP address and port to connect to the load balancer. A target HTTP(S) proxy receives a request from the client and evaluates it by using the URL map to make traffic routing decisions. The proxy can also authenticate communications by using SSL certificates.
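To make the URL map's role concrete, here is a minimal sketch in Python of how a proxy might match a request path against path rules and fall back to a default backend service. The rule structure, service names, and matching logic here are illustrative assumptions, not the actual GFE or Envoy implementation (the real URL map supports far richer matching).

```python
# Hypothetical URL map: longest matching path rule wins; otherwise the
# request falls through to the default backend service.
URL_MAP = {
    "default_service": "web-backend-service",
    "path_rules": [
        {"paths": ["/video", "/video/*"], "service": "video-backend-service"},
        {"paths": ["/static/*"], "service": "static-backend-bucket"},
    ],
}

def route(path: str, url_map: dict) -> str:
    """Return the backend service (or bucket) name for a request path."""
    best, best_len = None, -1
    for rule in url_map["path_rules"]:
        for pattern in rule["paths"]:
            if pattern.endswith("*"):
                prefix = pattern[:-1]
                matched = path.startswith(prefix)
            else:
                prefix = pattern
                matched = path == pattern
            # Prefer the most specific (longest) matching prefix.
            if matched and len(prefix) > best_len:
                best, best_len = rule["service"], len(prefix)
    return best or url_map["default_service"]

print(route("/video/cats.mp4", URL_MAP))  # video-backend-service
print(route("/index.html", URL_MAP))      # web-backend-service
```

Requests for `/video/...` and `/static/...` are steered to dedicated backends, while everything else reaches the default service, which is the kind of content-based routing that distinguishes Layer 7 load balancing from Layer 4.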
A backend service distributes requests to healthy backends. The global external Application Load Balancers also support backend buckets. One or more backends must be connected to the backend service or backend bucket. A backend service contains a health check, session affinity, a timeout setting, and one or more backends.

A health check polls the instances attached to the backend service at configured intervals. Instances that pass the health check are allowed to receive new requests; unhealthy instances are not sent requests until they are healthy again.

Normally, Application Load Balancing uses a round-robin algorithm to distribute requests among available instances. This can be overridden with session affinity, which attempts to send all requests from the same client to the same virtual machine instance.

Backend services also have a timeout setting, which is 30 seconds by default. This is the amount of time the backend service waits on the backend before considering the request a failure. It is a fixed timeout, not an idle timeout, so if you require longer-lived connections, set this value appropriately.

The backends themselves contain an instance group, a balancing mode, and a capacity scaler. An instance group contains virtual machine instances; it may be a managed instance group with or without autoscaling, or an unmanaged instance group. A balancing mode tells the load balancing system how to determine when a backend is at full usage, and it can be based on CPU utilization or requests per second (RPS). If all the backends for the backend service in a region are at full usage, new requests are automatically routed to the nearest region that can still handle requests. A capacity setting is an additional control that interacts with the balancing mode setting.
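The distribution behavior described above can be sketched as a toy model in Python: skip unhealthy instances, rotate round-robin among the rest, and optionally pin a client to its first instance when session affinity is on. The class and helper names are assumptions for illustration, not the load balancer's real internals, and the `effective_target` helper simply shows the arithmetic by which the capacity scaler tightens the balancing-mode target.

```python
import itertools

class BackendService:
    """Toy model of backend selection (illustrative, not GFE/Envoy code)."""

    def __init__(self, instances, session_affinity=False):
        self.instances = instances          # name -> healthy? (bool)
        self.session_affinity = session_affinity
        self._rr = itertools.cycle(sorted(instances))
        self._affinity = {}                 # client_ip -> pinned instance

    def pick(self, client_ip):
        # Session affinity: reuse the pinned instance while it stays healthy.
        if self.session_affinity:
            pinned = self._affinity.get(client_ip)
            if pinned and self.instances[pinned]:
                return pinned
        # Otherwise round-robin, skipping instances that fail health checks.
        for _ in range(len(self.instances)):
            candidate = next(self._rr)
            if self.instances[candidate]:
                if self.session_affinity:
                    self._affinity[client_ip] = candidate
                return candidate
        raise RuntimeError("no healthy backends")

def effective_target(max_utilization, capacity_scaler):
    """Capacity scaler further limits the balancing-mode target,
    e.g. an 80% utilization target scaled to 50% capacity yields 40%."""
    return max_utilization * capacity_scaler

svc = BackendService({"vm-a": True, "vm-b": False, "vm-c": True})
print([svc.pick("10.0.0.1") for _ in range(4)])
# ['vm-a', 'vm-c', 'vm-a', 'vm-c'] -- unhealthy vm-b is never chosen
```

With session affinity enabled, repeated calls for the same client IP keep returning the same instance, which is the trade-off the transcript describes: stickiness for the client at the cost of perfectly even distribution.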
For example, if you normally want your instances to operate at a maximum of 80% CPU utilization, you would set your balancing mode to 80% CPU utilization and your capacity to 100%. If you want to cut instance utilization in half, you could leave the balancing mode at 80% CPU utilization and set capacity to 50%. Now, any changes to your backend services are not instantaneous, so don't be surprised if it takes several minutes for your changes to propagate throughout the network.

2. Let's practice!