
Cluster scaling

1. Cluster scaling

Let’s explore cluster scaling in Google Kubernetes Engine. GKE can be used in either Standard mode or Autopilot mode. Unlike Autopilot mode, which automatically scales your cluster based on demand, Standard mode hands you the reins for scaling. As your applications’ needs fluctuate, the resources they require change too. The good news is that you can easily scale your cluster up or down to match those changing demands, right from the Google Cloud console or Cloud Shell.

In GKE, a cluster contains one or more node pools. A node pool is a group of nodes within a cluster that all share the same configuration. The size of a node pool is set by specifying a minimum and maximum number of nodes, the maximum being 1,000. Node pools use a NodeConfig specification. Each node in a pool carries the Kubernetes node label cloud.google.com/gke-nodepool, whose value is the node pool’s name. When a cluster is created, the nodes you specify (their number and machine type) form the default node pool. You can then add custom node pools of different sizes and machine types to the cluster. And while individual node pools can be scaled down to 0, the cluster itself cannot be shut down entirely, because a cluster must have at least one node to run system Pods.

A cluster can be resized manually or automatically. You can resize a Standard cluster manually by running the gcloud resize command or by using the Google Cloud console. When a cluster is scaled down manually, nodes are treated the same regardless of whether or not they are running Pods: the resize command removes instances at random, and any running Pods are terminated gracefully.

And what if you want to scale clusters automatically? GKE’s cluster autoscaler, a feature available on Standard cluster node pools, automatically resizes a cluster based on the resource demands of your workloads. The cluster autoscaler is disabled by default.
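As a minimal sketch of the manual resize described above, the Python helper below only assembles the arguments for gcloud container clusters resize; it does not run gcloud. The function name resize_command and the cluster, node pool, and zone values in the usage are hypothetical examples, and the sketch assumes a zonal cluster (a regional cluster would take --region instead of --zone).

```python
from __future__ import annotations

import shlex


def resize_command(cluster: str, node_pool: str, num_nodes: int, zone: str) -> list[str]:
    """Build (but do not run) a gcloud command that manually resizes one node pool.

    Hypothetical helper. A single node pool may be resized to 0 nodes; the
    cluster as a whole still needs at least one node somewhere for system Pods.
    """
    if num_nodes < 0:
        raise ValueError("num_nodes must be >= 0")
    return [
        "gcloud", "container", "clusters", "resize", cluster,
        f"--node-pool={node_pool}",
        f"--num-nodes={num_nodes}",
        f"--zone={zone}",  # assumption: zonal cluster
    ]


if __name__ == "__main__":
    # Example names only; substitute your own cluster, pool, and zone.
    cmd = resize_command("demo-cluster", "default-pool", 3, "us-central1-a")
    print(shlex.join(cmd))
```

Keeping command construction separate from execution makes the command easy to review before you paste it into Cloud Shell or hand the list to subprocess.run.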
When it is enabled, GKE automatically adds new nodes to a cluster whenever it detects that your Pods lack sufficient resources to run as intended. GKE also deletes underutilized nodes if their Pods can run on other nodes.

Now, let’s say all of your node pools are low on resource capacity. To fix this, either one or more Pods need to be terminated or additional nodes need to be added. Meanwhile, a new Pod enters a holding pattern as it awaits additional capacity. During this period, the scheduler, whose job is to filter out any nodes that don’t meet a Pod’s specific scheduling needs, sets the Pod’s PodScheduled condition to False with the reason Unschedulable. Pods in this state are what prompt the cluster autoscaler to add nodes.

If a cluster doesn’t need to scale up, the cluster autoscaler checks for disposable nodes every 10 seconds. A node is considered disposable if all of the following conditions are true: the CPU and memory requested by the Pods on the node are each less than 50% of the node’s allocatable capacity; all Pods running on the node can be moved to other nodes; and the cluster does not have scale-down disabled. The cluster autoscaler then continues to monitor, and if a node stays unneeded for more than 10 minutes, it is terminated.

The cluster autoscaler can handle up to 15,000 nodes, with each node supporting a maximum of 256 Pods. In version 1.31 and later, GKE supports large clusters of up to 65,000 nodes, intended for running large-scale AI workloads. This scale is achieved by leveraging Spanner, Google’s distributed database. Note, however, that at this number of nodes the cluster autoscaler is not supported, and you must scale these clusters manually using the GKE API. When using the cluster autoscaler, there’s a cluster-wide limit of 200,000 Pods, regardless of your node setup. Also be aware that standard Compute Engine quotas for VM instances still apply: if you haven’t increased your default quota, new VMs won’t start and disruptions may occur.
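The scale-down rules above can be modeled as a small decision function. This is a simplified sketch, not the cluster autoscaler’s actual code: the Pod and Node classes, the millicore and MiB units, and the function names are hypothetical, and the real autoscaler checks additional constraints before removing a node. It encodes only the three disposability conditions and the 10-minute unneeded window described in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field

UTILIZATION_THRESHOLD = 0.5      # requests must be under 50% of allocatable
UNNEEDED_TIME_SECONDS = 10 * 60  # node must stay unneeded for over 10 minutes


@dataclass
class Pod:
    cpu_request_m: int        # hypothetical: CPU request in millicores
    memory_request_mi: int    # hypothetical: memory request in MiB
    movable: bool = True      # can this Pod be rescheduled onto another node?


@dataclass
class Node:
    allocatable_cpu_m: int
    allocatable_memory_mi: int
    pods: list[Pod] = field(default_factory=list)


def is_disposable(node: Node, scale_down_disabled: bool) -> bool:
    """Simplified check of the three disposability conditions."""
    if scale_down_disabled:
        return False
    cpu_ratio = sum(p.cpu_request_m for p in node.pods) / node.allocatable_cpu_m
    mem_ratio = sum(p.memory_request_mi for p in node.pods) / node.allocatable_memory_mi
    # CPU and memory must each be below the threshold.
    low_usage = cpu_ratio < UTILIZATION_THRESHOLD and mem_ratio < UTILIZATION_THRESHOLD
    return low_usage and all(p.movable for p in node.pods)


def should_remove(unneeded_since: float | None, now: float) -> bool:
    """Remove a node only after it has been unneeded for more than 10 minutes."""
    return unneeded_since is not None and now - unneeded_since > UNNEEDED_TIME_SECONDS


if __name__ == "__main__":
    node = Node(4000, 16384, [Pod(500, 2048), Pod(700, 1024)])
    print(is_disposable(node, scale_down_disabled=False))  # 30% CPU, ~19% memory
    print(should_remove(unneeded_since=0.0, now=601.0))
```

For the example node, 1,200 of 4,000 millicores (30%) and 3,072 of 16,384 MiB (about 19%) are requested, so both ratios fall under the threshold and the node counts as disposable.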
Let’s wrap up by exploring some common gcloud commands used for autoscaling. You can add the --enable-autoscaling flag to the gcloud container clusters create command to create a new cluster with autoscaling enabled. You can also add the --enable-autoscaling flag to the gcloud container node-pools create command to create a new node pool with autoscaling enabled. To enable or disable autoscaling on an existing node pool, add the --enable-autoscaling or --no-enable-autoscaling flag, together with --node-pool to name the pool, to the gcloud container clusters update command. These actions can also be performed from the Google Cloud console. With zonal clusters, all resources (nodes and control plane) are created in the same zone by default. If secondary zones are enabled, all node pools are duplicated in those zones, similar to how node pools are duplicated across zones in regional clusters.

2. Let's practice!
