Compute Engine

1. Compute Engine

As mentioned in the introduction to the course, there is a spectrum of different options in Google Cloud for compute and processing. We will focus on the traditional virtual machine instances. Now the difference is, Compute Engine gives you the utmost flexibility: run whatever language you want - it's your virtual machine. This is purely an infrastructure as a service, or IaaS model. You have a VM and an operating system, and you can choose how to manage it and how to handle aspects, such as autoscaling, where you'll configure the rules about adding more virtual machines in specific situations. Autoscaling will be covered later in the course. The primary work case of Compute Engine is any generic workload, especially an enterprise application that was designed to run on a server infrastructure. This makes Compute Engine very portable and easy to run in the cloud. Other services, like Google Kubernetes Engine, which consists of containerized workloads, may not be as easily transferable as what you're used to from on-premises. So what is Compute Engine? At its heart, it's physical servers that you're used to, running inside the Google Cloud environment, with a number of different configurations. Both predefined and custom machine types allow you to choose how much memory and how much CPU you want. You choose the type of disk you want, whether you want to use persistent disks backed up by standard hard drives or solid-state drives, local SSDs, Cloud Storage, or a mix. You can even configure the networking interfaces and run a combination of Linux and Windows machines. We will discuss these options in more detail later in the module. Several different features will be covered throughout this module, such as machine rightsizing, startup and shutdown scripts, metadata, availability policies, OS patch management, and pricing and usage discounts. It is important to mention that hardware manufacturers have run up against limitations, and CPUs, which are central processing units, and GPUs, which are graphics processing units, can no longer scale to adequately reach the rapid demand for ML. To help overcome this challenge, in 2016 Google introduced the Tensor Processing Unit, or TPU. TPUs are Google's custom-developed application-specific integrated circuits used to accelerate machine learning workloads. TPUs act as domain-specific hardware, as opposed to general-purpose hardware with CPUs and GPUs. This allows for higher efficiency by tailoring architecture to meet the computation needs in a domain, such as the matrix multiplication in machine learning. TPUs are generally faster than current GPUs and CPUs for AI applications and machine learning. They are also significantly more energy-efficient. Cloud TPUs have been integrated across Google products, making this state-of-the-art hardware and supercomputing technology available to Google Cloud customers. TPUs are mostly recommended for models that train for long durations and for large models with large effective batch sizes. Refer to the available online documentation for more details. Let's start by looking at the compute options. Compute Engine provides several different machine types that we'll discuss later in this module. If those machines don't meet your needs, you can also customize your own machine. Your choice of CPU will affect your network throughput. Specifically, your network will scale at 2 gigabits per second for each CPU core, except for instances with 2 or 4 CPUs, which receive up to 10 gigabits per second of bandwidth. There is a theoretical maximum throughput of 200 gigabits per second for an instance with 176 vCPU, when you choose an C3 machine series. When you're migrating from an on-premises setup, you're used to physical cores, which have hyperthreading. On Compute Engine, each virtual CPU (or vCPU) is implemented as a single hardware hyper-thread on one of the available CPU platforms. For an up-to-date list of all the available CPU platforms, refer to the CPU platforms documentation link in the course resources. After you pick your compute options, you want to choose your disk. You have three options: Standard, SSD, or local SSD. So basically, do you want the standard spinning hard disk drives, or flash memory solid-state drives? Both of these options provide the same amount of capacity in terms of disk size when choosing a persistent disk. Therefore, the question really is about performance versus cost, because there's a different pricing structure. Basically, SSDs are designed to give you a higher number of IOPS per dollar versus standard disks, which will give you a higher amount of capacity for your dollar. Local SSDs have higher throughput and lower latency than SSD persistent disks, because they are attached to the physical hardware. However, the data that you store on local SSDs persists only until you stop or delete the instance. Typically, a local SSD is used as a swap disk, just like you would do if you want to create a ramdisk, but if you need more capacity, you can store those on a local SSD. Standard and non-local SSD disks can be sized up to 257 TB for each instance. The performance of these disks scales with each GB of space allocated. As for networking, we have already seen networking features applied to Compute Engine in the previous module's lab. We looked at the different types of networks and created firewall rules using IP addresses and network tags. You'll also notice that you can do Application Load Balancing and Network Load Balancing. This doesn't require any pre-warming because a load balancer isn't a hardware device that needs to analyze your traffic. A load balancer is essentially a set of traffic engineering rules that are coming into the Google network, and VPC is applying your rules destined to your IP address subnet range.

2. Let's practice!

Create Your Free Account

By continuing, you accept our Terms of Use, our Privacy Policy and that your data is stored in the USA.