AI and GKE overview

1. AI and GKE overview

Let's start by exploring the impact of GenAI as it relates to GKE at a high level. The ways in which organizations approach large language models are constantly changing. Model sizes are growing rapidly, resulting in a fundamental shift in model capabilities. The open-source software ecosystem is growing too. Organizations want to make the most of ML with an open software ecosystem that provides easy access to open-source tools, open models, frameworks, and APIs. And the demand for hardware accelerators for AI/ML workloads has increased substantially.

To meet the demands of this hyperscaling trend, organizations require a platform that can scale dynamically to very high limits. Large-scale model training and inference are expensive and ongoing, emphasizing the need for cost-efficient scaling methods. Additionally, a single platform that supports all workload types, including AI workloads, simplifies the management and optimization of an organization's infrastructure.

Kubernetes is an excellent platform for AI workloads. It supports any workload type, including AI, driving robust and consistent development across the stack. GKE provides a single platform for all your workloads, ensuring a consistent and robust development process. As the foundation, it's scalable and compatible with a diverse set of hardware accelerators, including GPUs and Google's proprietary TPUs. Accelerated orchestration helps improve performance and reduce costs.

Google's open model for GenAI, Gemma, is available on GKE. Gemma is a family of lightweight, state-of-the-art open models, built from the same research and technology that's used for Gemini models. Gemma on GKE empowers teams to deploy and easily serve open models. Kubernetes, and GKE by extension, has access to an ecosystem of first-party and third-party frameworks, such as TensorFlow, Ray, and PyTorch, making it easy to build or extend AI models.

To optimize your infrastructure, assess each workload.
Then design and optimize at a system level, across compute, storage, networking, hardware, and software, to meet the unique needs of each workload. Factors to consider include performance, reliability, security, cost, and sustainability.

Google Cloud offers a wide range of managed services that integrate with Vertex AI and ML workloads. Let's explore the benefits of using these managed services. The first is reduced operational overhead and complexity: managed services assist with elements like infrastructure management, scalability, availability, and maintenance. Next is increased development speed and productivity, through simplified ML workflows, prebuilt components, and tools. Then there's cost effectiveness for a variety of use cases: pay-as-you-go pricing and reduced operational costs contribute to optimized resource utilization. And finally, there's simplified security and compliance: Google provides built-in security and compliance certifications.

But managed services are not always the most suitable approach. There are several use cases for unmanaged services. One is the need for extreme customization and control: using custom algorithms and models, requiring unique preprocessing and post-processing, or having specific hardware and software dependencies. Another is the need for fine-grained performance optimization, for example in applications requiring extremely low-latency inference or with specific resource management and cost optimization requirements. A third is data governance and compliance. Some industries or regions have strict regulations on data residency and sovereignty. Sensitive data may require customized security and access control. Auditability and explainability may not be available with managed services. And the last is hybrid and multicloud strategies: organizations may have large investments in existing infrastructure.

The Google technology stack is fully integrated and open, enabling organizations to bring GenAI to real-world experiences quickly, efficiently, and responsibly. Let's break down the components of this technology stack.
At the top of the stack is Vertex AI, a platform that provides the tools and infrastructure to build, deploy, and manage your agents. Then there are Gemini models, the brains behind the agents. And finally, the foundation is the Hypercomputer, which provides the power and scalability needed to train and serve AI models.

Let's explore the architecture of the Hypercomputer. At the foundation is a selection of purpose-built infrastructure offerings. For compute options, there are cloud CPUs, GPUs, and TPUs. For storage, options like GCS FUSE, Parallelstore, and Hyperdisk ML are available to address varying storage needs. And you have the core data center expertise that's used to serve AI experiences to billions of Google users globally. Because GPU resources must be portable, containerization is an ideal solution. GKE can handle orchestration across thousands of GPUs. GPUs are in high demand and, therefore, expensive. With GKE, you get maximum performance for the cost.

Next is software. APIs and other primitives help command the hardware, and GKE and partner offerings help schedule and orchestrate workloads. The next level consists of the frameworks, libraries, and tools where ML practitioners need the most flexibility. These elements may come from different projects. Flexible consumption options let you experiment while still being cost efficient. Options for latency-tolerant or fixed-duration workloads enable experimentation without runaway costs. When you put everything together, you get a collection of patterns and the confidence in knowing they will perform well over time, without sacrificing flexibility.

Portability, efficient orchestration, maximum performance, and cost effectiveness are benefits of using GKE for AI model training. In addition to model training, there are several benefits of using GKE for model serving. Open models are driving down inference costs, and GKE tools and best practices increase AI engineering velocity.
What's more, to scale and optimize your workloads, fine-grained control is necessary.
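
To make the idea of fine-grained control a little more concrete, here is a minimal sketch of how a workload might ask GKE for a specific accelerator. It is an illustration under stated assumptions, not something taken from the course: the helper name gpu_pod_manifest, the Pod name, the container image, and the choice of an NVIDIA L4 accelerator are all hypothetical. The sketch relies only on the standard Kubernetes nvidia.com/gpu resource name and the cloud.google.com/gke-accelerator node label, and it builds the manifest as plain data rather than calling any API.

```python
import json


def gpu_pod_manifest(name, image, gpu_count=1, accelerator="nvidia-l4"):
    """Build a Kubernetes Pod manifest (as a dict) that requests GPUs on GKE.

    All argument values are illustrative assumptions; adjust them to the
    accelerator types that actually exist in your cluster's node pools.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Steer the Pod onto nodes labeled with the requested accelerator type.
            "nodeSelector": {"cloud.google.com/gke-accelerator": accelerator},
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested through resource limits, per container.
                    "resources": {"limits": {"nvidia.com/gpu": gpu_count}},
                }
            ],
        },
    }


# Hypothetical Pod name and image, used only for illustration.
manifest = gpu_pod_manifest(
    "inference-demo", "example.com/hypothetical/model-server:latest"
)
print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed output could be piped to kubectl apply -f - against a cluster that has a matching GPU node pool. Changing gpu_count or accelerator is the kind of per-workload tuning the transcript refers to.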

2. Let's practice!

Manage Scalable Workloads in GKE
