Performance and resource optimization

1. Performance and resource optimization

Welcome back. The application is observable and you can troubleshoot it; now we make it fast and cost-efficient. In this video, you'll learn to optimize performance and cost with concurrency, memory right-sizing, and caching. Let's get started.

2. Slow and expensive

Users report that the app feels sluggish during busy periods, and in the same month the Lambda bill jumps. The two are connected: cold starts add latency, and memory set by guesswork wastes money. Tuning by measurement makes the app both faster and cheaper, instead of trading one for the other.

3. Concurrency for cost vs cold start

We met concurrency, the number of copies of a function running at once, when configuring Lambda; now we use it to tune performance. Reserved concurrency sets aside a fixed number of instances for one function: it guarantees that capacity and also caps the function at it, so a spike in one function can't starve everything else in the account. Provisioned concurrency keeps a set number of instances initialized and ready, so the requests they serve skip the cold start, at an added cost you pay for as long as it is configured, whether or not traffic arrives. The decision follows traffic: a steady, user-facing function justifies provisioned concurrency, while a background job tolerates cold starts. You're trading cost against startup latency.
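To make the two settings concrete, here is a minimal sketch using the AWS SDK for Python. It assumes you pass in a Lambda client such as `boto3.client("lambda")`; `put_function_concurrency` and `put_provisioned_concurrency_config` are real boto3 Lambda client methods, while the `ConcurrencyPlan` structure, the function names `checkout-api` and `nightly-report`, the alias `live`, and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ConcurrencyPlan:
    """Hypothetical per-function plan; names and numbers are illustrative."""
    function_name: str
    reserved: Optional[int] = None     # guarantees AND caps this many concurrent instances
    provisioned: Optional[int] = None  # pre-initialized instances, billed while configured
    alias: Optional[str] = None        # provisioned concurrency targets a version or alias


def apply_plan(lambda_client, plan: ConcurrencyPlan) -> List[str]:
    """Apply the plan through a Lambda client, e.g. boto3.client("lambda").

    Returns a short log of what was configured.
    """
    actions: List[str] = []
    if plan.reserved is not None:
        lambda_client.put_function_concurrency(
            FunctionName=plan.function_name,
            ReservedConcurrentExecutions=plan.reserved,
        )
        actions.append(f"reserved={plan.reserved}")
    if plan.provisioned:
        if not plan.alias:
            raise ValueError("provisioned concurrency needs a published version or alias")
        if plan.reserved is not None and plan.provisioned > plan.reserved:
            # Local sanity check: provisioned instances count against the reserved cap.
            raise ValueError("provisioned concurrency cannot exceed reserved concurrency")
        lambda_client.put_provisioned_concurrency_config(
            FunctionName=plan.function_name,
            Qualifier=plan.alias,
            ProvisionedConcurrentExecutions=plan.provisioned,
        )
        actions.append(f"provisioned={plan.provisioned}@{plan.alias}")
    return actions


# Hypothetical plans: a user-facing API stays warm; a background job tolerates cold starts.
PLANS = [
    ConcurrencyPlan("checkout-api", reserved=100, provisioned=10, alias="live"),
    ConcurrencyPlan("nightly-report", reserved=5),
]
```

Against a real account you would loop over `PLANS` and call `apply_plan(boto3.client("lambda"), plan)`, which needs credentials and an existing function and alias. Injecting the client keeps the decision logic testable without touching AWS.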

4. Right-sizing memory with the duration-cost curve

Right-sizing Lambda memory is a measurement, not a guess. Because Lambda allocates CPU in proportion to memory, more memory shortens duration for compute-bound work. But you're billed for memory times duration, in GB-seconds, so the two pull against each other. As you raise memory, duration can at first drop faster than memory grows, so total cost falls; once extra memory no longer speeds things up, cost rises again. That low point is your sweet spot, often in the middle of the range rather than at the smallest size. So you profile the function at several sizes and pick the cheapest.
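The arithmetic behind the curve is simple enough to sketch. This assumes you have already measured average duration at each memory size; the durations below are hypothetical, and the price constant is an assumed example rate, so check current AWS Lambda pricing before relying on absolute numbers. Only the relative costs matter for picking the sweet spot.

```python
from typing import Dict

# Assumed example rate in USD per GB-second (x86); check current AWS Lambda pricing.
# The per-request charge is left out: it is the same at every memory size,
# so it cannot change which size is cheapest.
PRICE_PER_GB_SECOND = 0.0000166667


def cost_per_invocation(memory_mb: int, duration_ms: float,
                        price_per_gb_s: float = PRICE_PER_GB_SECOND) -> float:
    """Compute cost of one invocation: allocated GB x duration in seconds x price."""
    return (memory_mb / 1024) * (duration_ms / 1000) * price_per_gb_s


def cheapest_memory(measurements: Dict[int, float]) -> int:
    """Pick the memory size (MB) whose measured duration gives the lowest cost."""
    return min(measurements, key=lambda mb: cost_per_invocation(mb, measurements[mb]))


if __name__ == "__main__":
    # Hypothetical average durations (ms) from running the same workload at each size.
    measured = {128: 3000, 256: 1200, 512: 500, 1024: 280, 2048: 260}
    for mb, ms in measured.items():
        print(f"{mb:>5} MB {ms:>5} ms  ${cost_per_invocation(mb, ms):.8f}")
    print("sweet spot:", cheapest_memory(measured), "MB")  # → sweet spot: 512 MB
```

With these made-up numbers, doubling from 256 MB to 512 MB more than halves duration, so cost falls; past 1024 MB duration barely moves, so cost climbs. In practice you would also reject any size whose duration misses your latency target before taking the minimum.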

5. Application-level caching

The fastest backend call is the one you never make. While an instance stays warm, its execution context, including anything created in global scope, persists between invocations. A database client or a hot lookup table created there is reused for free on the next call, which is why you initialize clients outside the handler. That reuse is per instance, though. For a cache shared across instances, you move up to a managed store like Amazon ElastiCache, with sub-millisecond latency. Caching at the right layer cuts latency and backend load dramatically.

6. CloudFront caching at the edge

For widely requested content, you push caching to the edge with Amazon CloudFront. It caches responses at edge locations near users, so a repeat request is served from the edge without reaching your backend, cutting both latency and load. The cache key, which can include headers, query strings, or cookies, decides what counts as a unique response. Include only the values that change the response: key on a language header or a version parameter to cache per language or per version, and leave out values that don't matter so one response isn't split into many cache entries. For static and cacheable content, this absorbs very large amounts of traffic with little backend load.
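To see how the cache key decides hits and misses, here is a toy model in Python. It is not CloudFront's implementation or API (in CloudFront you set this in a cache policy); `EdgeCache`, `cache_key`, the `v` parameter, and the origin function are hypothetical. Only allow-listed query strings and headers become part of the key, so an irrelevant tracking parameter still hits the cache while a different language misses.

```python
from typing import Callable, Dict, Iterable, Tuple
from urllib.parse import parse_qsl, urlsplit


def cache_key(url: str, headers: Dict[str, str],
              key_query_params: Iterable[str], key_headers: Iterable[str]) -> Tuple:
    """Build a key from the path plus only the allow-listed query strings and headers."""
    parts = urlsplit(url)
    allowed = set(key_query_params)
    query = tuple(sorted((k, v) for k, v in parse_qsl(parts.query) if k in allowed))
    lowered = {k.lower(): v for k, v in headers.items()}
    hdrs = tuple(sorted((h.lower(), lowered.get(h.lower(), "")) for h in key_headers))
    return (parts.path, query, hdrs)


class EdgeCache:
    """Toy edge cache: no TTLs or eviction, just key-based hit/miss behavior."""

    def __init__(self, origin: Callable[[str, Dict[str, str]], str],
                 key_query_params=(), key_headers=()):
        self.origin = origin
        self.key_query_params = tuple(key_query_params)
        self.key_headers = tuple(key_headers)
        self.store: Dict[Tuple, str] = {}
        self.origin_calls = 0

    def get(self, url: str, headers: Dict[str, str] = None) -> str:
        headers = headers or {}
        key = cache_key(url, headers, self.key_query_params, self.key_headers)
        if key not in self.store:  # miss: go to the origin once for this key
            self.origin_calls += 1
            self.store[key] = self.origin(url, headers)
        return self.store[key]
```

If the origin varied its response on a value left out of the key, this model would serve the wrong variant, which is why the key must include everything that changes the response.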

7. Choosing the right cache layer

Caching isn't one thing; it's three layers, and well-designed systems often use all three. Edge caching with CloudFront handles static, widely shared content closest to the user. Application caching keeps hot objects in the function's own memory for near-zero-cost reuse. Data caching with ElastiCache, or with DAX, the in-memory accelerator for DynamoDB, holds shared hot reads so your database isn't hammered. Knowing which layer a problem belongs to is what makes optimization deliberate instead of random.
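For the data layer, a common pattern in front of a database is cache-aside: read from the shared cache, fall back to the database on a miss, then populate the cache with a TTL. This is a minimal sketch; `InMemoryStore` is a hypothetical stand-in that mirrors the `get` and `setex(name, time, value)` calls of the redis-py client, so a Redis-compatible ElastiCache client could be passed in instead, and `get_product` and `load_from_db` are illustrative names. DAX is designed to be API-compatible with DynamoDB, so it usually sits in front of a table without hand-written code like this.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryStore:
    """Hypothetical stand-in for a shared cache, mirroring redis-py get/setex."""

    def __init__(self):
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() >= expires_at:  # expired entries behave like misses
            del self._data[key]
            return None
        return value

    def setex(self, key: str, ttl_seconds: int, value: str) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)


def get_product(store, product_id: str, load_from_db: Callable[[str], dict],
                ttl_seconds: int = 300) -> dict:
    """Cache-aside read: try the shared cache, fall back to the database, then populate."""
    key = f"product:{product_id}"
    cached = store.get(key)
    if cached is not None:
        return json.loads(cached)  # hit: the database is not touched
    item = load_from_db(product_id)  # miss: one database read
    store.setex(key, ttl_seconds, json.dumps(item))
    return item
```

The TTL bounds how stale a cached read can get; choosing it is a trade between freshness and how much load the cache takes off the database.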

8. Putting optimization together

Optimization follows the same disciplined loop as troubleshooting, applied to speed and cost. You right-size memory using the duration-cost curve, so each function runs at its cheapest effective setting. You add provisioned concurrency only to latency-sensitive functions that justify it. You cache at whichever layer fits: edge, application, or data. And throughout, you measure, change one thing, and measure again. Done this way, performance work is predictable and your bill reflects deliberate choices.

9. Let's practice!

Let's practice what you learned in this video!
