
Scaling and concurrency controls

1. Scaling and concurrency controls

In this video, you'll learn how Lambda scales with demand, what concurrency really means, and how Reserved and Provisioned Concurrency help you control load and cold starts.

2. Scaling in one picture

When demand increases, Lambda scales out by running more copies of your function in parallel, each in its own execution environment. Concurrency is simply how many are running at once.

3. What is concurrency?

Think of each invocation as one unit of work. Concurrency is how many invocations are running right now, so if 10 are running, your concurrency is 10.

4. Estimating required concurrency

A quick estimate is requests per second multiplied by average duration in seconds. Reducing duration lowers the concurrency you need, and the estimate helps you size limits safely before you tune anything.

5. Example: 50 rps at 200 ms

With 50 requests per second and 0.2 seconds per request, you need about 10 concurrent executions to keep up. Faster code means fewer parallel runs.
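The estimate above is just Little's law: arrivals per second times time in the system. A minimal sketch in Python (the function name is illustrative, not an AWS API):

```python
def required_concurrency(requests_per_second: float, avg_duration_s: float) -> float:
    """Steady-state concurrency estimate (Little's law): arrival rate x duration."""
    return requests_per_second * avg_duration_s

# 50 requests/second, each taking 0.2 s on average
print(required_concurrency(50, 0.2))  # -> 10.0
# Halving the duration halves the concurrency needed
print(required_concurrency(50, 0.1))  # -> 5.0
```

This is why "faster code means fewer parallel runs": duration is a direct multiplier on concurrency.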

6. Limits: account pool vs function slice

Concurrency is limited by a per-account, per-Region quota. You can reserve part of that pool for a critical function, and other functions share the unreserved remainder.

7. Reserved Concurrency: a hard cap

Reserved Concurrency is a safety valve. It limits parallel work, and above the cap invocations throttle, so a noisy function can't overload a database or exhaust account capacity.
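The cap behaves like a counter of in-flight executions. Here's a toy model of that behavior, assuming a simplified class (not the AWS API), to show how invocations beyond the cap throttle:

```python
class FunctionWithCap:
    """Toy model of a function slice with Reserved Concurrency (illustrative only)."""

    def __init__(self, reserved_concurrency: int):
        self.cap = reserved_concurrency
        self.in_flight = 0

    def invoke(self) -> str:
        if self.in_flight >= self.cap:
            return "throttled"  # Lambda would reject the invocation here
        self.in_flight += 1
        return "running"

    def finish(self) -> None:
        self.in_flight -= 1


fn = FunctionWithCap(reserved_concurrency=2)
print([fn.invoke() for _ in range(3)])  # -> ['running', 'running', 'throttled']
fn.finish()                             # one execution completes...
print(fn.invoke())                      # -> 'running' (capacity freed up)
```

Note the cap never queues work in this model; excess invocations are rejected immediately, which is what protects downstream systems.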

8. Throttling: what it looks like

When you hit a concurrency limit, Lambda throttles. A synchronous caller receives a throttling error instead of a normal execution, asynchronous events are retried automatically, and the throttles show up in your monitoring metrics.

9. Provisioned Concurrency: warm capacity

Provisioned Concurrency is about readiness. You pay to keep a pool of pre-initialized execution environments, configured on a published version or alias, so requests start faster.

10. Cold start vs provisioned

Cold starts add a one-time initialization delay. Provisioned Concurrency performs that work in advance, so the first request doesn't pay the setup cost.
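The latency difference is simple arithmetic. A sketch with hypothetical numbers (the 50 ms handler time and 800 ms init time are made up for illustration):

```python
def first_request_latency_ms(handler_ms: float, init_ms: float,
                             provisioned: bool) -> float:
    """First-request latency: a cold environment pays init; a pre-warmed one doesn't."""
    return handler_ms if provisioned else handler_ms + init_ms

# Hypothetical: 50 ms of handler work, 800 ms of initialization
print(first_request_latency_ms(50, 800, provisioned=False))  # -> 850
print(first_request_latency_ms(50, 800, provisioned=True))   # -> 50
```

Subsequent requests to the same warm environment skip the init cost either way; provisioned capacity removes it from the first request too.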

11. Reserved vs provisioned (different problems)

These settings solve different problems. Reserved controls load. Provisioned controls startup latency. Many production functions use both.

12. Protect downstream systems

If your database can handle only so many concurrent connections, a concurrency cap can prevent a traffic spike from turning into an outage.

13. Bursts vs steady traffic

A burst can create many parallel environments. If your handler is slow, concurrency stays high longer. Caps can smooth the spike, and improving duration is also a scaling strategy.

14. Failures and retries increase load

When something fails, retries can amplify traffic. Concurrency limits help you limit blast radius and prevent a failure loop from overwhelming your dependencies.
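The amplification compounds: every failed attempt can spawn another attempt. A back-of-the-envelope model (the function and its parameters are illustrative, not a Lambda API):

```python
def effective_rps(base_rps: float, failure_rate: float, max_retries: int) -> float:
    """Total attempt rate when each failed attempt is retried, up to max_retries."""
    total = 0.0
    attempts = base_rps
    for _ in range(max_retries + 1):  # original attempt plus retries
        total += attempts
        attempts *= failure_rate      # only the failed fraction is retried
    return total

# 100 rps at a 50% failure rate with 2 retries: 100 + 50 + 25 attempts/second
print(effective_rps(100, 0.5, 2))  # -> 175.0
```

A concurrency cap breaks this loop by bounding how many of those amplified attempts can run at once.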

15. A simple tuning workflow

Start with measurement. If you see throttling or downstream pressure, cap concurrency. If cold starts hurt latency, add provisioned capacity.

16. Key takeaways

Use concurrency to think about scaling: estimate with `rps * duration`, watch for throttles when Reserved Concurrency caps load, and use Provisioned Concurrency to reduce cold-start latency for critical paths.

17. Let's practice!

Let's practice!
