Event source mappings and stream processing
1. Event source mappings and stream processing
In this video, you'll see how Lambda connects to queues and streams with event source mappings. We'll cover polling, batching, and partial batch failures.
2. Where event source mappings fit
For queues and streams, Lambda often polls for records and delivers them in batches. The event source mapping is the component that controls this polling and batching behavior.
3. What is an event source mapping?
An event source mapping connects a source to your function. It polls a queue or stream and invokes your handler with a batch of Records. Think of it like a mail carrier delivering a bundle of letters.
4. Push vs poll (why it matters)
Push sources send events immediately, like an upload to Amazon S3. Poll sources are checked by Lambda, which builds a batch and then invokes your handler. This changes latency and retry behavior, so it matters which model your source uses.
5. Queue vs stream
Amazon SQS is a managed message queue, like an inbox of messages. DynamoDB Streams is a change log for a DynamoDB table. Both arrive as Records, but the record meaning is different. In our hands-on labs here, you will mostly reason about SQS queue behavior and backlog.
6. Batch event shape (simplified)
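As a minimal sketch, a simplified SQS batch event looks roughly like this in Python (real events carry more fields per record, such as receiptHandle, attributes, and eventSourceARN; the order_id payload here is a made-up example):

```python
# Simplified shape of an SQS batch event delivered by an
# event source mapping. Records is a list; each record has a
# messageId and a body string.
event = {
    "Records": [
        {"messageId": "id-1", "body": '{"order_id": 42}'},
        {"messageId": "id-2", "body": '{"order_id": 43}'},
    ]
}

# The body is always a plain string; parsing it into your own
# payload is your handler's job.
first_body = event["Records"][0]["body"]
```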
Most mapping events start with a top-level Records list. For SQS, each record has a messageId and a body string. You usually parse body into your own payload.
7. Walkthrough: process each record
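A minimal sketch of such a handler, assuming the simplified SQS record shape with messageId and body (`lambda_handler` is the conventional Python entry-point name; the response shape is illustrative):

```python
def lambda_handler(event, context):
    # Read Records with a default list so an empty or unexpected
    # event does not raise a KeyError.
    records = event.get("Records", [])
    processed = 0
    for record in records:
        # Read body safely with a default, then log what you need.
        body = record.get("body", "")
        print(f"Processing {record.get('messageId')}: {body}")
        processed += 1
    # Then return a response summarizing the run.
    return {"statusCode": 200, "processed": processed}
```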
Read Records with a default list, loop through each record, read body safely, and log what you need. Then return a response.
8. Walkthrough: parse JSON body
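A sketch of the JSON-parsing variant, assuming the simplified SQS shape (the order_id field is a hypothetical payload key used here for validation):

```python
import json

def lambda_handler(event, context):
    order_ids = []
    for record in event.get("Records", []):
        # Read the body with a safe default, then parse with json.loads.
        body = record.get("body", "")
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            print(f"Skipping malformed body: {body!r}")
            continue
        # Always validate before processing: require the field we need.
        if "order_id" not in payload:
            continue
        order_ids.append(payload["order_id"])
    return {"order_ids": order_ids}
```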
Import json, read the body with a safe default, parse with json.loads, and extract the fields you need. Always validate before processing.
9. Batching: batch size and batch window
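Both settings live on the mapping itself. A hedged configuration sketch with the AWS CLI (the UUID placeholder identifies your existing mapping; the values are examples, not recommendations):

```shell
# Deliver up to 25 records per invocation, or invoke sooner
# once Lambda has waited up to 5 seconds to fill the batch.
aws lambda update-event-source-mapping \
    --uuid <your-mapping-uuid> \
    --batch-size 25 \
    --maximum-batching-window-in-seconds 5
```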
Batch size is how many records arrive per invocation. Batch window is how long Lambda waits to fill a batch. Bigger batches improve throughput but add per-run work. For SQS, queue age metrics like ApproximateAgeOfOldestMessage help you see when messages are waiting too long.
10. Batch size trade-offs
Large batches are efficient but can increase latency. Small batches reduce time-to-first-processing but increase invocations. Choose based on your workload.
11. Scaling with concurrency
Concurrency means multiple batches are processed at the same time. Higher concurrency increases throughput but also increases downstream load.
12. Protect downstream systems
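One way to cap load is a concurrency limit on the function. A configuration sketch with the AWS CLI (the function name and the limit of 5 are examples):

```shell
# Cap this function at 5 concurrent executions so a burst of
# batches cannot overwhelm a downstream database or API.
aws lambda put-function-concurrency \
    --function-name my-function \
    --reserved-concurrent-executions 5
```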
If your handler writes to a database or calls an API, too much concurrency can overwhelm it. Use limits and safe units of work to control load.
13. Partial batch failures
A batch can contain good and bad records. Without partial failures, one bad record can cause the whole batch to retry. Partial batch failure handling lets you retry only the items that failed.
14. Walkthrough: reporting partial failures (SQS)
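A minimal sketch of the reporting pattern, assuming the simplified SQS shape and a hypothetical `process` step (the mapping must have partial batch responses enabled, via the ReportBatchItemFailures function response type, for Lambda to honor this return value):

```python
import json

def process(payload):
    # Placeholder for real work; assumes an "order_id" field is required.
    if "order_id" not in payload:
        raise ValueError("missing order_id")

def lambda_handler(event, context):
    failures = []
    for record in event.get("Records", []):
        try:
            payload = json.loads(record.get("body", ""))
            process(payload)
        except Exception:
            # Collect the messageId of only this failed record;
            # successfully processed records are not retried.
            failures.append({"itemIdentifier": record["messageId"]})
    # Lambda retries only the listed items, not the whole batch.
    return {"batchItemFailures": failures}
```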
Collect messageIds for failed records and return them as batchItemFailures. Lambda retries only those items instead of the whole batch.
15. Key takeaways
Mappings poll and batch Records from queues and streams. You tune batch size and batch window to trade latency against throughput. Partial batch failure reporting prevents reprocessing records that already succeeded.
16. Let's practice!
Let's jump in and practice with some exercises!