Streaming data with DynamoDB

1. Streaming data with DynamoDB

By the end of this video, you will understand DynamoDB Streams and how to integrate them with Lambda and Kinesis for real-time event processing.

2. DynamoDB Streams fundamentals

DynamoDB Streams enables applications to react to database changes in near real time using event-driven patterns. Whenever an item is inserted, modified, or deleted, DynamoDB Streams generates a stream record describing the change. Streams are used to build reactive applications that respond automatically to database updates without polling the table continuously. Streams are disabled by default and must be enabled per table using the `StreamSpecification` parameter. Each stream has its own ARN, distinct from the table ARN.
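
As a rough sketch of what enabling a stream can look like, the helper below builds the arguments for DynamoDB's `UpdateTable` call. The helper name and the `Orders` table are hypothetical, and the commented-out boto3 call assumes boto3 is installed and AWS credentials are configured.

```python
def build_enable_stream_request(table_name, view_type="NEW_AND_OLD_IMAGES"):
    """Build keyword arguments for UpdateTable that turn on a stream."""
    allowed = {"KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES"}
    if view_type not in allowed:
        raise ValueError(f"unsupported stream view type: {view_type}")
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": view_type,
        },
    }


# "Orders" is a hypothetical table name used only for illustration.
request = build_enable_stream_request("Orders")

# With boto3 available, the request could be sent like this:
#   import boto3
#   response = boto3.client("dynamodb").update_table(**request)
#   stream_arn = response["TableDescription"]["LatestStreamArn"]
```

The stream ARN returned by the service is the value a consumer would subscribe to, not the table ARN.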

3. Retention period

Stream records are retained for 24 hours. This retention period is fixed and cannot be configured. To change the stream view type, you must disable the stream and create a new one, which generates a new stream ARN.

4. Streams and DynamoDB read capacity units

Stream reads do not consume the DynamoDB table's read capacity units (RCUs), so enabling streams does not affect provisioned throughput on the source table.

5. Stream records

Each stream record contains metadata describing the database change. DynamoDB Streams captures three event types: `INSERT`, `MODIFY`, and `REMOVE`.
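
To make the record metadata concrete, here is a small sketch that counts records by event type. The sample event is hand-written for illustration (the IDs, keys, and sequence numbers are made up); it follows the general shape Lambda passes to a function, with the event type in `eventName` and the item data under `dynamodb`.

```python
def count_event_types(event):
    """Count stream records by event type (INSERT, MODIFY, REMOVE)."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    for record in event.get("Records", []):
        name = record["eventName"]
        if name in counts:
            counts[name] += 1
    return counts


# Hand-written sample; all identifiers below are illustrative only.
sample_event = {
    "Records": [
        {
            "eventID": "e-1",
            "eventName": "INSERT",
            "eventSource": "aws:dynamodb",
            "dynamodb": {
                "Keys": {"OrderId": {"S": "o-1"}},
                "SequenceNumber": "100",
                "StreamViewType": "KEYS_ONLY",
            },
        },
        {
            "eventID": "e-2",
            "eventName": "REMOVE",
            "eventSource": "aws:dynamodb",
            "dynamodb": {
                "Keys": {"OrderId": {"S": "o-2"}},
                "SequenceNumber": "101",
                "StreamViewType": "KEYS_ONLY",
            },
        },
    ]
}

counts = count_event_types(sample_event)
```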

6. Stream views

When enabling streams, developers choose a stream view type that controls which data is included in stream records. Available stream views are:

- `KEYS_ONLY`: only the key attributes (partition key and, if defined, sort key) of the changed item.
- `NEW_IMAGE`: the full item after the change.
- `OLD_IMAGE`: the full item before the change.
- `NEW_AND_OLD_IMAGES`: the full item before and after the change.

`NEW_AND_OLD_IMAGES` is commonly used for auditing and change comparison because it includes both versions of the item. It is also required for DynamoDB Global Tables.
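
As an illustration of change comparison, the sketch below diffs the two images in a single record. It assumes the stream uses the `NEW_AND_OLD_IMAGES` view; the function name and the sample attributes are hypothetical.

```python
def changed_attributes(record):
    """Return {attribute: (old_value, new_value)} for attributes that differ."""
    images = record.get("dynamodb", {})
    old = images.get("OldImage", {})
    new = images.get("NewImage", {})
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Illustrative MODIFY record: the status changes and a carrier is added.
modify_record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "OldImage": {"OrderId": {"S": "o-1"}, "Status": {"S": "PENDING"}},
        "NewImage": {
            "OrderId": {"S": "o-1"},
            "Status": {"S": "SHIPPED"},
            "Carrier": {"S": "ACME"},
        },
    },
}

diff = changed_attributes(modify_record)
```

Attributes that exist in only one image show up with `None` on the missing side, which is often enough for a simple audit log.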

7. Ordering

DynamoDB Streams preserves ordering for records associated with the same partition key. Lambda processes records in batches and checkpoints progress automatically after successful processing. By default, a failed batch is retried until processing succeeds or the records expire from the stream.

8. Streams and AWS Lambda

AWS Lambda is a common consumer for DynamoDB Streams. Lambda uses a polling-based event source mapping to read from the stream and invoke a function with batches of records.

9. Architectural pattern

A common architecture pattern is a simple chain: an update to a DynamoDB table produces a record in the table's stream, and that record triggers a Lambda function for downstream processing.
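
A minimal sketch of the Lambda end of this pattern is shown below. It converts the DynamoDB-typed attributes in each new image into plain Python values before handing them on. The converter is a simplified stand-in that covers only a few attribute types (boto3 ships a fuller `TypeDeserializer`), and the sample event is hypothetical.

```python
from decimal import Decimal


def from_dynamodb_json(value):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, raw), = value.items()  # each typed value has exactly one key
    if type_tag == "S":
        return raw
    if type_tag == "N":
        return Decimal(raw)  # numbers arrive as strings
    if type_tag == "BOOL":
        return raw
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_dynamodb_json(item) for item in raw]
    if type_tag == "M":
        return {key: from_dynamodb_json(item) for key, item in raw.items()}
    raise ValueError(f"attribute type not handled in this sketch: {type_tag}")


def handler(event, context):
    """Lambda entry point: collect inserted or modified items as plain dicts."""
    items = []
    for record in event["Records"]:
        image = record["dynamodb"].get("NewImage")
        if image is not None:  # REMOVE records carry no new image
            items.append({k: from_dynamodb_json(v) for k, v in image.items()})
    return items


# Hypothetical event with one INSERT and one REMOVE record.
event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "NewImage": {"OrderId": {"S": "o-1"}, "Total": {"N": "19.99"}}
            },
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"OrderId": {"S": "o-2"}}}},
    ]
}
```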

10. Scaling

A DynamoDB stream is divided internally into shards, similar to Kinesis Data Streams. DynamoDB Streams allows up to two simultaneous consumers per shard. Exceeding this limit is a common cause of read throttling.

11. Scaling with Lambda

Lambda concurrency scales based on shard count. Developers can tune processing behavior using:

- `BatchSize`: maximum records per Lambda invocation.
- `MaximumBatchingWindowInSeconds`: wait time before invoking with a partial batch.
- `MaximumRetryAttempts`: retries before sending the batch to a failure destination.
- `MaximumRecordAgeInSeconds`: discard records older than this threshold.
- `ParallelizationFactor`: increases concurrent processing per shard while still preserving ordering for records with the same partition key. The maximum value is 10.
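
These settings are all parameters of Lambda's `CreateEventSourceMapping` API, where the batching window is spelled `MaximumBatchingWindowInSeconds`. The sketch below builds one possible set of arguments. The specific values, the function name `process-orders`, and the stream ARN are illustrative assumptions, not recommendations.

```python
def build_event_source_mapping(stream_arn, function_name, parallelization_factor=2):
    """Build keyword arguments for Lambda's CreateEventSourceMapping call."""
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "LATEST",            # or "TRIM_HORIZON"
        "BatchSize": 100,                        # max records per invocation
        "MaximumBatchingWindowInSeconds": 5,     # wait for a fuller batch
        "MaximumRetryAttempts": 3,               # then give up on the batch
        "MaximumRecordAgeInSeconds": 3600,       # skip records older than 1 hour
        "ParallelizationFactor": parallelization_factor,
    }


# Hypothetical ARN and function name, for illustration only.
mapping = build_event_source_mapping(
    "arn:aws:dynamodb:us-east-1:123456789012:table/Orders/stream/2024-01-01T00:00:00.000",
    "process-orders",
)

# With boto3 available, the mapping could be created like this:
#   import boto3
#   boto3.client("lambda").create_event_source_mapping(**mapping)
```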

12. Filtering and tumbling windows

Lambda also supports filter criteria to discard records before function invocation (reducing cost), and tumbling windows for aggregating state across batches in the same shard.
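
Both features can be sketched in a few lines. The filter below is one possible `FilterCriteria` value that only lets `INSERT` records through. The handler assumes a tumbling window is configured on the event source mapping, in which case Lambda passes the previous `state` in with each batch and expects the updated state back. The `insert_count` key is a hypothetical name chosen for this example.

```python
import json

# One possible FilterCriteria value: only INSERT records reach the function.
insert_only_filter = {
    "Filters": [{"Pattern": json.dumps({"eventName": ["INSERT"]})}]
}


def handler(event, context):
    """Tumbling-window handler: keep a running record count per window."""
    state = dict(event.get("state") or {})  # empty on the first batch of a window
    state["insert_count"] = int(state.get("insert_count", 0)) + len(
        event.get("Records", [])
    )
    if event.get("isFinalInvokeForWindow"):
        # Last invocation for this window: emit or store the aggregate here.
        print("window total:", state["insert_count"])
    return {"state": state}


# Simulate two batches arriving in the same window.
first = handler({"state": {}, "Records": [{}, {}], "isFinalInvokeForWindow": False}, None)
second = handler(
    {"state": first["state"], "Records": [{}], "isFinalInvokeForWindow": True}, None
)
```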

13. Managing duplicate records

DynamoDB Streams captures each change exactly once. Lambda processes stream records with at-least-once semantics. Retries can cause the same record to be processed more than once. Consumers should therefore be designed to behave idempotently. A common pattern is storing processed event IDs or sequence numbers to avoid duplicate updates. Both `eventID` and `SequenceNumber` remain stable across retries, making them reliable idempotency keys.
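
A minimal sketch of this idempotency pattern follows. It uses an in-memory set purely as a stand-in for a durable store (in practice this would more likely be a conditional write to a table), and the class and callback names are hypothetical.

```python
class IdempotentProcessor:
    """Apply each stream record at most once, keyed by its eventID."""

    def __init__(self, apply_change):
        self._apply_change = apply_change
        self._seen = set()  # stand-in for a durable store of processed IDs

    def process(self, record):
        """Return True if the record was applied, False if it was a duplicate."""
        event_id = record["eventID"]
        if event_id in self._seen:
            return False
        self._apply_change(record)
        # Mark as processed only after the change succeeded, so a failure
        # before this point leaves the record eligible for retry.
        self._seen.add(event_id)
        return True


applied = []
processor = IdempotentProcessor(applied.append)
record = {"eventID": "e-1", "eventName": "INSERT"}

first_result = processor.process(record)   # applied
retry_result = processor.process(record)   # same record delivered again
```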

14. Handling failures

`ReportBatchItemFailures` allows successfully processed records to avoid unnecessary retries when only part of a batch fails. Lambda also supports batch bisection using `BisectBatchOnFunctionError`, retry controls, and failure destinations (SQS or SNS) for poison batches that exhaust retries.
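
As a sketch of partial batch responses, the handler below stops at the first record that fails and reports its sequence number, so that the retry can resume from that record instead of replaying the whole batch. It assumes `ReportBatchItemFailures` is enabled on the event source mapping. `process_record` and the `Poison` attribute are hypothetical placeholders for real business logic.

```python
def process_record(record):
    """Hypothetical business logic; raises on a record it cannot handle."""
    if "Poison" in record["dynamodb"].get("NewImage", {}):
        raise ValueError("cannot process record")


def handler(event, context):
    """Report the first failed record so earlier successes are not retried."""
    for record in event["Records"]:
        try:
            process_record(record)
        except Exception:
            # Stop here to preserve ordering: this record and the ones
            # after it are the only ones that need to be retried.
            return {
                "batchItemFailures": [
                    {"itemIdentifier": record["dynamodb"]["SequenceNumber"]}
                ]
            }
    return {"batchItemFailures": []}  # an empty list means the whole batch succeeded


# Illustrative batch: the second record fails.
batch = {
    "Records": [
        {"dynamodb": {"SequenceNumber": "100", "NewImage": {}}},
        {"dynamodb": {"SequenceNumber": "101", "NewImage": {"Poison": {"BOOL": True}}}},
        {"dynamodb": {"SequenceNumber": "102", "NewImage": {}}},
    ]
}
```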

15. DynamoDB to Kinesis integration

DynamoDB tables can publish item-level changes directly to Amazon Kinesis Data Streams using the Kinesis Data Streams for DynamoDB feature. This is a separate, parallel capability rather than a chained one. Events do not flow through DynamoDB Streams to reach Kinesis. Both can be enabled on the same table simultaneously and operate independently. Kinesis Data Streams is useful when database events must feed broader streaming architectures with multiple independent consumers, longer retention, replay capability, or analytics pipelines.
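
On the consuming side, a Lambda function reading from Kinesis receives each payload base64-encoded. The sketch below decodes those payloads back into change events. It assumes the payload is a JSON document that carries fields such as `eventName` and `dynamodb`; treat the exact field set as an assumption and check it against real records.

```python
import base64
import json


def decode_change_events(kinesis_event):
    """Decode the base64 JSON payloads from a Lambda Kinesis event."""
    changes = []
    for record in kinesis_event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        changes.append(json.loads(payload))
    return changes


# The table itself would be connected to a Kinesis stream separately, for
# example with boto3 (the table name and ARN variable here are hypothetical):
#   boto3.client("dynamodb").enable_kinesis_streaming_destination(
#       TableName="Orders", StreamArn=kinesis_stream_arn)
```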

16. Monitoring: common issues

Common operational issues include hot partitions, failed retries, duplicate processing, throttling and consumer lag.

17. Monitoring: CloudWatch metrics

When monitoring DynamoDB Streams using CloudWatch, important metrics include:

- Lambda `Errors`: unhandled exceptions thrown by the Lambda function, which help track code or downstream issues.
- `IteratorAge`: a high iterator age commonly indicates consumers are processing records slower than they are arriving.
- Throttling metrics: Lambda or DynamoDB stream read limits exceeded.
- Batch processing failures: records sent to the DLQ after exhausting retries.
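
As one way to act on iterator age, the sketch below builds the arguments for a CloudWatch alarm on the Lambda function's `IteratorAge` metric, which is reported in milliseconds. The one-minute threshold, the evaluation settings, and the function name are illustrative assumptions.

```python
def build_iterator_age_alarm(function_name, threshold_ms=60_000):
    """Build keyword arguments for CloudWatch's PutMetricAlarm call."""
    return {
        "AlarmName": f"{function_name}-iterator-age",
        "Namespace": "AWS/Lambda",
        "MetricName": "IteratorAge",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Maximum",
        "Period": 60,                  # seconds per datapoint
        "EvaluationPeriods": 5,        # alarm after 5 consecutive breaches
        "Threshold": threshold_ms,     # IteratorAge is in milliseconds
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


alarm = build_iterator_age_alarm("process-orders")  # hypothetical function name

# With boto3 available, the alarm could be created like this:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**alarm)
```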

18. Security

Access to DynamoDB Streams is controlled using IAM permissions. Streams also inherit DynamoDB encryption settings, including server-side encryption using AWS KMS. Applications should follow least-privilege IAM policies for stream consumers and downstream processors.
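
A least-privilege policy for a stream consumer might look roughly like the sketch below. It grants only the stream read actions and scopes them to a single stream ARN. Keeping `ListStreams` in its own statement with a wildcard resource is a conservative assumption made for this example; verify the resource scoping each action supports before relying on it.

```python
import json


def build_stream_reader_policy(stream_arn):
    """Build an IAM policy document that only allows reading one stream."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:DescribeStream",
                    "dynamodb:GetRecords",
                    "dynamodb:GetShardIterator",
                ],
                "Resource": stream_arn,
            },
            {
                # Assumption: ListStreams is granted on "*" in this sketch.
                "Effect": "Allow",
                "Action": "dynamodb:ListStreams",
                "Resource": "*",
            },
        ],
    }


# Hypothetical stream ARN, for illustration only.
policy = build_stream_reader_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/Orders/stream/2024-01-01T00:00:00.000"
)
policy_json = json.dumps(policy, indent=2)
```

Note that the policy contains no table-level actions such as `dynamodb:GetItem`, so a consumer holding it can read changes but not query the table itself.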

19. Let's practice!

Let's practice what you have learned about streaming data with DynamoDB Streams.
