Event capture, scaling, and partitions
1. Event capture, scaling, and partitions
In this video, you'll learn about advanced features of Event Hubs.
2. What you will learn
You'll learn how to archive streams with Capture, how to scale with throughput units and partitions (including auto-inflate), and how consumer groups, checkpointing, and geo-disaster recovery keep your pipelines resilient.
3. Event capture
Let's start with Capture, which is crucial when we want to examine incoming event data later. Event Hubs Capture automatically writes your incoming stream to Azure Blob Storage or Azure Data Lake Storage on a rolling schedule, landing the data as Avro files in time- or size-based batches. That gives you a durable record for reprocessing, compliance, or offline analytics. Your real-time consumers keep reading from Event Hubs as usual, while Capture creates a parallel cold path that's inexpensive to store and easy to query later.
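As a rough sketch of what that cold path looks like in practice, the snippet below lists Capture blobs in a storage container and reads the Avro records with fastavro. The connection string, container name, and the assumption that each record carries a "Body" field are illustrative placeholders, not values from this video.

```python
# A minimal sketch (assumed names): read Event Hubs Capture output from Blob Storage.
# pip install azure-storage-blob fastavro
import io

from azure.storage.blob import ContainerClient
from fastavro import reader

STORAGE_CONN_STR = "<storage-connection-string>"   # placeholder
CAPTURE_CONTAINER = "eventhub-capture"             # hypothetical container name

container = ContainerClient.from_connection_string(STORAGE_CONN_STR, CAPTURE_CONTAINER)

# Capture writes one Avro blob per partition per time/size window.
for blob in container.list_blobs():
    data = container.download_blob(blob.name).readall()
    for record in reader(io.BytesIO(data)):
        # Capture's Avro records typically include a "Body" field with the raw event bytes.
        print(blob.name, record["Body"])
```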
4. Scaling Event Hubs
Event Hubs throughput comes from two levers: throughput units and partitions. Throughput units control the overall capacity of the Event Hubs namespace; think of them as the diameter of the pipe. The auto-inflate feature can raise this capacity dynamically, saving costs when high throughput isn't needed. Partitions control parallelism and ordering scope: more partitions mean more concurrent consumers, but ordering is guaranteed only within a single partition key. Partition count is configured on each individual Event Hub and is equivalent to the number of pipes running in parallel.
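To make the ordering scope concrete, here is a hedged sketch using the azure-eventhub Python SDK: events that share a partition key are routed to the same partition, so their relative order is preserved. The connection string, hub name, and key value are placeholders.

```python
# A minimal sketch (placeholder names): events with the same partition key
# land on the same partition and keep their relative order.
# pip install azure-eventhub
from azure.eventhub import EventHubProducerClient, EventData

CONN_STR = "<event-hubs-namespace-connection-string>"  # placeholder
EVENT_HUB_NAME = "telemetry"                           # hypothetical hub name

producer = EventHubProducerClient.from_connection_string(CONN_STR, eventhub_name=EVENT_HUB_NAME)

with producer:
    # All events for device-42 go to one partition and stay ordered relative to each other.
    batch = producer.create_batch(partition_key="device-42")
    batch.add(EventData("temperature=21.5"))
    batch.add(EventData("temperature=21.7"))
    producer.send_batch(batch)
```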
5. Consumer groups
On the consumption side, consumer groups give each downstream application an independent view of the stream. Analytics, fraud detection, and archiving can all read the same events without stepping on each other's checkpoints. Speaking of which, checkpointing records how far a consumer has read; if a worker restarts, it resumes from the last checkpoint instead of replaying everything. This is extremely useful if there's a network outage while a long stream of events is being read. With coordinated processors (like the Event Processor client), checkpoints are written periodically: frequent checkpoints reduce rework, while less frequent ones reduce storage chatter.
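As one possible illustration, the sketch below uses the azure-eventhub SDK with a blob-based checkpoint store, so a restarted worker resumes from its last checkpoint. The connection strings, consumer group name, hub name, and container name are assumed placeholders.

```python
# A minimal sketch (placeholder names): a consumer in its own consumer group
# that checkpoints its progress to Blob Storage after each event.
# pip install azure-eventhub azure-eventhub-checkpointstoreblob
from azure.eventhub import EventHubConsumerClient
from azure.eventhub.extensions.checkpointstoreblob import BlobCheckpointStore

EVENTHUB_CONN_STR = "<event-hubs-namespace-connection-string>"  # placeholder
STORAGE_CONN_STR = "<storage-connection-string>"                # placeholder

# Checkpoints live in a blob container, so any worker can pick up where another left off.
checkpoint_store = BlobCheckpointStore.from_connection_string(STORAGE_CONN_STR, "checkpoints")

client = EventHubConsumerClient.from_connection_string(
    EVENTHUB_CONN_STR,
    consumer_group="analytics",        # hypothetical consumer group, independent of other readers
    eventhub_name="telemetry",         # hypothetical hub name
    checkpoint_store=checkpoint_store,
)

def on_event(partition_context, event):
    print(f"partition {partition_context.partition_id}: {event.body_as_str()}")
    # Record progress; after a restart, receiving resumes here instead of replaying everything.
    partition_context.update_checkpoint(event)

with client:
    client.receive(on_event=on_event, starting_position="-1")  # "-1" = start of stream if no checkpoint exists
```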
6. Geo-disaster recovery
Finally, you can plan for failure with geo-disaster recovery. Event Hubs lets you pair namespaces in different regions under a single alias. Your producers and consumers talk to the alias; if the primary region goes down, you fail over the alias to the secondary. This protects you even if an entire regional data center goes down! Geo-DR is a metadata-level failover for business continuity, which means that historical data isn't replicated between regions. For long-term retention and cross-region analytics, you can use Event Hubs Capture to write data into geo-redundant storage and/or replicate outputs downstream.
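To illustrate the alias idea, a short hedged sketch: clients connect using the alias hostname rather than either regional namespace, so client code does not change when you fail over. The alias name, key, and hub name below are invented placeholders.

```python
# A minimal sketch (invented alias name): point clients at the Geo-DR alias,
# not at the primary or secondary namespace directly. After a failover, the
# alias resolves to the secondary namespace and this code keeps working unchanged.
from azure.eventhub import EventHubProducerClient, EventData

ALIAS_CONN_STR = (
    "Endpoint=sb://my-eventhubs-alias.servicebus.windows.net/;"
    "SharedAccessKeyName=send;SharedAccessKey=<key>"
)

producer = EventHubProducerClient.from_connection_string(ALIAS_CONN_STR, eventhub_name="telemetry")
with producer:
    batch = producer.create_batch()
    batch.add(EventData("hello from whichever region is active"))
    producer.send_batch(batch)
```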
7. Putting it all together
Putting it all together, we have Capture for a durable history, throughput units and partitions (plus auto-inflate) for elastic scale, consumer groups and checkpointing for clean, independent readers, and geo-DR to keep the lights on when a region blinks.
8. Let's practice!
Let's practice this ourselves!