
Kafka architecture

1. Kafka architecture

Welcome back! Let's talk about the architecture of how everything in Kafka works together.

2. Kafka components

First, let's look at the major components of Kafka: the servers and the clients. Kafka servers run as a cluster of one or more machines that store data and manage communication with Kafka clients. Servers can also handle integration with other systems, including databases, log files, and so on. We're already familiar with Kafka clients, which read data via Kafka consumers or write data via producers. Kafka clients can also process data locally and then store that information elsewhere, or write it back to Kafka itself.
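To make these components concrete, here is a minimal sketch of a producer and a consumer written with the confluent-kafka Python client. The client library, broker address, topic name, and group id are all assumptions for illustration; the lesson itself doesn't prescribe them.

    from confluent_kafka import Producer, Consumer

    # Producer: a client that writes events to the Kafka server (broker).
    producer = Producer({"bootstrap.servers": "localhost:9092"})
    producer.produce("example-topic", key="user-42", value="hello kafka")
    producer.flush()  # block until the broker has acknowledged the write

    # Consumer: a client that reads events back from the same topic.
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "example-group",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["example-topic"])
    msg = consumer.poll(5.0)  # wait up to 5 seconds for a message
    if msg is not None and msg.error() is None:
        print(msg.key(), msg.value())
    consumer.close()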

3. Kafka server

The primary task of a Kafka server is acting as a Kafka broker. The broker, like a stock broker, handles communication between consumers and producers. The server also handles data storage. Data written by producers is stored and organized into topics. We've already seen topics in the previous chapter, and we'll cover working with them in more depth in a coming lesson. For now, know that topics are partitioned, meaning they are stored in separate pieces. Individual messages are assigned to a given partition based on an event key, such as a customer ID or location ID. Within a partition, messages are retrieved in the order they were written.
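As a rough illustration of how keys map to partitions, the sketch below (again assuming the confluent-kafka client and a hypothetical "orders" topic) uses a delivery callback to show that every message produced with the same key lands in the same partition, where it stays in write order.

    from confluent_kafka import Producer

    def report(err, msg):
        # Delivery callback: prints which partition each keyed message landed in.
        if err is None:
            print(f"key={msg.key()} -> partition {msg.partition()}, offset {msg.offset()}")

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    for event_key in ["customer-1", "customer-2", "customer-1"]:
        producer.produce("orders", key=event_key, value="order placed", callback=report)
    producer.flush()  # wait for delivery reports; repeated keys share a partition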

4. Partitions & replication

Kafka is a fault-tolerant system, meaning that if a server within the cluster goes offline, the others can still provide the data. Each topic in a Kafka cluster has a replication factor. The replication factor minus 1 equals the number of server failures the cluster can withstand without losing any data. The maximum replication factor is equal to the number of servers in the cluster. This fault tolerance is achieved by replicating topic partitions across the cluster.
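The replication factor is chosen when a topic is created. Here is a small sketch using the confluent-kafka AdminClient; the broker address, topic name, and partition count are illustrative. With replication_factor=2, the cluster can withstand 2 minus 1, that is one, broker failure without losing data.

    from confluent_kafka.admin import AdminClient, NewTopic

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})
    # Ask for 3 partitions, each stored on 2 different brokers.
    topic = NewTopic("orders", num_partitions=3, replication_factor=2)
    futures = admin.create_topics([topic])
    for name, future in futures.items():
        future.result()  # raises if creation failed, e.g. replication factor > broker count
        print(f"created topic {name}")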

5. Example

Let's consider an example cluster with 3 brokers. This cluster has three topics defined, each set with 2x replication, meaning there are two copies of each topic's partitions within the cluster. We can lose one broker before losing data. Broker 1 has a copy of topic 1 and topic 2. Broker 2 has a copy of topic 2 and topic 3. Broker 3 has copies of topic 1 and topic 3. As such, each topic is shared across 2 brokers in the cluster, providing the 2x replication. Note that the actual layout of the data will vary, but this illustrates a basic scenario.
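To see how a real cluster has laid out partitions and their replicas, one option is to inspect the cluster metadata. The sketch below assumes the confluent-kafka AdminClient and simply prints, for every partition, which broker is the leader and which brokers hold replicas.

    from confluent_kafka.admin import AdminClient

    admin = AdminClient({"bootstrap.servers": "localhost:9092"})
    metadata = admin.list_topics(timeout=10)
    for topic_name, topic in metadata.topics.items():
        for pid, partition in topic.partitions.items():
            # partition.replicas lists the broker ids holding copies of this partition
            print(f"{topic_name} partition {pid}: leader={partition.leader}, "
                  f"replicas={partition.replicas}")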

6. Example with 1 failure

Consider what happens if Broker 2 fails. With a replication factor of 2, 2 minus 1 means we can handle 1 failed broker. In this case we're okay, as each topic still exists somewhere in the cluster.

7. Example with 2 failures

Now, let's consider losing two brokers. With our 2x replication factor, we've exceeded the number of failures we can tolerate, which means we will lose data. Topic 2 is no longer available in the cluster, but note that topics 1 and 3 still are, so even in this heavily degraded state, some data can still be served.

8. Let's practice!

We've covered a lot of information about Kafka architecture - let's practice what you've learned in the exercises ahead.
