1. Creating and managing Kafka clusters
Welcome back! Let's now look at how to create and manage Kafka clusters using Apache ZooKeeper.
2. What is ZooKeeper?
Our first question is, what is ZooKeeper? ZooKeeper is a software framework for managing information and providing the services necessary for running distributed systems. We'll discuss some of those services more in a moment.
ZooKeeper is primarily used by developers to create distributed applications, but users do interact with it to manage those applications.
A few examples of applications that have used ZooKeeper include Kafka, the distributed processing framework Hadoop, and earlier versions of the graph database Neo4j.
3. What does ZooKeeper do?
Let's talk for a moment about what ZooKeeper actually does. It provides the various services necessary to run distributed applications.
This includes storing configuration information and naming systems so that they don't conflict. It also provides the ability to synchronize across systems, such as determining which systems are available, when they should start, and how services can reach them. ZooKeeper also provides the group services, such as tracking which systems currently belong to a cluster, that a set of systems needs in order to communicate.
It's important to note that ZooKeeper is designed as a framework so that each individual distributed application is not required to implement a custom version of these services. As an example, think about how power plugs or water hose nozzles follow a common standard rather than each one having its own design. This allows for easier configuration, implementation, and interaction.
4. ZooKeeper and Kafka
As this is a Kafka course, we're not going to dive too deeply into exactly how ZooKeeper works, but rather focus on what we need to know about ZooKeeper to use Kafka. Kafka primarily uses ZooKeeper for its cluster management. Newer Kafka versions also offer KRaft, a mode in which Kafka manages cluster metadata itself without ZooKeeper, but we're not going to cover it in this course.
Kafka uses two primary configuration files for server and cluster setup: config/zookeeper.properties and config/server.properties. You can see a portion of a zookeeper.properties file to the right. The zookeeper.properties file holds the information needed for a basic ZooKeeper setup, including where to store ZooKeeper data and what network port to run on.
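To make that a bit more concrete, here's a minimal sketch of what such a file might hold. It writes a small zookeeper.properties to a scratch directory and reads back the two settings we just mentioned. The values mirror the sample file that ships with Kafka; the scratch directory, the get_prop helper, and the variable names are assumptions made purely for illustration.

```shell
# Write a minimal zookeeper.properties to a scratch directory (illustrative path).
zk_conf_dir="$(mktemp -d)"
cat > "$zk_conf_dir/zookeeper.properties" <<'EOF'
# Directory where ZooKeeper stores its snapshot data
dataDir=/tmp/zookeeper
# Port that clients (including Kafka brokers) connect to
clientPort=2181
# 0 disables the per-IP connection limit
maxClientCnxns=0
EOF

# Hypothetical helper: print the value of one key from a properties file.
get_prop() {
  grep "^$1=" "$2" | cut -d= -f2-
}

zk_data_dir="$(get_prop dataDir "$zk_conf_dir/zookeeper.properties")"
zk_port="$(get_prop clientPort "$zk_conf_dir/zookeeper.properties")"
echo "ZooKeeper data dir: $zk_data_dir, client port: $zk_port"
```

In a real installation you would edit config/zookeeper.properties in place rather than generate it; note that /tmp is fine for experiments but is usually not where you want long-lived data.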
5. config/server.properties
The primary Kafka server configuration file is the config/server.properties file. This contains considerably more information than the zookeeper.properties file and defines information specific to Kafka installations.
This includes information like details about Kafka brokers, any network configuration, where to store events, and basic topic configuration including any replication details.
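As a rough illustration of those categories, the sketch below writes a trimmed-down server.properties to a scratch directory and reads a few values back. The keys and values follow the sample file shipped with ZooKeeper-based Kafka releases; the scratch directory, the get_prop helper, and the variable names are assumptions for this example only, and a real file contains many more settings.

```shell
# Write a trimmed-down server.properties to a scratch directory (illustrative path).
kafka_conf_dir="$(mktemp -d)"
cat > "$kafka_conf_dir/server.properties" <<'EOF'
# Broker details: unique ID of this broker within the cluster
broker.id=0
# Network configuration: address the broker listens on
listeners=PLAINTEXT://:9092
# Event storage: where the broker keeps its log segments
log.dirs=/tmp/kafka-logs
# Topic defaults: partitions for newly created topics
num.partitions=1
# Replication: copies kept of the internal offsets topic
offsets.topic.replication.factor=1
# ZooKeeper instance this broker registers with
zookeeper.connect=localhost:2181
EOF

# Hypothetical helper: print the value of one key from a properties file.
get_prop() {
  grep "^$1=" "$2" | cut -d= -f2-
}

broker_id="$(get_prop broker.id "$kafka_conf_dir/server.properties")"
kafka_log_dirs="$(get_prop log.dirs "$kafka_conf_dir/server.properties")"
zk_connect="$(get_prop zookeeper.connect "$kafka_conf_dir/server.properties")"
echo "Broker $broker_id stores events in $kafka_log_dirs and uses ZooKeeper at $zk_connect"
```

Notice that zookeeper.connect points at the same port that the ZooKeeper configuration listens on; that is how the two files tie together.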
6. Starting a Kafka cluster
Kafka servers are started in two steps. The first is using the command bin/zookeeper-server-start.sh config/zookeeper.properties.
This starts a basic ZooKeeper server using the configuration details found in the zookeeper.properties file. There's an extensive amount of output that will vary based on your system.
7. Starting a Kafka cluster (continued)
The second step is the bin/kafka-server-start.sh config/server.properties command, which starts the actual Kafka services as defined in the server.properties file.
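Putting the two steps together, a startup script might look something like the sketch below. It assumes Kafka is unpacked at the location in KAFKA_HOME (a hypothetical default is used here) and passes -daemon, a flag both start scripts accept, so that each server runs in the background instead of occupying a terminal as it does in the lecture. The run helper and the DRY_RUN switch are illustrative additions: by default the script only prints the commands, and it executes them when DRY_RUN=0.

```shell
# Assumption: KAFKA_HOME points at an unpacked Kafka distribution.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"

# Print a command; actually run it only when DRY_RUN is set to 0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

start_cluster() {
  # Step 1: ZooKeeper must be running before any broker starts.
  run "$KAFKA_HOME/bin/zookeeper-server-start.sh" -daemon "$KAFKA_HOME/config/zookeeper.properties"
  # Step 2: start the Kafka broker, which registers itself with ZooKeeper.
  run "$KAFKA_HOME/bin/kafka-server-start.sh" -daemon "$KAFKA_HOME/config/server.properties"
}

start_cluster
```

To start the servers for real, you would run the script with DRY_RUN=0 once KAFKA_HOME points at your installation.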
8. Stopping a Kafka cluster
To stop a Kafka cluster, we do the reverse: first kafka-server-stop.sh, then zookeeper-server-stop.sh. Kafka must cleanly shut down any open connections, and it relies on ZooKeeper services while doing so, which is why the Kafka server is stopped first and ZooKeeper last.
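The shutdown order can be sketched the same way. As before, KAFKA_HOME, the run helper, and the DRY_RUN switch are assumptions made for illustration: the script only prints the commands unless DRY_RUN=0 is set. Neither stop script takes a configuration file argument.

```shell
# Assumption: KAFKA_HOME points at an unpacked Kafka distribution.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"

# Print a command; actually run it only when DRY_RUN is set to 0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

stop_cluster() {
  # Stop the Kafka broker first, while ZooKeeper is still available to it.
  run "$KAFKA_HOME/bin/kafka-server-stop.sh"
  # The stop script only signals the broker; in practice you may want to
  # wait here until the broker process has exited before continuing.
  # Stop ZooKeeper last.
  run "$KAFKA_HOME/bin/zookeeper-server-stop.sh"
}

stop_cluster
```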
9. Let's practice!
We've covered many details about ZooKeeper and Kafka cluster management. Let's practice what we've learned in the exercises ahead.