September 09, 2017

Kafka & Zookeeper Offsets

Kafka versions 0.9 and above can store consumer offsets directly on the broker instead of relying on Zookeeper.

Whether offsets are stored in Kafka or Zookeeper depends on both the Kafka broker version and the version of the client driver. The following table summarizes the behavior:

Kafka Version / Client Driver Version | Less than 0.9             | 0.9 or above
Less than 0.9                         | Offset Storage: Zookeeper | Offset Storage: Zookeeper
0.9 or above                          | Offset Storage: Zookeeper | Offset Storage: Kafka
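
The table reduces to one rule: offsets are stored in Kafka only when both the broker and the client driver are 0.9 or newer. A minimal Python sketch of that rule is shown here. The function name `offset_storage` and the tuple representation of versions are illustrative assumptions, not part of any Kafka API.

```python
def offset_storage(broker_version, client_version):
    """Return where offsets are stored, following the table above.

    Versions are (major, minor) tuples such as (0, 9).
    Hypothetical helper for illustration only; Kafka exposes no such function.
    """
    threshold = (0, 9)
    if broker_version >= threshold and client_version >= threshold:
        return "kafka"
    return "zookeeper"


print(offset_storage((0, 8), (0, 9)))    # older broker, newer client
print(offset_storage((0, 10), (0, 10)))  # both 0.9 or above
```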


Offset Storage - Zookeeper

Data Model


The Zookeeper data model is very similar to the directory tree of a file system. Each node can have zero or more children. Unlike a file system directory, a Zookeeper node can also have data associated with it. This data can include configuration data, status details, timestamps and so on, which Zookeeper uses to coordinate distributed processes.
Each node is called a 'znode' in Zookeeper parlance.
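
To make the model concrete, here is a toy in-memory sketch of a znode tree in Python. It is not a Zookeeper client: the `Znode` class, the `create` and `get` helpers, and the `/app/config/timeout` path are all hypothetical names chosen for illustration. Unlike real Zookeeper, this toy `create` also creates missing parent nodes to keep the example short.

```python
class Znode:
    """A toy znode: holds its own data plus named children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}


def _parts(path):
    # Split "/a/b/c" into ["a", "b", "c"], ignoring empty segments.
    return [p for p in path.split("/") if p]


def create(root, path, data=b""):
    """Create the znode at `path` (and any missing parents) and set its data."""
    node = root
    for name in _parts(path):
        node = node.children.setdefault(name, Znode())
    node.data = data
    return node


def get(root, path):
    """Return the data stored at `path`; raises KeyError if a znode is missing."""
    node = root
    for name in _parts(path):
        node = node.children[name]
    return node.data


root = Znode()
create(root, "/app/config/timeout", b"30")
print(get(root, "/app/config/timeout"))      # data stored on a leaf znode
print(sorted(root.children["app"].children))  # children of /app
```

The point of the sketch is that every node in the path is the same kind of object, so an intermediate node such as `/app/config` could carry data of its own as well as children.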

Figure: Zookeeper storage data model

For offsets, the data is stored under the following path:


Format: /consumers/{CONSUMER_GROUP_ID}/offsets/{TOPIC_NAME}/{PARTITION_NUMBER}

Example: /consumers/photo-events-processor/offsets/photos/20

The above path contains the offset information for partition 20 of the 'photos' topic for the 'photo-events-processor' consumer group.
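
The path format can be expressed as a small helper. This is only a sketch; the function name `offset_znode_path` is an assumption made for this example.

```python
def offset_znode_path(group_id, topic, partition):
    """Build the znode path holding a consumer group's offset for one partition."""
    return "/consumers/{}/offsets/{}/{}".format(group_id, topic, partition)


# Prints: /consumers/photo-events-processor/offsets/photos/20
print(offset_znode_path("photo-events-processor", "photos", 20))
```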

Retrieving Offsets From Zookeeper

Step#1 : Connect to Zookeeper Shell
The zookeeper-shell.sh script is located under the bin folder of the Kafka installation (a standalone Zookeeper installation ships the equivalent zkCli.sh).


bash$ ./zookeeper-shell.sh zookeeper1v:2181

Step#2 : Execute the following command to retrieve the offset metadata


get /consumers/photo-events-processor/offsets/photos/20


Output:
Figure: Zookeeper offsets


Offset Storage - Kafka

Offsets in Kafka are stored as messages in a separate topic named '__consumer_offsets'. Each consumer commits a message to this topic at periodic intervals. The message contains the current offset along with the consumer group, the topic and partition number the offset belongs to, and other useful metadata.

Reading Offsets From Kafka

Since __consumer_offsets is just like any other topic, it is possible to consume messages from it. Before doing that, the topic must be made visible to consumers, because it is an internal Kafka topic and is hidden from consumers by default. To make it visible, create a consumer configuration file that sets exclude.internal.topics to false:

bash$ echo "exclude.internal.topics=false" > /tmp/consumer.config

The following command prints the offsets and other metadata on the console.

bash$ ./kafka-console-consumer.sh --consumer.config /tmp/consumer.config --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter" --zookeeper <zookeeper_host>:<port> --topic __consumer_offsets

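The kafka-console-consumer.sh script is located under the bin directory of the Kafka installation. The formatter class name shown above applies to Kafka 0.9 and 0.10; from 0.11 onward the class lives in the kafka.coordinator.group package.

Figure: Kafka offsets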

Structure of a message in the __consumer_offsets topic


{
    "topic":"topic-name",
    "partition":11,
    "group":"console-consumer-45567",
    "version":2,
    "offset":15,
    "metadata":"",
    "commitTimestamp":1501542796444,
    "expireTimestamp":1501629196444
}
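
As a rough illustration of how these fields can be used, the Python sketch below parses messages of this shape. It assumes the messages are available as JSON with the structure shown above, and that both timestamps are milliseconds since the Unix epoch, which the sample values suggest. The function names `describe_commit` and `latest_offsets` are hypothetical.

```python
import json
from datetime import datetime, timezone

SAMPLE = """
{
    "topic": "topic-name",
    "partition": 11,
    "group": "console-consumer-45567",
    "version": 2,
    "offset": 15,
    "metadata": "",
    "commitTimestamp": 1501542796444,
    "expireTimestamp": 1501629196444
}
"""


def describe_commit(raw_json):
    """Return (key, offset, commit time in UTC, retention in hours) for one message."""
    msg = json.loads(raw_json)
    key = (msg["group"], msg["topic"], msg["partition"])
    # Assumption: timestamps are epoch milliseconds.
    committed_at = datetime.fromtimestamp(msg["commitTimestamp"] / 1000, tz=timezone.utc)
    retention_hours = (msg["expireTimestamp"] - msg["commitTimestamp"]) / 3600000
    return key, msg["offset"], committed_at, retention_hours


def latest_offsets(messages):
    """Keep the most recent offset per (group, topic, partition).

    `messages` is an iterable of already-parsed dicts in commit order,
    so a later commit for the same key overwrites an earlier one.
    """
    latest = {}
    for msg in messages:
        latest[(msg["group"], msg["topic"], msg["partition"])] = msg["offset"]
    return latest


key, offset, committed_at, retention_hours = describe_commit(SAMPLE)
print(key, offset, committed_at.isoformat(), retention_hours)
```

For the sample message, the gap between expireTimestamp and commitTimestamp is 86,400,000 ms, that is, 24 hours.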