
Efficient storage and retrieval of messages is the bedrock of Apache Kafka's architecture. It is important to understand where and how Kafka stores messages and their metadata on the brokers' filesystem. This article gives an overview of the directory structure and the various files Kafka uses to persist your log messages and fetch them quickly when needed.
Log segments are stored in dedicated directories on the broker file system. You can configure one or more directories for your data in the log.dirs broker config. Multiple directories are separated by a comma. If you have multiple log directories, Kafka distributes topic-partitions across these directories (round-robin by default).
log.dirs=/var/data1,/var/data2
There is also a deprecated log.dir config property that allowed just one directory to be specified.
Each topic partition has its own directory in one of the directories specified in the log.dirs config property. The log segments for that topic-partition and the associated metadata are stored in that dedicated folder. In the picture below you see a topic named 'orders' which has two partitions. Each topic-partition folder shows the log segment file (*.log) and some of the common metadata files and their sizes in bytes.
The log segment files (xxxx.log) contain the actual messages for your topic partitions. In the example above there was just one log segment file per topic partition, but in practice you would see multiple log segments and metadata files. A new log segment file is created when it reaches a certain size or age as specified in the following broker config properties.
log.segment.bytes=5242880  # Default is 1073741824 (one GB)
log.roll.hours=24          # Default is 168 (seven days)
log.roll.ms=3600000        # Default is null
Notice that log.roll.hours is only used if log.roll.ms is not set.
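The roll decision described above can be sketched in a few lines. This is an illustrative simplification, not broker code (the real broker also rolls for other reasons, such as a full index file):

```python
def should_roll(segment_bytes, segment_age_ms,
                log_segment_bytes=1073741824,  # broker default: 1 GB
                log_roll_ms=None,              # broker default: null
                log_roll_hours=168):           # broker default: 7 days
    """Return True if a new log segment should be started."""
    # log.roll.hours is only consulted when log.roll.ms is unset.
    if log_roll_ms is not None:
        max_age_ms = log_roll_ms
    else:
        max_age_ms = log_roll_hours * 60 * 60 * 1000
    return segment_bytes >= log_segment_bytes or segment_age_ms >= max_age_ms

print(should_roll(5242880, 0, log_segment_bytes=5242880))  # True: size limit reached
print(should_roll(1024, 3600000, log_roll_ms=3600000))     # True: age limit reached
print(should_roll(1024, 1000))                             # False: neither limit reached
```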
In the picture below you see three sets of log segments and their associated metadata files, each set is colored using a different color. The number in the .log file name represents the base offset of the first message in that log segment. In the example below, the first offset of the second log file is 54629. Similarly, the first offset of the third log file is 1291369.
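Because the file names carry the base offsets, the broker can locate the segment that holds a given offset with a binary search over the sorted base offsets. A minimal sketch using the example offsets above (`find_segment` is a hypothetical helper, not part of Kafka):

```python
import bisect

def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment containing target_offset.

    base_offsets is a sorted list with one entry per segment,
    e.g. taken from the .log file names.
    """
    # bisect_right finds the first base offset greater than the target;
    # the segment we want starts at the entry just before it.
    i = bisect.bisect_right(base_offsets, target_offset)
    if i == 0:
        raise ValueError("offset precedes the earliest segment")
    return base_offsets[i - 1]

# The three segments from the example above.
segments = [0, 54629, 1291369]
print(find_segment(segments, 100))      # 0: first segment
print(find_segment(segments, 54629))    # 54629: second segment
print(find_segment(segments, 2000000))  # 1291369: third segment
```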
The Kafka log files are binary files that are not human-readable. However, you can inspect the contents of the log files using the kafka-dump-log.sh command.
bin/kafka-dump-log.sh --files 00000000000000000000.log
There are several options you can pass to the tool that affect how the log segment contents are displayed, such as --print-data-log to include the message payloads in the output.
The index files are used to quickly find messages in log segments based on their offset. Without the index file the broker would have to scan the entire log segment file to find the requested message.
Each entry in the index file has two 4-byte components, the offset and the position. The offset is not the absolute offset of the message, but rather the offset relative to the segment. To get the absolute offset, one must add the base offset of the segment (indicated in the file name) to the relative offset in the index entry. The position component stores the byte offset from the start of the log segment file.
Since the entries are fixed size and the offsets are ordered in the index file, Kafka can use binary search to find the closest offset to the requested one.
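The lookup can be illustrated with a short sketch. It assumes entries of two big-endian 4-byte integers (relative offset, file position); the broker's actual on-disk handling (memory-mapped files, etc.) differs, and the sample index data below is made up for illustration:

```python
import struct

ENTRY = struct.Struct(">ii")  # 4-byte relative offset + 4-byte file position

def lookup(index_bytes, base_offset, target_offset):
    """Find the closest indexed offset <= target_offset.

    Returns (absolute_offset, position); a scan of the segment file
    starts from `position` to reach the exact message.
    """
    target_rel = target_offset - base_offset
    n = len(index_bytes) // ENTRY.size
    lo, hi, best = 0, n - 1, (0, 0)  # fall back to the start of the segment
    while lo <= hi:                  # binary search over fixed-size entries
        mid = (lo + hi) // 2
        rel, pos = ENTRY.unpack_from(index_bytes, mid * ENTRY.size)
        if rel <= target_rel:
            best = (rel, pos)
            lo = mid + 1
        else:
            hi = mid - 1
    rel, pos = best
    return base_offset + rel, pos

# A made-up index for the segment whose base offset is 54629.
index = b"".join(ENTRY.pack(r, p) for r, p in [(0, 0), (57, 4102), (120, 8250)])
print(lookup(index, 54629, 54729))  # (54686, 4102): closest indexed entry below 54729
```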
As you can see from the image, not every offset is indexed. This keeps the index file small, saving memory and disk space. The index.interval.bytes config property determines how often entries are added to the index. The value refers to the number of bytes written to the log segment, i.e. after every X bytes are written to the log segment, a new index entry is added.
index.interval.bytes=4096
Similarly to the regular index file, the timeindex file contains mappings from timestamps (of message creation time) to relative message offsets. Each entry is 12 bytes long, 8 bytes for the timestamp and 4 bytes for the relative offset.
Since the time index entries are fixed size and the timestamps are ordered in the index file, Kafka can use binary search to find the closest timestamp to the requested one.
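The timestamp lookup can be sketched the same way, this time over 12-byte entries (an 8-byte timestamp and a 4-byte relative offset). As before, this is an illustration with made-up data, not the broker's actual implementation:

```python
import struct

TIME_ENTRY = struct.Struct(">qi")  # 8-byte timestamp + 4-byte relative offset

def offset_for_time(timeindex_bytes, base_offset, target_ts):
    """Return the absolute offset of the first indexed entry whose
    timestamp is >= target_ts, or None if no such entry exists."""
    n = len(timeindex_bytes) // TIME_ENTRY.size
    lo, hi, best = 0, n - 1, None
    while lo <= hi:  # binary search over fixed-size 12-byte entries
        mid = (lo + hi) // 2
        ts, rel = TIME_ENTRY.unpack_from(timeindex_bytes, mid * TIME_ENTRY.size)
        if ts >= target_ts:
            best = base_offset + rel
            hi = mid - 1
        else:
            lo = mid + 1
    return best

# A made-up time index for the segment whose base offset is 54629.
entries = [(1700000000000, 0), (1700000060000, 57), (1700000120000, 120)]
idx = b"".join(TIME_ENTRY.pack(ts, rel) for ts, rel in entries)
print(offset_for_time(idx, 54629, 1700000050000))  # 54686: first entry at or after the timestamp
```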
Among other things, the timeindex file allows Kafka to efficiently support the KafkaConsumer.offsetsForTimes method of the client library.
public Map<TopicPartition, OffsetAndTimestamp> offsetsForTimes(Map<TopicPartition, Long> timestampsToSearch)
Just like the index file, not every timestamp is indexed. The same index.interval.bytes config property determines how often entries are added to the time index.