Apache Kafka
A distributed event streaming platform written in Java and Scala.
Features:
- Persistent storage
- Strict message ordering within each partition
- Easy horizontal scaling
Learn
Concepts:
- A cluster consists of brokers that are separate servers with their own disks.
- A producer writes events to a cluster; a consumer reads them by repeatedly polling the cluster.
- A topic is a named collection of events split into partitions, which may belong to different brokers. This is how topics scale. Events are ordered within a partition, not across the whole topic.
- Each event in a partition gets an offset that increases monotonically. Consumers can read events starting at any offset.
- There is a special topic `__consumer_offsets` that keeps track of each consumer's offset in each partition.
- Each partition has a configurable number of replicas, one of which is the leader and the others are the followers.
- There are 3 producer acknowledgment settings: `acks=0` (fire-and-forget), `acks=1` (wait for the leader only), and `acks=all` (wait for all in-sync replicas).
- Consumers can be grouped into a consumer group that acts as a single logical consumer, with each partition assigned to exactly one consumer in the group. This is how consumers scale.
- Each event consists of a key, a value, a timestamp, and metadata headers.
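- If an event has a key, the partition it is written to is selected as `hash(key) % number_of_partitions`, so events with the same key always land in the same partition.
- Otherwise, the partition is selected automatically: round-robin in older clients, while Java clients since 2.4 default to a sticky strategy that sends a whole batch to one partition before switching.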
- A compacted topic retains at least the most recent event for each key; older events with the same key are eventually removed.
- Consumer lag is the difference between the latest offset written to a partition and the last offset a consumer has committed for it.
- A Dead Letter Queue (DLQ) is a special topic for events that consumers fail to process for whatever reason.
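
The bullets on partition selection, consumer lag, and compaction can be sketched in plain Python. This is a toy in-memory model, not a Kafka client: `TinyTopic`, `consumer_lag`, and `compact` are hypothetical names made up for illustration, `zlib.crc32` stands in for the real hash (Kafka's default partitioner applies murmur2 to the serialized key), keyless events use simple round-robin, and `compact` assumes every event has a key.

```python
import zlib


class TinyTopic:
    """Toy in-memory stand-in for a topic (illustration only)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin cursor for keyless events

    def partition_for(self, key):
        n = len(self.partitions)
        if key is None:
            # Keyless event: cycle through partitions in order.
            p = self._next % n
            self._next += 1
            return p
        # Keyed event: hash(key) % number_of_partitions.
        # crc32 is an assumption here, not Kafka's actual hash function.
        return zlib.crc32(key.encode("utf-8")) % n

    def append(self, key, value):
        """Store the event and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


def consumer_lag(log_end_offset, committed_offset):
    """Events written to a partition that the consumer has not committed yet."""
    return log_end_offset - committed_offset


def compact(partition):
    """Keep only the latest event per key, preserving original offsets."""
    latest = {}
    for offset, (key, value) in enumerate(partition):
        latest[key] = (offset, key, value)  # later events overwrite earlier ones
    return sorted(latest.values())  # ordered by offset


topic = TinyTopic(num_partitions=3)
first = topic.append("user-42", "created")
second = topic.append("user-42", "renamed")
# Same key, so both events share a partition and get consecutive offsets.
same_partition = first[0] == second[0]
```

Because the partition depends on `number_of_partitions`, changing the partition count of a topic changes where existing keys map, which is worth keeping in mind when per-key ordering matters.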
Kafka Connect
Connectors:
- Debezium: streams database changes, such as MySQL binlog.