# Definition

Apache Kafka® is a distributed streaming platform. A streaming platform has three key capabilities:

• Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
• Store streams of records in a fault-tolerant durable way.
• Process streams of records as they occur.

Figure: Kafka's positioning

A few important Kafka concepts:

• Kafka is run as a cluster on one or more servers that can span multiple datacenters.
• The Kafka cluster stores streams of records in categories called topics.
• Each record consists of a key, a value, and a timestamp.
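
To make the three concepts above concrete, here is a minimal in-memory sketch. It is not Kafka's client API: `Record`, `ToyCluster`, and the topic names are hypothetical, and the "cluster" is just a dictionary mapping topic names to lists of records.

```python
from collections import defaultdict
from dataclasses import dataclass
import time


@dataclass(frozen=True)
class Record:
    """Illustrative record: a key, a value, and a timestamp (ms since epoch)."""
    key: str
    value: str
    timestamp_ms: int


class ToyCluster:
    """Hypothetical stand-in for a cluster: topic name -> list of records."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, key, value, timestamp_ms=None):
        # Stamp the record with the current time unless the caller supplies one.
        if timestamp_ms is None:
            timestamp_ms = int(time.time() * 1000)
        record = Record(key, value, timestamp_ms)
        self.topics[topic].append(record)
        return record


cluster = ToyCluster()
cluster.publish("payments", key="order-1", value="paid", timestamp_ms=1000)
cluster.publish("payments", key="order-2", value="paid", timestamp_ms=2000)
cluster.publish("signups", key="user-9", value="created", timestamp_ms=3000)
```

Each topic keeps its own stream, so `cluster.topics["payments"]` holds two records and `cluster.topics["signups"]` holds one.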

# Architecture

Figure: Kafka architecture

# topic

Figure: Kafka topic

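• Topic definition
Official definition: A topic is a category or feed name to which records are published. Topics in Kafka are always multi-subscriber; that is, a topic can have zero, one, or many consumers that subscribe to the data written to it.
For example, after an order is paid successfully, a message is published to a topic named TOPIC_PAYMENT_ORDER_SUCCESS. The points system can subscribe to this topic and award points to the user. The membership system can subscribe to it and increase the member's growth value. Ant Manor (蚂蚁庄园) in Alipay likewise gives out feed after a successful payment.
• Disk vs. memory speed comparison
As the figure below shows, sequential writes to disk (Sequential, disk) reach 53.2M, while random writes to memory (Random, memory) reach only 36.7M, so sequential disk I/O can be faster than random memory access.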

Figure: Disk vs. memory speed comparison

# durable

Kafka's storage policy for the message log is as follows: The Kafka cluster durably persists all published records, whether or not they have been consumed, using a configurable retention period. For example, if the retention policy is set to two days, then for the two days after a record is published, it is available for consumption, after which it will be discarded to free up space. Kafka’s performance is effectively constant with respect to data size so storing data for a long time is not a problem.
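
The two-day example can be sketched as a toy log with time-based retention. This is an illustration only: `RetainedLog` and its methods are hypothetical names, time is passed in explicitly as `now_ms` so the behavior is deterministic, and retention is applied per record, whereas real Kafka deletes whole log segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    value: str
    timestamp_ms: int


class RetainedLog:
    """Toy log that keeps records for a fixed retention window (hypothetical)."""

    def __init__(self, retention_ms):
        self.retention_ms = retention_ms
        self.records = []

    def append(self, value, now_ms):
        self.records.append(Record(value, now_ms))

    def enforce_retention(self, now_ms):
        # Drop records older than the window, whether or not anyone has read them.
        cutoff = now_ms - self.retention_ms
        self.records = [r for r in self.records if r.timestamp_ms >= cutoff]

    def read_all(self):
        return [r.value for r in self.records]


ONE_DAY_MS = 24 * 60 * 60 * 1000
log = RetainedLog(retention_ms=2 * ONE_DAY_MS)
log.append("a", now_ms=0)              # published at day 0
log.append("b", now_ms=ONE_DAY_MS)     # published at day 1

# Day 1: both records are inside the two-day window.
log.enforce_retention(now_ms=ONE_DAY_MS)
after_one_day = log.read_all()

# Day 2.5: "a" is 2.5 days old and is discarded; "b" is 1.5 days old and stays.
log.enforce_retention(now_ms=2 * ONE_DAY_MS + ONE_DAY_MS // 2)
after_two_and_a_half_days = log.read_all()
```

Note that reading never removes anything here: a record disappears only because it aged out, which is the property the quoted passage describes.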

# consumer

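The diagram below shows consumers reading from one partition of a topic. How Kafka selects a particular partition among the partitions of each topic will be covered in a later post. As the diagram shows, a consumer locates and reads messages by offset, and the offset each consumer holds is its own consumption progress.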

Figure: Kafka consumer
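
The offset mechanism described above can be sketched roughly as follows. `PartitionLog` and `ToyConsumer` are hypothetical names, not Kafka classes. The only assumptions are that a partition is an append-only list and that a record's offset is its index in that list.

```python
class PartitionLog:
    """Toy append-only partition: a record's offset is its index in the list."""

    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def read(self, offset, max_records=1):
        return self.records[offset:offset + max_records]


class ToyConsumer:
    """Hypothetical consumer that tracks its own position in the partition."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records=1):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)  # advance only this consumer's progress
        return batch


log = PartitionLog()
for value in ["m0", "m1", "m2", "m3"]:
    log.append(value)

consumer_a = ToyConsumer(log)                  # starts at offset 0
consumer_b = ToyConsumer(log, start_offset=2)  # starts at offset 2

batch_a = consumer_a.poll(max_records=3)  # reads offsets 0, 1, 2
batch_b = consumer_b.poll(max_records=1)  # reads offset 2 only
```

Both consumers read the same record at offset 2 without interfering with each other, because the log itself is never modified by a read and each consumer only moves its own offset.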

# consumer group

• Each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can be in separate processes or on separate machines.
• If all the consumer instances have the same consumer group, then the records will effectively be load balanced over the consumer instances.
• If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.

Figure: Consumer group
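
The delivery rules in the list above can be simulated with a small in-memory model. `ToyTopic`, the group names, and the instance names are hypothetical. Round-robin selection inside a group is a simplification chosen for this sketch, since Kafka balances load by assigning partitions to the instances of a group.

```python
from collections import defaultdict


class ToyTopic:
    """Hypothetical topic that delivers each record to one instance per group."""

    def __init__(self):
        self.groups = defaultdict(list)     # group name -> instance names
        self.next_index = defaultdict(int)  # group name -> round-robin counter
        self.delivered = defaultdict(list)  # instance name -> records received

    def subscribe(self, group, instance):
        self.groups[group].append(instance)

    def publish(self, record):
        for group, instances in self.groups.items():
            # Exactly one instance per group receives the record.
            chosen = instances[self.next_index[group] % len(instances)]
            self.delivered[chosen].append(record)
            self.next_index[group] += 1


topic = ToyTopic()
topic.subscribe("points", "points-1")      # two instances share one group
topic.subscribe("points", "points-2")
topic.subscribe("membership", "member-1")  # a second, independent group

for record in ["r0", "r1", "r2", "r3"]:
    topic.publish(record)
```

The two instances in the `points` group split the four records between them, while the single instance in the `membership` group receives all four: load balancing within a group, broadcast across groups.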

