kafka之Consumer Poll

Consumer poll 模型

Consumer poll 方法的真正实现是在 pollOnce() 方法中，一次 poll 过程,包括检查新的数据、做一些必要的 commit 以及 offset 重置操作。

/**
 * Do one round of polling. In addition to checking for new data, this does any needed offset commits
 * (if auto-commit is enabled), and offset resets (if an offset reset policy is defined).
 * @param timeout The maximum time to block in the underlying call to {@link ConsumerNetworkClient#poll(long)}.
 * @return The fetched records (may be empty)
 */
// note: 一次 poll 过程,包括检查新的数据、做一些必要的 commit 以及 offset  重置操作
private Map<TopicPartition, List<ConsumerRecord<K, V>>> pollOnce(long timeout) {
    // note： 1. 获取 GroupCoordinator 并连接、加入 Group、sync Group, 期间 group 会进行 rebalance 并获取
    coordinator.poll(time.milliseconds());
    // assignment

    // fetch positions if we have partitions we're subscribed to that we
    // don't know the offset for
    // note: 2. 更新要拉取 partition 的 offset（如果需要更新的话）
    if (!subscriptions.hasAllFetchPositions())
        updateFetchPositions(this.subscriptions.missingFetchPositions());

    // if data is available already, return it immediately
    // note: 3. 获取 fetcher 已经拉取到的数据
    Map<TopicPartition, List<ConsumerRecord<K, V>>> records = fetcher.fetchedRecords();
    if (!records.isEmpty())
    // 如果不是空，说明 completedFetches 队列中有数据，就直接拿数据
        return records;
    // 如果是空
    // note: 说明上次 fetch 到是的数据已经全部拉取了,需要再次发送 fetch 请求,从 broker 拉取数据

    // send any new fetches (won't resend pending fetches)
    // note: 4. 向订阅的所有 partition 发送 fetch 请求,会从多个 partition 拉取数据，结果存放在completedFetches中
    // 但是这里的发送并不是真正的发送，而是将 FetchRequest 请求对象存放在 unsend 缓存当中，然后会在 ConsumerNetworkClient#poll 方法调用时才会被真正地执行发送。
    fetcher.sendFetches();

    long now = time.milliseconds();
    long pollTimeout = Math.min(coordinator.timeToNextPoll(now), timeout);

    //note: 5. 调用 poll 方法发送数据
    client.poll(pollTimeout, now, new PollCondition() {
    // 发送请求到服务端，但是如果之前发送的fectch请求还在路上的话，就block等待。
        @Override
        public boolean shouldBlock() {
            // since a fetch might be completed by the background thread, we need this poll condition
            // to ensure that we do not block unnecessarily in poll()
            return !fetcher.hasCompletedFetches();
        }
    });

    // after the long poll, we should check whether the group needs to rebalance
    // prior to returning data so that the group can stabilize faster
    //note: 6. 如果 group 需要 rebalance, 直接返回空数据,这样更快地让 group 进行稳定状态
    if (coordinator.needRejoin())
        return Collections.emptyMap();

    // 再次返回数据
    return fetcher.fetchedRecords();
}

在这里，我们把一个 pollOnce 模型分为6个部分，这里简单介绍一下：

coordinator.poll()：连接 GroupCoordinator，并发送 join-group、sync-group 请求，加入 group 成功，并获取其分配的 tp 列表；，如果设置了自动 commit，也会在这一步进行 commit，总之，对于一个新建的 group，group 状态将会从 Empty –> PreparingRebalance –> AwaiSync –> Stable；
updateFetchPositions()：在上一步中已经获取到了这个 consumer 实例要订阅的 topic-partition list，这一步更新其 fetch-position offset，以便进行拉取；
fetcher.fetchedRecords()：返回其 fetched records，并更新其 fetch-position offset，只有在 offset-commit 时（自动 commit 时，是在第一步实现的），才会更新其 committed offset；
fetcher.sendFetches()：只要订阅的 topic-partition list 没有未处理的 fetch 请求，就发送对这个 topic-partition 的 fetch 请求，在真正发送时，还是会按 node 级别去发送，leader 是同一个 node 的 topic-partition 会合成一个请求去发送；
client.poll()：调用底层 NetworkClient 提供的接口去发送相应的请求；
oordinator.needRejoin()：如果当前实例分配的 topic-partition 列表发送了变化，那么这个 consumer group 就需要进行 rebalance。

注意：

其实 Kafka 消费者在拉取消息过程中，有两条线程在工作，其中用户主线程调用 pollForFetches 方法从缓存中获取消息消费，在获取消息后，会再调用 ConsumerNetworkClient#poll 方法从 Broker 发送拉取请求，然后将拉取到的消息缓存到本地，这里为什么在拉取完消息后，会主动调用 ConsumerNetworkClient#poll 方法呢？我想这里的目的是为了下次 poll 的时候可以立即从缓存中拉取消息。

当缓存中还存在中还存在某个分区的消息数据时，消费者不会继续对该分区进行拉取请求，直到该分区的本地缓存被消费完，才会继续发送拉取请求。假设某消费者监听三个分区，每个分区每次从 Broker 中拉取 4 条消息，用户每次从本地缓存中获取 2 条消息：