Redis Source Code - High Availability - Sentinel

Previous articles walked through the Redis implementation from event management -> command processing -> persistence -> master-slave replication.

How, then, does Redis implement high availability (HA)?

This article focuses on Sentinel, the HA mechanism for non-clustered deployments (i.e., without Redis Cluster):

  • What features does Sentinel provide?

  • How does Sentinel communicate with the master/slaves?

  • How does Sentinel detect and confirm failures?

  • How does Sentinel perform leader election?

  • How does Sentinel complete a failover?

    • With multiple slaves, which one should be promoted?
    • What are the switchover steps?

Sentinel features - the big picture

  • Monitoring - checks the health of the monitored Redis nodes
  • Notification - Sentinel can notify the system, via an API, that a monitored node has a problem
  • Automatic failover - if the master is not working as expected, Sentinel can start a failover: promote one slave to master, reconfigure the remaining slaves to use the new master, and inform applications using the Redis servers of the new address to connect to
  • Configuration provider - Sentinel acts as a source for client-side service discovery: clients connect to Sentinel and ask for the address of the current Redis master of a given service; if a failover occurs, Sentinel reports the new address
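
For example, a client can ask any sentinel for the current master address (output is illustrative, assuming a monitored master named mymaster and the default sentinel port):

$ redis-cli -p 26379 SENTINEL get-master-addr-by-name mymaster
1) "127.0.0.1"
2) "6379"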

Sentinel communication

Sentinel startup

Starting from the command line

  • redis-sentinel /path/to/your/sentinel.conf
  • redis-server /path/to/your/sentinel.conf --sentinel

Code omitted; the startup path runs sentinelHandleConfiguration -> createSentinelRedisInstance -> createInstanceLink, which builds:

  • sentinelRedisInstance
  • instanceLink

Connections

  • each sentinel node connects to the master and all slaves
  • sentinel nodes also connect to one another
  • sentinel is a special server mode, so it can also handle incoming client connections

Establishing connections

serverCron periodically calls sentinelTimer, which ultimately calls sentinelReconnectInstance to establish asynchronous connections that do not block the main event loop:

sentinelTimer->
    sentinelHandleDictOfRedisInstances->
            sentinelHandleRedisInstance->
                    sentinelReconnectInstance
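
For context, sentinelTimer itself is short (abridged from sentinel.c in 5.0); note the last line, which randomizes server.hz so that sentinels desynchronize from one another, something that matters for leader election later:

void sentinelTimer(void) {
    sentinelCheckTiltCondition();
    sentinelHandleDictOfRedisInstances(sentinel.masters);
    sentinelRunPendingScripts();
    sentinelCollectTerminatedScripts();
    sentinelKillTimedoutScripts();

    /* We continuously change the frequency of the Redis "timer interrupt"
     * in order to desynchronize every Sentinel from every other. */
    server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
}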
struct instanceLink {
    int refcount;          /* Number of sentinelRedisInstance owners. */
    int disconnected;      /* Non-zero if we need to reconnect cc or pc. */
    int pending_commands;  /* Number of commands sent waiting for a reply. */
    redisAsyncContext *cc; /* Hiredis context for commands. */
    redisAsyncContext *pc; /* Hiredis context for Pub / Sub. */
    ......
}
/* Create the async connections for the instance link if the link
 * is disconnected. Note that link->disconnected is true even if just
 * one of the two links (commands and pub/sub) is missing. */
void sentinelReconnectInstance(sentinelRedisInstance *ri) {
    ......
    instanceLink *link = ri->link;
    /* Commands connection. */
    if (link->cc == NULL) {
        link->cc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
        if (link->cc->err) {
            sentinelEvent(LL_DEBUG,"-cmd-link-reconnection",ri,"%@ #%s",
                link->cc->errstr);
            instanceLinkCloseConnection(link,link->cc);
        } else {
            link->pending_commands = 0;
            link->cc_conn_time = mstime();
            link->cc->data = link;
            redisAeAttach(server.el,link->cc);
            redisAsyncSetConnectCallback(link->cc,
                    sentinelLinkEstablishedCallback);
            redisAsyncSetDisconnectCallback(link->cc,
                    sentinelDisconnectCallback);
            sentinelSendAuthIfNeeded(ri,link->cc);
            sentinelSetClientName(ri,link->cc,"cmd");

            /* Send a PING ASAP when reconnecting. */
            sentinelSendPing(ri);
        }
    }
    /* Pub / Sub */
    if ((ri->flags & (SRI_MASTER|SRI_SLAVE)) && link->pc == NULL) {
        link->pc = redisAsyncConnectBind(ri->addr->ip,ri->addr->port,NET_FIRST_BIND_ADDR);
        if (link->pc->err) {
            sentinelEvent(LL_DEBUG,"-pubsub-link-reconnection",ri,"%@ #%s",
                link->pc->errstr);
            instanceLinkCloseConnection(link,link->pc);
        } else {
            int retval;

            link->pc_conn_time = mstime();
            link->pc->data = link;
            redisAeAttach(server.el,link->pc);
            redisAsyncSetConnectCallback(link->pc,
                    sentinelLinkEstablishedCallback);
            redisAsyncSetDisconnectCallback(link->pc,
                    sentinelDisconnectCallback);
            sentinelSendAuthIfNeeded(ri,link->pc);
            sentinelSetClientName(ri,link->pc,"pubsub");
            /* Now we subscribe to the Sentinels "Hello" channel. */
            retval = redisAsyncCommand(link->pc,
                sentinelReceiveHelloMessages, ri, "%s %s",
                sentinelInstanceMapCommand(ri,"SUBSCRIBE"),
                SENTINEL_HELLO_CHANNEL);
            if (retval != C_OK) {
                /* If we can't subscribe, the Pub/Sub connection is useless
                 * and we can simply disconnect it and try again. */
                instanceLinkCloseConnection(link,link->pc);
                return;
            }
        }
    }
    ......
}

Connection process & communication

  • using the master's address from the configuration file, sentinel connects to the master (the actual connection setup is shown above) and emits a +monitor event carrying the configured quorum
  • once the connection succeeds, a PING is sent ASAP
  • sentinel sends the INFO command to the master to obtain the master's list of slaves
  • since sentinels form a many-to-many topology, pub/sub is used for information exchange
  • sentinel subscribes to the __sentinel__:hello channel on every master/slave; when another sentinel publishes to this channel, the subscribers are notified, so the current sentinel discovers other sentinels and connects to them
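
The hello payload packs the sender's identity and its view of the master into a single CSV line (field layout as built by sentinelSendHello in the 5.0 source):

PUBLISH __sentinel__:hello <sentinel_ip>,<sentinel_port>,<sentinel_runid>,<current_epoch>,<master_name>,<master_ip>,<master_port>,<master_config_epoch>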
struct redisCommand sentinelcmds[] = {
    {"ping",pingCommand,1,"",0,NULL,0,0,0,0,0},
    {"sentinel",sentinelCommand,-2,"",0,NULL,0,0,0,0,0},
    {"subscribe",subscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"unsubscribe",unsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"psubscribe",psubscribeCommand,-2,"",0,NULL,0,0,0,0,0},
    {"punsubscribe",punsubscribeCommand,-1,"",0,NULL,0,0,0,0,0},
    {"publish",sentinelPublishCommand,3,"",0,NULL,0,0,0,0,0},
    {"info",sentinelInfoCommand,-1,"",0,NULL,0,0,0,0,0},
    {"role",sentinelRoleCommand,1,"l",0,NULL,0,0,0,0,0},
    {"client",clientCommand,-2,"rs",0,NULL,0,0,0,0,0},
    {"shutdown",shutdownCommand,-1,"",0,NULL,0,0,0,0,0},
    {"auth",authCommand,2,"sltF",0,NULL,0,0,0,0,0}
};

The sentinelcmds table above lists the commands available in sentinel mode.

For example, the sentinel command is handled by sentinelCommand.

Command processing

processCommand; see the earlier article on Redis event handling for details.

Sentinel failure handling

sentinelTimer -> sentinelHandleRedisInstance manages every node on a timer.

The process is as follows:

  • Reconnect - sentinelHandleRedisInstance calls sentinelReconnectInstance to try to re-establish connections to disconnected instances

  • Heartbeat - sentinelSendPeriodicCommands sends PING, INFO, and other commands to the instance

  • Subjective-down check - sentinelCheckSubjectivelyDown

  • Objective-down check - sentinelCheckObjectivelyDown

  • Failure handling - sentinelStartFailoverIfNeeded

    • if a failover should start, call sentinelAskMasterStateToOtherSentinels to collect the other sentinels' view of the master's state, sending them the is-master-down-by-addr command and kicking off leader election
    • sentinelFailoverStateMachine drives the failover
    • call sentinelAskMasterStateToOtherSentinels again to refresh the other sentinels' view of the master's state
/* Perform scheduled operations for the specified Redis instance. */
void sentinelHandleRedisInstance(sentinelRedisInstance *ri) {
    /* ========== MONITORING HALF ============ */
    /* Every kind of instance */
    sentinelReconnectInstance(ri);
    sentinelSendPeriodicCommands(ri);

    /* ============== ACTING HALF ============= */
    /* We don't proceed with the acting half if we are in TILT mode.
     * TILT happens when we find something odd with the time, like a
     * sudden change in the clock. */
    if (sentinel.tilt) {
        if (mstime()-sentinel.tilt_start_time < SENTINEL_TILT_PERIOD) return;
        sentinel.tilt = 0;
        sentinelEvent(LL_WARNING,"-tilt",NULL,"#tilt mode exited");
    }

    /* Every kind of instance */
    sentinelCheckSubjectivelyDown(ri);

    /* Masters and slaves */
    if (ri->flags & (SRI_MASTER|SRI_SLAVE)) {
        /* Nothing so far. */
    }

    /* Only masters */
    if (ri->flags & SRI_MASTER) {
        sentinelCheckObjectivelyDown(ri);
        if (sentinelStartFailoverIfNeeded(ri))
            sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_ASK_FORCED);
        sentinelFailoverStateMachine(ri);
        sentinelAskMasterStateToOtherSentinels(ri,SENTINEL_NO_FLAGS);
    }
}

Failure detection

Heartbeat

sentinelSendPeriodicCommands drives the heartbeat for all node types (sentinel/master/slave):

  • it calls redisAsyncCommand

    • first it sends INFO to masters/slaves to gather information

      • when node information changes, the new node attributes are synced
      • when new slaves show up in the master's INFO reply, connections to them are established
      • when a node's role has changed, failover or other related logic is triggered
    • then it sends PING to all three node types

      • the three roles exchange PING as a heartbeat to confirm the peer is alive
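
Abridged from sentinel.c (error handling and the accelerated INFO cadence for slaves of a failing master are omitted), the rhythm is roughly INFO every SENTINEL_INFO_PERIOD (10s), PING every SENTINEL_PING_PERIOD (1s), and a hello PUBLISH every SENTINEL_PUBLISH_PERIOD (2s):

void sentinelSendPeriodicCommands(sentinelRedisInstance *ri) {
    mstime_t now = mstime();

    if (ri->link->disconnected) return;

    /* INFO: masters and slaves only, roughly every 10 seconds. */
    if ((ri->flags & SRI_SENTINEL) == 0 &&
        (ri->info_refresh == 0 ||
         (now - ri->info_refresh) > SENTINEL_INFO_PERIOD))
    {
        if (redisAsyncCommand(ri->link->cc,
                sentinelInfoReplyCallback, ri, "%s",
                sentinelInstanceMapCommand(ri,"INFO")) == C_OK)
            ri->link->pending_commands++;
    }

    /* PING: every instance type, once per second. */
    if ((now - ri->link->last_pong_time) > SENTINEL_PING_PERIOD &&
        (now - ri->link->last_ping_time) > SENTINEL_PING_PERIOD/2)
    {
        sentinelSendPing(ri);
    }

    /* PUBLISH hello: every two seconds. */
    if ((now - ri->last_pub_time) > SENTINEL_PUBLISH_PERIOD)
        sentinelSendHello(ri);
}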

Subjective-down detection

sentinelCheckSubjectivelyDown checks every node type (sentinel/master/slave) for subjective down:

  • compute the elapsed interval

  • close timed-out connections (code omitted)

  • timeout checks

    • elapsed > the down_after_period threshold
    • sentinel believes the instance is a master, but the instance reports itself as a slave, and it still has not switched back after down_after_period plus two INFO command intervals
  • if either condition holds, sentinel marks the node as subjectively down

  • sentinelEvent is called to emit a "+sdown" event

/* Is this instance down from our point of view? */
void sentinelCheckSubjectivelyDown(sentinelRedisInstance *ri) {
    mstime_t elapsed = 0;

    if (ri->link->act_ping_time)
        elapsed = mstime() - ri->link->act_ping_time; /* time since the last PING was sent */
    else if (ri->link->disconnected)
        elapsed = mstime() - ri->link->last_avail_time; /* time since the link was last available */

    ......

    /* Update the SDOWN flag. We believe the instance is SDOWN if:
     *
     * 1) It is not replying.
     * 2) We believe it is a master, it reports to be a slave for enough time
     *    to meet the down_after_period, plus enough time to get two times
     *    INFO report from the instance. */
    if (elapsed > ri->down_after_period ||
        (ri->flags & SRI_MASTER &&
         ri->role_reported == SRI_SLAVE &&
         mstime() - ri->role_reported_time >
          (ri->down_after_period+SENTINEL_INFO_PERIOD*2)))
    {
        /* Is subjectively down */
        if ((ri->flags & SRI_S_DOWN) == 0) {
            sentinelEvent(LL_WARNING,"+sdown",ri,"%@");
            ri->s_down_since_time = mstime();
            ri->flags |= SRI_S_DOWN;
        }
    } else {
        /* Is subjectively up */
        if (ri->flags & SRI_S_DOWN) {
            sentinelEvent(LL_WARNING,"-sdown",ri,"%@");
            ri->flags &= ~(SRI_S_DOWN|SRI_SCRIPT_KILL_SENT);
        }
    }
}

Objective-down detection

sentinelCheckObjectivelyDown detects whether the master is objectively down.

Objective down requires agreement from enough sentinels (the quorum), so the description below is split into two parts:

  • the current sentinel (the one that detected the master's problem)

    • uses a quorum counter to record how many sentinels consider the master subjectively down
    • if the current sentinel itself has judged the master subjectively down, it first sets quorum to 1
    • it then walks the other sentinels' flags, checking whether SRI_MASTER_DOWN is set
    • each match increments quorum by 1
    • after iterating the sentinels hash table, sentinelCheckObjectivelyDown compares quorum against the preconfigured quorum threshold (from sentinel.conf)
    • if quorum >= master->quorum, it sets odown (objectively down)
    • and calls sentinelEvent to emit a +odown event
/* Is this instance down according to the configured quorum?
 *
 * Note that ODOWN is a weak quorum, it only means that enough Sentinels
 * reported in a given time range that the instance was not reachable.
 * However messages can be delayed so there are no strong guarantees about
 * N instances agreeing at the same time about the down state. */
void sentinelCheckObjectivelyDown(sentinelRedisInstance *master) {
    ......
    if (master->flags & SRI_S_DOWN) {
        /* Is down for enough sentinels? */
        quorum = 1; /* the current sentinel. */
        /* Count all the other sentinels. */
        di = dictGetIterator(master->sentinels);
        while((de = dictNext(di)) != NULL) {
            sentinelRedisInstance *ri = dictGetVal(de);

            if (ri->flags & SRI_MASTER_DOWN) quorum++;
        }
        dictReleaseIterator(di);
        if (quorum >= master->quorum) odown = 1;
    }
    ......
}

How do the other sentinels' flags get set?

sentinelAskMasterStateToOtherSentinels calls redisAsyncCommand to send the is-master-down-by-addr command to every other sentinel, registering sentinelReceiveIsMasterDownReply as the reply callback.

When a reply arrives, sentinelReceiveIsMasterDownReply sets SRI_MASTER_DOWN on the corresponding sentinel instance.
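
The ask itself looks roughly like this (abridged from sentinelAskMasterStateToOtherSentinels; note the last argument is "*" unless this sentinel is soliciting votes for a failover):

/* Inside the loop over master->sentinels, ri is each peer sentinel: */
char port[32];
int retval;

ll2string(port, sizeof(port), master->addr->port);
retval = redisAsyncCommand(ri->link->cc,
            sentinelReceiveIsMasterDownReply, ri,
            "%s is-master-down-by-addr %s %s %llu %s",
            sentinelInstanceMapCommand(ri,"SENTINEL"),
            master->addr->ip, port,
            sentinel.current_epoch,
            (master->failover_state > SENTINEL_FAILOVER_STATE_NONE) ?
            sentinel.myid : "*");
if (retval == C_OK) ri->link->pending_commands++;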

  • the other sentinels

receive the is-master-down-by-addr command sent by the asking sentinel above and evaluate it.

sentinelCommand handles the command:

  1. if the receiving sentinel is not in TILT (protection) mode, and its own periodic checks have also marked the queried master as subjectively down
  2. it replies to the asking sentinel, confirming the down state (which leads to SRI_MASTER_DOWN being set on the asker's side)
if (!strcasecmp(c->argv[1]->ptr,"is-master-down-by-addr")) {
        /* SENTINEL IS-MASTER-DOWN-BY-ADDR <ip> <port> <current-epoch> <runid>
         *
         * Arguments:
         *
         * ip and port are the ip and port of the master we want to be
         * checked by Sentinel. Note that the command will not check by
         * name but just by master, in theory different Sentinels may monitor
         * different masters with the same name.
         *
         * current-epoch is needed in order to understand if we are allowed
         * to vote for a failover leader or not. Each Sentinel can vote just
         * one time per epoch.
         *
         * runid is "*" if we are not seeking for a vote from the Sentinel
         * in order to elect the failover leader. Otherwise it is set to the
         * runid we want the Sentinel to vote if it did not already voted.
         */
        sentinelRedisInstance *ri;
        long long req_epoch;
        uint64_t leader_epoch = 0;
        char *leader = NULL;
        long port;
        int isdown = 0;

        if (c->argc != 6) goto numargserr;
        if (getLongFromObjectOrReply(c,c->argv[3],&port,NULL) != C_OK ||
            getLongLongFromObjectOrReply(c,c->argv[4],&req_epoch,NULL)
                                                              != C_OK)
            return;
        ri = getSentinelRedisInstanceByAddrAndRunID(sentinel.masters,
            c->argv[2]->ptr,port,NULL);

        /* It exists? Is actually a master? Is subjectively down? It's down.
         * Note: if we are in tilt mode we always reply with "0". */
        if (!sentinel.tilt && ri && (ri->flags & SRI_S_DOWN) &&
                                    (ri->flags & SRI_MASTER))
            isdown = 1;

        ......
        /* Reply with a three-elements multi-bulk reply:
         * down state, leader, vote epoch. */
        addReplyMultiBulkLen(c,3);
        addReply(c, isdown ? shared.cone : shared.czero);
        addReplyBulkCString(c, leader ? leader : "*");
        addReplyLongLong(c, (long long)leader_epoch);
        if (leader) sdsfree(leader);

Failure handling

After sentinelCheckObjectivelyDown observes the objective down state,

sentinelStartFailoverIfNeeded is called to decide whether to start a failover.

Failover conditions:

  • the master's flags already carry SRI_O_DOWN
  • no failover is currently in progress
  • if a previous failover attempt was started, its start time must be more than 2x the sentinel failover-timeout (from sentinel.conf) in the past
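
These checks map almost one-to-one onto the source (abridged; the real function also logs when it throttles a retry):

int sentinelStartFailoverIfNeeded(sentinelRedisInstance *master) {
    /* The master must be objectively down. */
    if (!(master->flags & SRI_O_DOWN)) return 0;

    /* No failover may already be in progress. */
    if (master->flags & SRI_FAILOVER_IN_PROGRESS) return 0;

    /* The previous attempt must be old enough. */
    if (mstime() - master->failover_start_time <
        master->failover_timeout*2) return 0;

    sentinelStartFailover(master);
    return 1;
}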

Once these conditions are met, sentinelStartFailover is triggered to set up state:

  • master->failover_epoch, used in the upcoming vote
  • master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START
  • master->failover_start_time, used to prevent multiple nodes from launching the failover vote at the same time

Next, sentinelAskMasterStateToOtherSentinels is called again to send the leader-vote request.

/* Setup the master state to start a failover. */
void sentinelStartFailover(sentinelRedisInstance *master) {
    serverAssert(master->flags & SRI_MASTER);

    master->failover_state = SENTINEL_FAILOVER_STATE_WAIT_START;
    master->flags |= SRI_FAILOVER_IN_PROGRESS;
    master->failover_epoch = ++sentinel.current_epoch; /* bump current_epoch */
    sentinelEvent(LL_WARNING,"+new-epoch",master,"%llu",
        (unsigned long long) sentinel.current_epoch);
    sentinelEvent(LL_WARNING,"+try-failover",master,"%@");
    master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;
    master->failover_state_change_time = mstime();
}

void sentinelCommand(client *c) {
    ......
/* Vote for the master (or fetch the previous vote) if the request
         * includes a runid, otherwise the sender is not seeking for a vote. */
        if (ri && ri->flags & SRI_MASTER && strcasecmp(c->argv[5]->ptr,"*")) {
            leader = sentinelVoteLeader(ri,(uint64_t)req_epoch,
                                            c->argv[5]->ptr,
                                            &leader_epoch);
        }
        /* Reply with a three-elements multi-bulk reply:
         * down state, leader, vote epoch. */
        addReplyMultiBulkLen(c,3);
        addReply(c, isdown ? shared.cone : shared.czero);
        addReplyBulkCString(c, leader ? leader : "*");
        addReplyLongLong(c, (long long)leader_epoch);
        if (leader) sdsfree(leader);
    ......
}

Leader Election

Why hold a vote before the failover? To guarantee that only one sentinel performs the work. sentinelCommand handles the vote request by calling sentinelVoteLeader.

Voting flow

  • the current sentinel, A, initiates the vote, sending its own req_epoch and runid

  • the other sentinel nodes receive the solicitation and vote

  • if req_epoch > the receiving sentinel's own epoch, it syncs: sentinel.current_epoch = req_epoch

  • if the receiving sentinel's recorded master->leader_epoch < req_epoch, and its own epoch <= req_epoch, it grants the vote to the requester

    • and emits a +vote-for-leader event
  • otherwise, sentinelVoteLeader simply returns the leader ID already recorded for this master, i.e., whoever this sentinel voted for earlier

  • the initiating sentinel A processes the other sentinels' vote replies in sentinelReceiveIsMasterDownReply, recording for each peer:

    • the down state
    • the voted leader's runid
    • the leader_epoch
  • finally, sentinelFailoverWaitStart determines whether the initiating sentinel A is the leader

    Two conditions must both hold:

    • an absolute majority of the known sentinels voted in favor
    • the vote count also reaches the preconfigured quorum threshold
  • if both hold, sentinel A becomes the leader; if not, voting continues

    • the tally is done in sentinelGetLeader

      • first count the other sentinels' votes; the candidate with the most votes becomes the provisional winner
      • if the winner is not me, it will most likely win anyway, so vote for it
      • otherwise vote for myself
      • then re-check the two conditions above; only a candidate satisfying both is the winner
/* Vote for the sentinel with 'req_runid' or return the old vote if already
 * voted for the specified 'req_epoch' or one greater.
 *
 * If a vote is not available returns NULL, otherwise return the Sentinel
 * runid and populate the leader_epoch with the epoch of the vote. */
char *sentinelVoteLeader(sentinelRedisInstance *master, uint64_t req_epoch, char *req_runid, uint64_t *leader_epoch) {
    if (req_epoch > sentinel.current_epoch) {
        sentinel.current_epoch = req_epoch;
        sentinelFlushConfig();
        sentinelEvent(LL_WARNING,"+new-epoch",master,"%llu",
            (unsigned long long) sentinel.current_epoch);
    }

    if (master->leader_epoch < req_epoch && sentinel.current_epoch <= req_epoch)
    {
        sdsfree(master->leader);
        master->leader = sdsnew(req_runid);
        master->leader_epoch = sentinel.current_epoch;
        sentinelFlushConfig();
        sentinelEvent(LL_WARNING,"+vote-for-leader",master,"%s %llu",
            master->leader, (unsigned long long) master->leader_epoch);
        /* If we did not vote for ourselves, set the master failover start
         * time to now, in order to force a delay before we can start a
         * failover for the same master. */
        if (strcasecmp(master->leader,sentinel.myid))
            master->failover_start_time = mstime()+rand()%SENTINEL_MAX_DESYNC;
    }

    *leader_epoch = master->leader_epoch;
    return master->leader ? sdsnew(master->leader) : NULL;
}

/* Scan all the Sentinels attached to this master to check if there
 * is a leader for the specified epoch.
 *
 * To be a leader for a given epoch, we should have the majority of
 * the Sentinels we know (ever seen since the last SENTINEL RESET) that
 * reported the same instance as leader for the same epoch. */
char *sentinelGetLeader(sentinelRedisInstance *master, uint64_t epoch) {
   ......
    /* Count other sentinels votes */
    di = dictGetIterator(master->sentinels);
    while((de = dictNext(di)) != NULL) {
        sentinelRedisInstance *ri = dictGetVal(de);
        if (ri->leader != NULL && ri->leader_epoch == sentinel.current_epoch)
            sentinelLeaderIncr(counters,ri->leader);
    }
    dictReleaseIterator(di);

    /* Check what's the winner. For the winner to win, it needs two conditions:
     * 1) Absolute majority between voters (50% + 1).
     * 2) And anyway at least master->quorum votes. */
    di = dictGetIterator(counters);
    while((de = dictNext(di)) != NULL) {
        uint64_t votes = dictGetUnsignedIntegerVal(de);

        if (votes > max_votes) {
            max_votes = votes;
            winner = dictGetKey(de);
        }
    }
    dictReleaseIterator(di);

    /* Count this Sentinel vote:
     * if this Sentinel did not vote yet, either vote for the most
     * common voted sentinel, or for itself if no vote exists at all. */
    if (winner)
        myvote = sentinelVoteLeader(master,epoch,winner,&leader_epoch);
    else
        myvote = sentinelVoteLeader(master,epoch,sentinel.myid,&leader_epoch);

    if (myvote && leader_epoch == epoch) {
        uint64_t votes = sentinelLeaderIncr(counters,myvote);

        if (votes > max_votes) {
            max_votes = votes;
            winner = myvote;
        }
    }

    voters_quorum = voters/2+1;
    if (winner && (max_votes < voters_quorum || max_votes < master->quorum))
        winner = NULL;

    winner = winner ? sdsnew(winner) : NULL;
    sdsfree(myvote);
    dictRelease(counters);
    return winner;
}

To summarize:

  • compare req_epoch against the sentinel's locally recorded master->leader_epoch

  • winning requires an absolute majority plus at least quorum votes in favor

  • randomized vote timing - every sentinel node calls sentinelTimer from serverCron with injected randomness (similar to Raft)

    • server.hz = CONFIG_DEFAULT_HZ + rand() % CONFIG_DEFAULT_HZ;
  • first mover advantage - a sentinel that was not the first to start the vote must wait out an interval enforced via master->failover_start_time (master->failover_timeout*2) before it can start its own vote

  • election timeout control via election_timeout, as shown in the sketch below
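
The timeout is enforced in sentinelFailoverWaitStart (abridged from sentinel.c); a sentinel that was not elected within election_timeout aborts its failover attempt:

void sentinelFailoverWaitStart(sentinelRedisInstance *ri) {
    char *leader;
    int isleader;

    /* Check if we are the leader for the failover epoch. */
    leader = sentinelGetLeader(ri, ri->failover_epoch);
    isleader = leader && strcasecmp(leader,sentinel.myid) == 0;
    sdsfree(leader);

    /* If I'm not the leader, and the failover was not forced via
     * SENTINEL FAILOVER, I can't continue. */
    if (!isleader && !(ri->flags & SRI_FORCE_FAILOVER)) {
        mstime_t election_timeout = SENTINEL_ELECTION_TIMEOUT;

        /* The election timeout is the MIN between the default timeout
         * and the configured failover timeout. */
        if (election_timeout > ri->failover_timeout)
            election_timeout = ri->failover_timeout;
        /* Abort the failover if I'm not the leader after some time. */
        if (mstime() - ri->failover_start_time > election_timeout) {
            sentinelEvent(LL_WARNING,"-failover-abort-not-elected",ri,"%@");
            sentinelAbortFailover(ri);
        }
        return;
    }
    sentinelEvent(LL_WARNING,"+elected-leader",ri,"%@");
    ri->failover_state = SENTINEL_FAILOVER_STATE_SELECT_SLAVE;
    ri->failover_state_change_time = mstime();
    sentinelEvent(LL_WARNING,"+failover-state-select-slave",ri,"%@");
}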

Failover

sentinelFailoverStateMachine drives the failover; it runs on the sentinel elected leader for this round (the vote winner above).

  • if SRI_FAILOVER_IN_PROGRESS is not set, no failover is underway and the state machine returns immediately

  • otherwise it dispatches on failover_state (initialized in sentinelStartFailover) to advance the failover:

    • SENTINEL_FAILOVER_STATE_WAIT_START - wait for the vote result
    • SENTINEL_FAILOVER_STATE_SELECT_SLAVE - select the best slave to promote. A slave is eligible only if:

      • none of S_DOWN, O_DOWN, DISCONNECTED is set (the slave's link is healthy and it was never judged subjectively or objectively down)
      • its last PING reply is no older than 5x the PING period (sentinel ping)
      • its info_refresh is no older than 3x the INFO refresh period (sentinel info)
      • its master_link_down_time does not exceed (now - master->s_down_since_time) + (master->down_after_period * 10); in short, since the master went down, the slave must not have reported the link down for more than roughly 10x the configured down-after-period (a bit of black magic)
      • its slave_priority is non-zero, otherwise the slave is discarded

      Among all slaves satisfying the above, the winner is picked by sort keys, in order (see the comparator sketch after the state-machine code below):

      • higher promotion priority (numerically lower slave_priority)
      • larger processed replication offset
      • lexicographically smaller runid (tiebreaker when the first two are equal)
  • SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE - sentinelFailoverSendSlaveOfNoOne promotes the slave selected above to be the new master

  • SENTINEL_FAILOVER_STATE_WAIT_PROMOTION - wait for the chosen slave's promotion to succeed, i.e., its reported role flips from slave to master

  • SENTINEL_FAILOVER_STATE_RECONF_SLAVES - tell the remaining slaves to replicate from the new master

  • SENTINEL_FAILOVER_STATE_UPDATE_CONFIG - after the slave is successfully promoted, update the master <-> slave data-structure relationships

The remaining code is not expanded in detail here.

/* Failover machine different states. */
#define SENTINEL_FAILOVER_STATE_NONE 0  /* No failover in progress. */
#define SENTINEL_FAILOVER_STATE_WAIT_START 1  /* Wait for failover_start_time*/
#define SENTINEL_FAILOVER_STATE_SELECT_SLAVE 2 /* Select slave to promote */
#define SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE 3 /* Slave -> Master */
#define SENTINEL_FAILOVER_STATE_WAIT_PROMOTION 4 /* Wait slave to change role */
#define SENTINEL_FAILOVER_STATE_RECONF_SLAVES 5 /* SLAVEOF newmaster */
#define SENTINEL_FAILOVER_STATE_UPDATE_CONFIG 6 /* Monitor promoted slave. */
void sentinelFailoverStateMachine(sentinelRedisInstance *ri) {
    serverAssert(ri->flags & SRI_MASTER);

    if (!(ri->flags & SRI_FAILOVER_IN_PROGRESS)) return;

    switch(ri->failover_state) {
        case SENTINEL_FAILOVER_STATE_WAIT_START:
            sentinelFailoverWaitStart(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SELECT_SLAVE:
            sentinelFailoverSelectSlave(ri);
            break;
        case SENTINEL_FAILOVER_STATE_SEND_SLAVEOF_NOONE:
            sentinelFailoverSendSlaveOfNoOne(ri);
            break;
        case SENTINEL_FAILOVER_STATE_WAIT_PROMOTION:
            sentinelFailoverWaitPromotion(ri);
            break;
        case SENTINEL_FAILOVER_STATE_RECONF_SLAVES:
            sentinelFailoverReconfNextSlave(ri);
            break;
    }
}
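
For reference, the SELECT_SLAVE ordering above is implemented by this qsort comparator (from sentinel.c, lightly trimmed):

int compareSlavesForPromotion(const void *a, const void *b) {
    sentinelRedisInstance **sa = (sentinelRedisInstance **)a,
                          **sb = (sentinelRedisInstance **)b;
    char *sa_runid, *sb_runid;

    /* 1) Numerically lower slave_priority wins. */
    if ((*sa)->slave_priority != (*sb)->slave_priority)
        return (*sa)->slave_priority - (*sb)->slave_priority;

    /* 2) Greater processed replication offset wins. */
    if ((*sa)->slave_repl_offset > (*sb)->slave_repl_offset) return -1;
    else if ((*sa)->slave_repl_offset < (*sb)->slave_repl_offset) return 1;

    /* 3) Lexicographically smaller runid wins; a known runid beats an
     *    unknown one. */
    sa_runid = (*sa)->runid;
    sb_runid = (*sb)->runid;
    if (sa_runid == NULL && sb_runid == NULL) return 0;
    else if (sa_runid == NULL) return 1;  /* a > b */
    else if (sb_runid == NULL) return -1; /* a < b */
    return strcasecmp(sa_runid, sb_runid);
}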

References

Redis documentation

Redis 5.0.1 source code