Raft is a member of the Paxos family.
Paxos assumes that there are no malicious nodes. Under that assumption it guarantees consistency even when the network partitions, and it remains available as long as more than half of the nodes can communicate with each other. It becomes unavailable when more than half of the nodes are down, when network partitions leave no partition holding a majority, or when the majority's agent (the primary) dies.
In the figure, each circular region represents the nodes of the cluster, the blue dot is the master node, and the green region is the majority at that moment.
w1, w2 and w3 are write requests from users outside the cluster. This is the normal case of Raft running in the cluster.
That is, at time t3 the state machine has executed the sequence w1, w2, w3.
The number of cluster nodes is written as 2f + 1. This form saves resources (a cluster of 2f + 2 nodes tolerates no more failures than one of 2f + 1) and is convenient to write. In fact, the number of cluster nodes can be any number greater than 2.
The goal of the protocol is to ensure that exactly one sequence of write requests exists.
If the cluster can be made to look like a single logical node, then that logical node's sequence of write requests is the only one.
The majority is that logical node. Each write request corresponds to a majority: a message that receives at least f + 1 replies has reached a majority. It follows that the majority is not fixed; it can change from one request to the next.
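The arithmetic behind "at least f + 1 replies" can be sketched in a few lines. This is only an illustration; the function names are made up for this example and do not come from any Raft library.

```python
def majority_size(cluster_size: int) -> int:
    """Smallest number of nodes that is more than half of the cluster."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """How many nodes may be lost while a majority can still be formed."""
    return cluster_size - majority_size(cluster_size)


def is_majority(replies: int, cluster_size: int) -> bool:
    """True when the number of replies reaches the majority size."""
    return replies >= majority_size(cluster_size)


if __name__ == "__main__":
    # Print cluster size, majority size and tolerated failures side by side.
    for n in (3, 4, 5, 6, 7):
        print(n, majority_size(n), tolerated_failures(n))
```

With 2f + 1 nodes the majority size is f + 1. A 4-node cluster needs 3 replies yet still tolerates only 1 failure, the same as a 3-node cluster, which is why the odd form is the economical one.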
The only requirement on a majority is that it be exclusive (unique): while one majority exists, the cluster cannot produce another one. Take the circle above and cut it into as many pieces as you like: at most one piece can cover more than half of the area. Likewise, the more-than-half regions produced by any two different cuts must overlap.
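The overlap property can be checked by brute force for a small cluster. The sketch below enumerates every majority of a node set and confirms that any two of them share at least one node; the helper names are hypothetical.

```python
from itertools import combinations


def majorities(nodes):
    """Yield every subset of `nodes` that contains more than half of them."""
    nodes = list(nodes)
    need = len(nodes) // 2 + 1
    for size in range(need, len(nodes) + 1):
        for subset in combinations(nodes, size):
            yield set(subset)


def all_majorities_overlap(nodes) -> bool:
    """Check that any two majorities share at least one node."""
    all_m = list(majorities(nodes))
    # A non-empty intersection is truthy, an empty one is falsy.
    return all(a & b for a, b in combinations(all_m, 2))


if __name__ == "__main__":
    print(all_majorities_overlap(["a", "b", "c", "d", "e"]))
```

The reason it always holds is simple counting: two sets that each contain more than half of the nodes cannot be disjoint, because together they would contain more nodes than the cluster has.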
The nodes marked in red in the figure can learn, by exchanging messages, that they cannot form a majority. This works because each node knows the total number of cluster nodes in advance. Raft also supports adding and removing nodes dynamically.
The master node is simply an agent of the majority, which makes it convenient to send write requests to the majority. (The master node must obviously be unique: because the majority is unique, the majority's agent is also unique.)
In theory, the cluster is usable (available) as long as a majority exists and write requests can be delivered to it. If no majority exists, it is not available.
Availability therefore needs two things: a majority must exist, and there must be an entry point that delivers requests to the majority. In Raft that entry point is the master node, the majority's agent.
If no majority exists, the cluster is unavailable and must wait until a majority appears again. If only the entry point has died, a new entry point can be chosen and the cluster becomes available again. These are the two kinds of unavailability.
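The two kinds of unavailability can be told apart with a small check. This is a simplified sketch: `reachable` stands for the number of nodes that can currently talk to each other, and `Status` and `cluster_status` are names invented for the example.

```python
from enum import Enum


class Status(Enum):
    AVAILABLE = "available"
    NO_MAJORITY = "unavailable: no majority, must wait for one to appear"
    NO_ENTRY = "unavailable: entry point is down, choose a new one"


def cluster_status(cluster_size: int, reachable: int, master_alive: bool) -> Status:
    """Classify the cluster according to the two conditions in the text."""
    need = cluster_size // 2 + 1
    if reachable < need:
        # No majority can be formed; nothing can be done except wait.
        return Status.NO_MAJORITY
    if not master_alive:
        # A majority exists but has no agent; a new master can be chosen.
        return Status.NO_ENTRY
    return Status.AVAILABLE


if __name__ == "__main__":
    print(cluster_status(5, 3, True).value)
    print(cluster_status(5, 2, True).value)
    print(cluster_status(5, 3, False).value)
```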
Therefore the master node is not special in itself; the only requirement is that it be unique. To find a single node (the primary) quickly, Raft gives each node a random countdown counter, and the node whose counter reaches 0 first wins (ignoring differences in network transmission delay).
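The random countdown can be simulated as follows. This is a toy model under stated assumptions: it ignores the network entirely, the 150 to 300 range is only an illustrative choice of countdown values, and `elect_by_countdown` is a hypothetical name.

```python
import random


def elect_by_countdown(node_ids, rng, low=150, high=300):
    """Give every node a random countdown; the one that reaches 0 first wins.

    If several nodes draw the same shortest countdown, nobody wins that
    round and all nodes draw again (an assumption of this sketch).
    Returns the winner and its countdown value.
    """
    while True:
        timers = {node: rng.randint(low, high) for node in node_ids}
        shortest = min(timers.values())
        winners = [node for node, t in timers.items() if t == shortest]
        if len(winners) == 1:
            return winners[0], shortest


if __name__ == "__main__":
    rng = random.Random()
    winner, countdown = elect_by_countdown(["n1", "n2", "n3"], rng)
    print(winner, countdown)
```

Randomness is what makes a unique winner likely in a single round: the wider the range relative to the number of nodes, the less often two nodes tie.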
Network transmission is unreliable, and data held in memory is unreliable. These two sources of unreliability correspond to the two phases. (Disk is reliable, and reaching disk corresponds to a request being finished, or committed.)
Each phase is completed by a majority: a phase counts as complete only once it has completed on a majority. In theory the two phases could be merged into one large phase, but then a request that got only as far as the first part would have to be rolled back, which is more complex. The two steps above are therefore the minimum.
Request wi enters through the master node (the agent of the majority), and the master node starts the first phase and then the second phase for wi. Assume the master node stays alive throughout both phases of wi: once phase one has reached a majority and phase two has then reached a majority, wi is done and the master node can move on to the next write request, for example w2 after w1.
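The normal path of one write can be sketched with a toy in-process model. Everything here is an assumption made for illustration: `Node`, `receive`, `commit` and `write` are hypothetical names, messages are plain method calls, and a missing majority simply returns False instead of waiting as the real protocol would.

```python
class Node:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.memory = []  # phase one: received but not yet committed
        self.disk = []    # phase two: committed

    def receive(self, request) -> bool:
        """Phase one: accept the request into memory and acknowledge it."""
        if not self.reachable:
            return False
        self.memory.append(request)
        return True

    def commit(self, request) -> bool:
        """Phase two: move the request from memory to disk."""
        if not self.reachable or request not in self.memory:
            return False
        self.memory.remove(request)
        self.disk.append(request)
        return True


def write(nodes, request) -> bool:
    """Run both phases; each phase counts only if a majority completes it."""
    need = len(nodes) // 2 + 1
    acks = sum(node.receive(request) for node in nodes)
    if acks < need:
        return False  # phase one did not reach a majority
    commits = sum(node.commit(request) for node in nodes)
    return commits >= need


if __name__ == "__main__":
    cluster = [Node("a"), Node("b"), Node("c", reachable=False)]
    print(write(cluster, "w1"), [n.disk for n in cluster])
```

In the demo, two of three nodes are reachable, so both phases reach a majority and w1 ends up on the disks of nodes a and b only.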
If a majority cannot be formed during either phase, the phase can finish only once a majority appears again. This is where availability is sacrificed. Multiple network partitions, or the death of too many nodes, will make the majority disappear.
Suppose the master node dies during the first phase of wi. After several rounds of random countdowns a winner will emerge again (all that is needed is some way to obtain a unique node; this particular method is not mandatory). The winner sends heartbeats and acts as the master node. In this case, discarding wi has no side effects.
Suppose the master node dies after wi has completed the first phase, during the second phase. A new master node will again be elected, and in this case too, discarding wi has no side effects. (In fact, for a short time it may be possible to fetch wi from the majority, where it has completed the first phase and still sits in memory, and to continue from there, but there is little point in doing so.)
Suppose the master node completes both the first and the second phase of wi but dies before responding to the client. A new master node will again be elected. In this case wi has already reached the disks of the majority (so wi can in fact be retrieved from the majority). Discarding wi would require rolling it back on the majority, so for simplicity the choice should be to take no action. From the client's point of view no response to wi was received, yet wi has been applied; the client can find out later that wi succeeded.
The new master node is in fact guaranteed only to be unique. The write request sequence it has stored is not necessarily the sequence that the majority now stores, so before it starts work the new master node can pull the majority's stored sequence of write requests from the majority.
The majority that handled the last write holds the latest request sequence. A new majority cannot exclude every node of the last majority and still be a majority, so the new majority must contain the last majority's request sequence. (The rule for deciding which sequence is the latest is the latest request number: term number plus request serial number.)
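Picking the latest sequence out of a new majority can be sketched as a comparison on (term, serial number). The log layout below, a list of (term, serial, request) tuples, and the name `latest_log` are assumptions made for this example.

```python
def latest_log(logs):
    """Return the log whose last entry has the highest (term, serial) pair."""
    def last_key(log):
        # An empty log sorts before every non-empty one.
        return log[-1][:2] if log else (0, 0)
    return max(logs, key=last_key)


if __name__ == "__main__":
    # Node c belonged to the last majority; d and e lagged behind.
    log_c = [(1, 1, "w1"), (1, 2, "w2"), (2, 3, "w3")]
    log_d = [(1, 1, "w1")]
    log_e = [(1, 1, "w1"), (1, 2, "w2")]
    print(latest_log([log_c, log_d, log_e]))
```

Tuples compare element by element, so the term is compared first and the serial number only breaks ties within the same term. Because any new majority overlaps the last one, at least one of the logs passed in carries the latest sequence.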
This implicitly assumes that every node in the majority is honest, that is, that it follows the protocol faithfully.
The term number is the serial number of a master node's tenure and stays constant for the whole tenure. One master node may handle many write requests, and during normal operation each write request is tagged with the term number. The term number gives a quick way to tell that two complete write request sequences differ: if their terms differ, the two sequences must be different.
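The quick comparison can be sketched as below, assuming each entry carries the term it was accepted in and that the term of a whole sequence is read from its last entry. `Entry`, `last_term` and `must_differ` are hypothetical names.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    term: int     # tenure of the master node that accepted the request
    request: str  # the write request itself


def last_term(log: List[Entry]) -> int:
    """Term tagged on the last write request, or 0 for an empty log."""
    return log[-1].term if log else 0


def must_differ(log_a: List[Entry], log_b: List[Entry]) -> bool:
    """True when the terms differ, so the sequences cannot be the same."""
    return last_term(log_a) != last_term(log_b)


if __name__ == "__main__":
    a = [Entry(1, "w1"), Entry(1, "w2")]
    b = [Entry(1, "w1"), Entry(2, "w2-retry")]
    print(must_differ(a, b))
```

The check works in one direction only: a True result proves the sequences differ, while a False result says nothing and a fuller comparison is still needed.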