This pattern is part of Patterns of Distributed Systems
Generation Clock
A monotonically increasing number indicating the generation of the server.
04 August 2020
aka: Term, Epoch, and Generation
Problem
In a Leader and Followers setup, there is a possibility of the leader being temporarily disconnected from the followers. There might be a garbage collection pause in the leader process, or a temporary network disruption which disconnects the leader from the followers. In this case the leader process is still running, and after the pause or the network disruption is over, it will try sending replication requests to the followers. This is dangerous, because in the meantime the rest of the cluster might have selected a new leader and accepted requests from clients. It is important for the rest of the cluster to detect any requests from the old leader. The old leader itself should also be able to detect that it was temporarily disconnected from the cluster, and take the necessary corrective action to step down from leadership.
Solution
Maintain a monotonically increasing number indicating the generation of the server. Every time a new leader election happens, it should be marked by increasing the generation. The generation needs to be available beyond a server reboot, so it is stored with every entry in the Write-Ahead Log. As discussed in High-Water Mark, followers use this information to find conflicting entries in their log.
At startup, the server reads the last known generation from the log.
class ReplicatedLog…
this.replicationState = new ReplicationState(config, wal.getLastLogEntryGeneration());
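The getLastLogEntryGeneration call is not shown in the listings here; a minimal sketch of it, assuming isEmpty and getLastLogEntry helpers on the log and that each WALEntry records the generation it was written in, could look like this:
class WriteAheadLog… (sketch)
Long getLastLogEntryGeneration() {
    // An empty log means the server has never seen a leader; start at generation 0.
    if (isEmpty()) {
        return 0L;
    }
    // Otherwise the generation of the most recently written entry is the last known one.
    return getLastLogEntry().getGeneration();
}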
With Leader and Followers, servers increment the generation every time a new leader election starts.
class ReplicatedLog…
private void startLeaderElection() {
replicationState.setGeneration(replicationState.getGeneration() + 1);
registerSelfVote();
requestVoteFrom(followers);
}
The servers send the generation to other servers as part of the vote requests. This way, after a successful leader election, all the servers have the same generation. Once the leader is elected, followers are told about the new generation.
follower (class ReplicatedLog...)
private void becomeFollower(int leaderId, Long generation) {
    replicationState.reset();
    replicationState.setGeneration(generation);
    replicationState.setLeaderId(leaderId);
    transitionTo(ServerRole.FOLLOWING);
}
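The vote handling itself is not part of the listings above; a minimal sketch of the voter's side, assuming illustrative VoteRequest/VoteResponse types and a voteFor helper, and omitting the checks a real election would also make (log completeness, earlier votes in this generation), might look like this:
voter (sketch)
VoteResponse handleVoteRequest(VoteRequest request) {
    Long currentGeneration = replicationState.getGeneration();
    // A candidate from an older generation cannot become leader; reject it.
    if (request.getGeneration() < currentGeneration) {
        return VoteResponse.rejected(serverId(), currentGeneration);
    }
    // Adopt the candidate's (higher or equal) generation before voting,
    // so all servers converge on the same generation after the election.
    replicationState.setGeneration(request.getGeneration());
    return voteFor(request.getCandidateId(), request.getGeneration());
}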
Thereafter, the leader includes the generation in each request it sends to the followers. It includes it in every HeartBeat message as well as the replication requests sent to followers.
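For example, a heartbeat sender might simply stamp the current generation on each message; the HeartbeatRequest type and the send call below are illustrative names, not part of the listings above:
leader (sketch)
private void sendHeartbeats() {
    Long generation = replicationState.getGeneration();
    for (var follower : followers) {
        // Every heartbeat carries the leader's generation, so followers
        // can recognize messages from a deposed leader.
        send(follower, new HeartbeatRequest(serverId(), generation, wal.getLastLogIndex()));
    }
}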
The leader persists the generation along with every entry in its Write-Ahead Log.
leader (class ReplicatedLog...)
Long appendToLocalLog(byte[] data) {
    Long generation = replicationState.getGeneration();
    return appendToLocalLog(data, generation);
}

Long appendToLocalLog(byte[] data, Long generation) {
    var logEntryId = wal.getLastLogIndex() + 1;
    var logEntry = new WALEntry(logEntryId, data, EntryType.DATA, generation);
    return wal.writeEntry(logEntry);
}
This way, the generation is also persisted in the follower's log as part of the replication mechanism of Leader and Followers.
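On the follower side, the replicated entry can be written to the log exactly as received, so the generation stamped by the leader survives in the follower's own log. A minimal sketch, with the stale-generation check from the next listing omitted and ReplicationRequest/getEntry as assumed names:
follower (sketch)
private void applyReplicatedEntry(ReplicationRequest request) {
    // The WALEntry already carries the generation assigned by the leader,
    // so writing it unchanged persists that generation in the follower's log.
    wal.writeEntry(request.getEntry());
}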
If a follower gets a message from a deposed leader, the follower can tell because the message carries a generation lower than its own. The follower then replies with a failure response.
follower (class ReplicatedLog...)
Long currentGeneration = replicationState.getGeneration();
if (currentGeneration > request.getGeneration()) {
    return new ReplicationResponse(FAILED, serverId(), currentGeneration, wal.getLastLogIndex());
}
When a leader gets such a failure response, it becomes a follower and expects communication from the new leader.
Old leader (class ReplicatedLog...)
if (!response.isSucceeded()) {
    if (response.getGeneration() > replicationState.getGeneration()) {
        becomeFollower(LEADER_NOT_KNOWN, response.getGeneration());
        return;
    }
}
Consider the following example. In a three-server cluster, leader1 is the existing leader, and all the servers have generation 1. Leader1 sends continuous heartbeats to the followers. Then leader1 has a long garbage collection pause of, say, 5 seconds. The followers do not get a heartbeat, time out, and elect a new leader. The new leader increments the generation to 2. After the garbage collection pause is over, leader1 continues sending requests to the other servers. The followers and the new leader, which are at generation 2, reject those requests and send a failure response with generation 2. Leader1 handles the failure response and steps down to be a follower, updating its generation to 2.

Figure 1: Generation
Examples
Raft
Raft uses the concept of a Term for marking the leader generation.
Zab
In ZooKeeper, an epoch number is maintained as part of every transaction id (zxid). So every transaction persisted in ZooKeeper carries a generation in the form of the epoch.
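A zxid is a 64-bit number whose high 32 bits hold the epoch and whose low 32 bits hold a per-epoch counter, so the epoch of any transaction can be read straight off its id. The snippet below is illustrative, not ZooKeeper's actual code:
// The high 32 bits of a zxid are the epoch in which the transaction was created.
long epochOf(long zxid) {
    return zxid >>> 32;
}

// The low 32 bits count transactions within that epoch.
long counterOf(long zxid) {
    return zxid & 0xffffffffL;
}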
Cassandra
In Cassandra, each server stores a generation number which is incremented every time the server restarts. The generation is persisted in the system keyspace and propagated to other servers as part of gossip messages. A server receiving a gossip message can then compare the generation value it knows about with the generation value in the message. If the generation in the gossip message is higher, it knows that the server was restarted, so it discards all the state it has maintained for that server and asks for the new state.
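The comparison described above can be sketched as follows; the names used here (endpointStates, requestFullState) are illustrative and not Cassandra's actual classes:
void handleGossip(String endpoint, long gossipedGeneration) {
    long knownGeneration = endpointStates.get(endpoint).getGeneration();
    if (gossipedGeneration > knownGeneration) {
        // A higher generation means the endpoint restarted since we last heard
        // from it: drop the stale state and ask it for its current state.
        endpointStates.remove(endpoint);
        requestFullState(endpoint);
    }
}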
Epochs in Kafka
In Kafka, an epoch number is created and stored in ZooKeeper every time a new Controller is elected for a Kafka cluster. The epoch is included in every request that is sent from the Controller to other servers in the cluster. Another epoch, called LeaderEpoch, is maintained to know whether the followers of a partition are lagging behind in their High-Water Mark.
Significant Revisions
04 August 2020: Initial publication