Consensus

From Designing Data-Intensive Applications:

The best way of building fault-tolerant systems is to find some general-purpose abstractions with useful guarantees, implement them once, and then let applications rely on those guarantees. This is the same approach that transactions use: by using a transaction, the application can pretend that there are no crashes (atomicity), that nobody else is concurrently accessing the database (isola‐ tion), and that storage devices are perfectly reliable (durability). Even though crashes, race conditions, and disk failures do occur, the transaction abstraction hides those problems so that the application doesn’t need to worry about them.

One of the most important abstractions for distributed systems is consensus: that is, getting all of the nodes to agree on something. Reliably reaching consensus in spite of network faults and process failures is a surprisingly tricky problem.

Once you have an implementation of consensus, applications can use it for various purposes. For example, say you have a database with single-leader replication. If the leader dies and you need to fail over to another node, the remaining database nodes can use consensus to elect a new leader. It’s important that there is only one leader, and that all nodes agree who the leader is. If two nodes both believe that they are the leader, that situation is called split brain, and it often leads to data loss. Correct implementations of consensus help avoid such problems.

Raft

  • Notes from reading the Raft paper

  • This is a great summary of practical tradeoffs involved in picking Raft over Paxos/etc.: https://news.ycombinator.com/item?id=23123701

    The most important thing to know about Raft is that it’s not performant (every command has to be sent to a single leader which becomes a bottleneck) nor scalable (every command needs to be processed by all nodes). Etcd supports “1000s of writes” and recommends up to “7 nodes”.

    This doesn’t mean that Raft is bad; it’s just a trade-off you need to be aware of. Simplicity vs performance. If you’re integrating Raft into your stack and aim for scalability/performance you must always be very weary of when you use it. You should minimize writes at all costs. Unfortunately many developers gets the impression that you can just plug Raft into an existing system and suddenly have a performant and scalable distributed system.

    The end result is that you have two choices: (1) You can use a library which provides a simple model (a log of commands), but doesn’t scale well or (2) you can use a more complicated consensus algorithm and then deal with all of the Hard Problems™ that comes with it. If you’re going for the second option, you might as well take advantage of all of the research discovered in the last few years (see https://vadosware.io/post/paxosmon-gotta-concensus-them-all/)

  • Implementing Raft (6.824 Lab 2)

Edit