Jepsen 9: A Fsyncing Feeling

A very good talk (I didn’t expect anything less). A couple of highlights:

  • All Jepsen “clients” run in the same JVM, because analyzing the temporal order of concurrent events after the fact is critical to finding issues.
  • Jepsen treats “failed” and “timeout” as two very different error conditions: a failed operation definitely did not take effect, while a timed-out one may or may not have.
  • ZooKeeper passed its Jepsen test; no other (advertised) database did.
  • Hazelcast has (had?) a number of very questionable primitives.
  • I know why distributed locks are a bad idea (not technically viable), but I didn’t really follow Kyle’s explanation of this (something about side effects).
  • Be very careful if you’re picking a database for production; read the documentation, and test things yourself.
  • Pick the right tradeoff for your use case; losing 10% of writes in the name of efficiency might be perfectly acceptable (or even desirable).
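
To make the “failed” vs. “timeout” point concrete for myself, here is a minimal sketch of recording a history where a timeout is logged as indeterminate rather than as a failure. It is loosely modeled on the invoke / ok / fail / info operation types in Jepsen histories, but it is not Jepsen's actual code: `DefiniteFailure`, `RequestTimeout`, `record`, and the two helper functions are hypothetical names made up for illustration.

```python
class DefiniteFailure(Exception):
    """Hypothetical: the server rejected the write, so it did not happen."""


class RequestTimeout(Exception):
    """Hypothetical: no reply arrived; the write may or may not have happened."""


def record(history, process, value, write):
    """Call write(value) and append the invocation and its outcome to history."""
    history.append({"process": process, "type": "invoke", "value": value})
    try:
        write(value)
        outcome = "ok"      # acknowledged: the write happened
    except DefiniteFailure:
        outcome = "fail"    # known not to have happened
    except RequestTimeout:
        outcome = "info"    # indeterminate: cannot be ruled in or out
    history.append({"process": process, "type": outcome, "value": value})
    return outcome


def definitely_written(history):
    """Values whose writes were acknowledged."""
    return {op["value"] for op in history if op["type"] == "ok"}


def possibly_written(history):
    """Values a checker must allow to appear: acknowledged or indeterminate."""
    return {op["value"] for op in history if op["type"] in ("ok", "info")}
```

The point of keeping the two sets apart: a value that shows up later in the database after a timeout is not a bug, but a value that shows up after a definite failure is.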

Look Up

  • Split-brain
  • Raft paper
  • Cache line
  • Paxos papers
  • LD_PRELOAD
  • Linearizability
    • Only goes forward in time (operations respect real-time order)
  • Serializability
    • Goes forward/backward in time (no real-time ordering constraint)
  • Byzantine fault tolerance + its use in blockchain algorithms
  • Probabilistic execution
  • Flake IDs
  • Aerospike
  • Tendermint
  • Merkle trees
  • fsync
  • Hazelcast
  • Distributed locks
  • SIGSTOP / SIGCONT
  • tc / iptables