# Dgraph: Synchronously Replicated, Transactional and Distributed Graph Database (Paper)

https://dgraph.io/paper

• Two kinds of nodes:
    • Zeroes are administrative nodes that handle things like:
        • Handing out monotonic timestamps
        • Handling rebalances/reshards
    • Alphas store data
• Nodes are divided into groups; one group for zeroes, and 1..N groups for alphas.
• Each group forms a Raft cluster and can have 1, 3, or 5 replicas.
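The layout above can be sketched as a small data structure; the node names and `quorum` helper are my own illustration, not from the paper:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the cluster layout described above: one Zero group
# plus N Alpha groups, each an independent Raft cluster with an odd replica count.
@dataclass
class RaftGroup:
    group_id: int
    members: list  # node addresses; must be 1, 3, or 5 for a clean majority

    def __post_init__(self):
        assert len(self.members) in (1, 3, 5), "replica count must be 1, 3, or 5"

    def quorum(self) -> int:
        # Majority needed for Raft to commit an entry in this group.
        return len(self.members) // 2 + 1

@dataclass
class Cluster:
    zeros: RaftGroup                             # administrative group
    alphas: list = field(default_factory=list)   # data groups

cluster = Cluster(
    zeros=RaftGroup(0, ["zero1", "zero2", "zero3"]),
    alphas=[
        RaftGroup(1, ["alpha1", "alpha2", "alpha3"]),
        RaftGroup(2, ["alpha4"]),  # a single-replica group is allowed
    ],
)
```

Note that odd replica counts keep the majority unambiguous: a 3-node group tolerates one failure, a 5-node group two.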
• Primitive data structure is a triple, representing either:
    • (subject, predicate, object) → (0xab, <follower_of>, 0xff)
    • (subject, predicate, value) → (0xab, <name>, "John")
• All subjects are given a globally unique uid (a 64-bit integer)
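The two triple shapes above can be modeled directly; this is illustrative only, not Dgraph's internal encoding:

```python
from typing import NamedTuple, Union

# A triple's third element is either another entity's uid (an edge)
# or a literal value (an attribute).
class Triple(NamedTuple):
    subject: int                # globally unique uid
    predicate: str
    object: Union[int, str]     # uid for an edge, literal for a value

edge = Triple(0xAB, "follower_of", 0xFF)   # (subject, predicate, object)
attr = Triple(0xAB, "name", "John")        # (subject, predicate, value)
```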
• Data is sharded by relationship/predicate rather than by entity.
    • For example, if the store records which users follow which other users, the users are entities and “follows” is the relationship.
    • Dgraph places all data for the “follows” relationship (across all entities) in a single group.
    • A group can contain more than one relationship.
    • Critically, what happens when a relationship can’t fit on a single node anymore?
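Predicate-based sharding can be sketched as a routing function; the predicate→group map is hard-coded here for illustration (in Dgraph the Zeros maintain it):

```python
# Every triple with the same predicate routes to the same group,
# regardless of which subject (entity) it belongs to.
predicate_to_group = {"follows": 1, "name": 1, "email": 2}  # hypothetical assignment

def group_for(triple):
    subject, predicate, obj = triple
    return predicate_to_group[predicate]

# All "follows" edges land in group 1, no matter the user:
assert group_for((0xAB, "follows", 0xFF)) == 1
assert group_for((0xCD, "follows", 0xAB)) == 1
```

The payoff is that a query step like "everyone X follows" touches a single group instead of fanning out to every shard, at the cost of the hot-predicate problem raised above.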
• Dgraph uses Badger (its in-house LSM-tree KV store) to persist data; metadata is stored in Raft.
    • The paper glosses over this, but how does Dgraph ensure that Badger and Raft stay in sync?
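The paper doesn't answer this, but a common pattern in Raft-backed stores is to apply committed log entries to the KV store in log order and persist the last applied index alongside the data, so replay after a crash is idempotent. A minimal sketch of that pattern (my assumption, not confirmed as Dgraph's mechanism):

```python
# The dict stands in for Badger; applied_index would be stored durably
# with the data so that log replay after a restart skips applied entries.
class StateMachine:
    def __init__(self):
        self.kv = {}
        self.applied_index = 0

    def apply(self, index, key, value):
        if index <= self.applied_index:
            return  # already applied; replaying the Raft log is a no-op
        self.kv[key] = value
        self.applied_index = index

sm = StateMachine()
sm.apply(1, "a", "first write")
sm.apply(1, "a", "stale replay")  # ignored: index 1 was already applied
```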
• Uses GraphQL for queries, over gRPC/Protobuf or vanilla HTTP.
• All records with the same subject and predicate are grouped into the same Badger key:
    • The Badger value is a sorted list of uids/values, and it can be split if a single KV pair becomes too large.
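A sketch of that grouping (the key encoding here is illustrative, not Badger's actual format). Keeping each list sorted is what makes uid-set intersection and merging cheap during query execution:

```python
import bisect

# One key per (predicate, subject) pair; the value is a sorted,
# deduplicated list of object uids ("posting list").
postings = {}

def add(subject, predicate, obj_uid):
    key = (predicate, subject)
    lst = postings.setdefault(key, [])
    pos = bisect.bisect_left(lst, obj_uid)
    if pos == len(lst) or lst[pos] != obj_uid:
        lst.insert(pos, obj_uid)  # insert in sorted position, skip duplicates

add(0xAB, "follows", 0xFF)
add(0xAB, "follows", 0x10)
add(0xAB, "follows", 0xFF)  # duplicate edge, ignored
```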
• Dgraph can execute rebalances by marking shards read-only; are writes rejected during this process?
• Supports lock-free transactions via MVCC
    • The entire “Lock-Free High Availability Transaction Processing” section was impenetrable to me
        • Lots of jargon and not much elucidation
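My best reading of the MVCC core, sketched minimally (this is the generic snapshot-read idea, not the paper's exact algorithm): each key keeps versions tagged with the commit timestamp Zero handed out, and a read at timestamp `ts` sees the newest version committed at or before `ts`, so readers never block writers:

```python
# key -> list of (commit_ts, value), appended in commit-timestamp order.
# Timestamps come from Zero's monotonic counter mentioned earlier.
versions = {}

def commit(key, commit_ts, value):
    versions.setdefault(key, []).append((commit_ts, value))

def read(key, ts):
    # Return the latest version visible at snapshot timestamp ts.
    result = None
    for commit_ts, value in versions.get(key, []):
        if commit_ts <= ts:
            result = value
    return result

commit("name", 5, "John")
commit("name", 9, "Johnny")
```

A transaction that started at ts=7 keeps seeing "John" even after the ts=9 commit lands, which is the lock-free part: no reader ever waits on a writer's lock.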