Replication

A node that has a copy of the data is called a replica

Replication is the act of ensuring consistency of data across replicas. If one replica is faulty, others are ideally still accessible

Of course, if data doesn’t change, this is an easy problem: just copy it. Hard problem is when the data changes.

Can take inspiration from hardware systems! RAID (Redundant Array of Independent Disks) which is used to replicate within a single computer fills a similar role but RAID has a single controlled whereas distributed systems have nodes that act independently.

An important concept in replication (and message broadcast) is making sure that we avoid cases where losing an ACK could lead to users doing an action multiple times (e.g. pressing the like button)?

This can be done by ensuring idempotence in our actions.

SMR

State machine replicationcCan be done by FIFO-total order broadcasting every update to all replicas. Whenever a replica delivers an update message, it applies it to its own state

This is what underlies a lot of blockchains, distributed ledgers, smart contracts, etc. (Ethereum is just one big state machine)

jzhao.xyz

Recent Writing

2024: Centering

Taste is a guide for what is worthwhile

Agentic Computing

Building a BFT JSON CRDT

Recent Notes

TrueTime

Concurrency control