jzhao.xyz

Distributed Systems

May 03, 2022 · 2 min read

seed

A distributed system can be defined as multiple computers (nodes) communicating via a network trying to achieve some task together.

Martin Kleppmann’s Course

Notes from Martin Kleppmann’s Distributed Systems Course. He has a set of course notes on his teaching site as well.

How do we share data amongst different concurrent entities?

Recommended Reading
- “Distributed Systems” by van Steen & Tanenbaum: Implementation detail heavy, more practical
- “Introduction to Reliable and Secure Distributed Programming” (2nd ed) by Cachin, Guerraoui & Rodrigues: Theory heavy
- “Designing Data-Intensive Applications” by Kleppmann: More oriented toward distributed databases
- “Operating Systems: Concurrent and Distributed Software Design” by Bacon & Harris (Addison-Wesley): links to Operating Systems

Why distributed?

- Things are inherently distributed: sending a message from your phone to your friend’s phone
- Reliability: even if one node fails, the system as a whole keeps functioning
- Performance: get data from a nearby node rather than one centralized server halfway around the world
- Solve bigger problems: some amounts of data can’t fit on just one machine

Why not distributed?

- Communication may fail (and we might not even know it has failed)
- Processes may crash (and we might not know)
- All of this can happen nondeterministically

Thus we need to think about fault tolerance.
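
To make the "we might not even know" part concrete, here is a minimal sketch in Python. It is not from the course; the names `Fault`, `Node`, `send`, and `observe` are hypothetical, and the network is simulated in-process rather than using real sockets. It shows that a lost request, a lost reply, and a crashed node all look identical to the sender (a timeout, modelled as `None`), even though the remote state differs.

```python
from enum import Enum


class Fault(Enum):
    """Hypothetical fault modes injected into a single request/reply exchange."""
    NONE = "none"
    REQUEST_LOST = "request lost"
    REPLY_LOST = "reply lost"
    NODE_CRASHED = "node crashed"


class Node:
    """A toy remote node that counts how many requests it actually applied."""

    def __init__(self):
        self.applied = 0
        self.crashed = False

    def handle(self, request):
        if self.crashed:
            return None  # a crashed node never answers
        self.applied += 1
        return f"ack:{request}"


def send(node, request, fault):
    """Deliver a request and return the reply, or None to model a sender-side timeout."""
    if fault is Fault.REQUEST_LOST:
        return None  # the request never reaches the node
    if fault is Fault.NODE_CRASHED:
        node.crashed = True  # the node crashes before processing
    reply = node.handle(request)
    if fault is Fault.REPLY_LOST:
        return None  # the node did the work, but the reply vanished
    return reply


def observe(fault):
    """Return (what the sender sees, what really happened on the node)."""
    node = Node()
    reply = send(node, "incr", fault)
    return reply, node.applied


if __name__ == "__main__":
    for fault in Fault:
        reply, applied = observe(fault)
        print(f"{fault.value:>13}: sender sees {reply!r}, node applied {applied}")
```

In the three failure cases the sender sees the same thing, yet only in the "reply lost" case was the request actually applied. Blindly retrying after a timeout could therefore apply the request twice, which is one reason fault tolerance needs deliberate design.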

Notes

- RPCs
- Fault Tolerance
  - See Two Generals Problem and Byzantine Generals Problem
- System models
- Physical and Logical Time
- Message ordering and Causality
- Message broadcast
- Replication
- Quorum
- Consensus
- Consistency
