# Distributed Systems

Last updated May 14, 2022

A distributed system is a set of computers (nodes) that communicate over a network to accomplish some task together.

## Martin Kleppmann’s Course

Notes from Martin Kleppmann’s Distributed Systems course (he’s the author of “Designing Data-Intensive Applications”, O’Reilly, 2017). He also has a set of course notes on his teaching site.

How do we share data amongst different concurrent entities?

Recommended reading:

• “Distributed Systems” by van Steen & Tanenbaum: heavy on implementation detail, more practical
• “Introduction to Reliable and Secure Distributed Programming” (2nd ed) by Cachin, Guerraoui & Rodrigues: theory heavy
• “Designing Data-Intensive Applications” by Kleppmann: More oriented toward distributed databases
• “Operating Systems: Concurrent and Distributed Software Design” by Bacon & Harris (Addison-Wesley): links distributed systems to operating systems

### Why distributed?

• Some applications are inherently distributed: e.g. sending a message from your phone to your friend’s phone
• Reliability: even if one node fails, the system as a whole keeps functioning
• Performance: get data from a nearby node rather than one centralized server halfway around the world
• Solve bigger problems: some datasets are too large to fit on a single machine

### Why not distributed?

• Communication may fail (and we might not even know it has failed)
• Processes may crash (and we might not know)
• All of this can happen nondeterministically
• Thus we need to think about fault tolerance
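
To make the "we might not even know it has failed" point concrete, here is a minimal sketch of a toy simulation (not from the course). The names `Fault`, `Node`, `send` and `observe` are hypothetical, and a return value of `None` stands in for the sender's timeout expiring. It shows that a lost request, a crashed node and a lost reply all look the same to the sender, even though the remote state differs.

```python
from enum import Enum
from typing import Optional, Tuple


class Fault(Enum):
    """Hypothetical failure modes injected into the toy network."""
    NONE = "none"
    REQUEST_LOST = "request lost"
    NODE_CRASHED = "node crashed"
    REPLY_LOST = "reply lost"


class Node:
    """Hypothetical remote node that counts the requests it has processed."""

    def __init__(self) -> None:
        self.processed = 0

    def handle(self, request: str) -> str:
        self.processed += 1
        return f"ack:{request}"


def send(node: Node, request: str, fault: Fault) -> Optional[str]:
    """Return the reply, or None to model the sender's timeout expiring."""
    if fault in (Fault.REQUEST_LOST, Fault.NODE_CRASHED):
        return None  # the request was never processed
    reply = node.handle(request)
    if fault is Fault.REPLY_LOST:
        return None  # processed, but the sender never hears back
    return reply


def observe(fault: Fault) -> Tuple[Optional[str], int]:
    """Return (what the sender sees, how many requests the node processed)."""
    node = Node()
    reply = send(node, "x=1", fault)
    return reply, node.processed


if __name__ == "__main__":
    for fault in Fault:
        seen, processed = observe(fault)
        print(f"{fault.value:>12}: sender sees {seen!r}, node processed {processed}")
```

In all three faulty cases the sender sees only a timeout, but in the "reply lost" case the request was actually processed. This is why blindly retrying after a timeout can be unsafe, and why fault tolerance has to be designed in.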