Target: connect physically separated machines
--> to share resources --> to increase capacity through parallelism --> to tolerate faults --> to achieve security via isolation
- Local area networks (1980s)
- Data centers, Big web sites (1990s)
- such as: web search, shopping
- why: lots of data and lots of users
- Cloud computing (2000s)
- move computation & data into cloud servers
- why: anyone can launch a website at will, machine-learning apps, high-performance data processing
- these are real problems
- a booming field
- many concurrent parts
- must deal with partial failure -> complexity (see the timeout sketch below)
- tricky to realize the performance benefits
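A minimal sketch of the partial-failure ambiguity (the helper names and the channel-based "RPC" are illustrative assumptions, not anything from the labs): when a call times out, the client cannot tell whether the request was lost, the server crashed mid-operation, or the reply was lost.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// callWithTimeout sends a request and waits for a reply.
// On timeout the caller learns nothing about whether the server
// actually executed the request -- the classic partial-failure ambiguity.
func callWithTimeout(reqCh chan<- string, replyCh <-chan string, req string) (string, error) {
	reqCh <- req
	select {
	case reply := <-replyCh:
		return reply, nil
	case <-time.After(100 * time.Millisecond):
		// Lost request? Crashed server? Lost reply? We cannot tell.
		return "", errors.New("timeout: request may or may not have executed")
	}
}

func main() {
	reqCh := make(chan string, 1)
	replyCh := make(chan string, 1)

	// A "server" that receives the request but never replies,
	// e.g. because it crashed right after executing the operation.
	go func() {
		<-reqCh // request arrives and might be executed...
		// ...but no reply is ever sent.
	}()

	if _, err := callWithTimeout(reqCh, replyCh, "Put(k, v)"); err != nil {
		fmt.Println(err) // caller must decide: retry (risking duplicates) or give up
	}
}
```

Whatever the client does next, retrying or giving up, risks duplicating or dropping the operation; much of the complexity comes from handling this correctly.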
Lab 1: distributed big-data framework (like MapReduce; see the word-count sketch after this list)
Lab 2: fault tolerance library using replication (Raft)
Lab 3: a simple fault-tolerant database
Lab 4: scalable database performance via sharding
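As a taste of Lab 1's programming model, here is a minimal, single-machine sketch of MapReduce word count (the KeyValue type and function signatures are illustrative assumptions, not the lab's actual interface): the application supplies Map and Reduce, and the framework, simulated sequentially below, handles splitting input, grouping intermediate keys, and re-running work on failures.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

type KeyValue struct {
	Key   string
	Value string
}

// Map emits (word, "1") for every word in a document.
func Map(document string) []KeyValue {
	var out []KeyValue
	for _, w := range strings.Fields(document) {
		out = append(out, KeyValue{Key: w, Value: "1"})
	}
	return out
}

// Reduce sums the counts emitted for a single word.
func Reduce(word string, counts []string) string {
	return strconv.Itoa(len(counts))
}

func main() {
	docs := []string{"the quick brown fox", "the lazy dog"}

	// A sequential stand-in for the distributed framework:
	// run Map over every input, group by key, then run Reduce.
	grouped := map[string][]string{}
	for _, d := range docs {
		for _, kv := range Map(d) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}
	for word, counts := range grouped {
		fmt.Printf("%s: %s\n", word, Reduce(word, counts))
	}
}
```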
This is a course about infrastructure for applications.
- availability: the system keeps serving requests despite failures
- recoverability: after repair, the system comes back with its previous state
General-purpose infrastructure needs well-defined behavior.
E.g. “Get(k) yields the value from the most recent Put(k,v).”
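A sketch of that specification as a Go interface plus a trivial single-node implementation (the names are assumptions, not lab code); the difficulty the course addresses is preserving this "most recent Put wins" behavior once the data is replicated and machines fail.

```go
package main

import "fmt"

// KV captures the intended behavior: Get(k) returns the value
// written by the most recent Put(k, v).
type KV interface {
	Put(key, value string)
	Get(key string) (value string, ok bool)
}

// mapKV trivially satisfies the spec on a single machine;
// keeping replicas in agreement on "most recent" is the hard part.
type mapKV struct {
	data map[string]string
}

func newMapKV() *mapKV { return &mapKV{data: map[string]string{}} }

func (s *mapKV) Put(key, value string) { s.data[key] = value }

func (s *mapKV) Get(key string) (string, bool) {
	v, ok := s.data[key]
	return v, ok
}

func main() {
	var kv KV = newMapKV()
	kv.Put("k", "v1")
	kv.Put("k", "v2")
	v, _ := kv.Get("k")
	fmt.Println(v) // "v2" -- the most recent Put wins
}
```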
Achieving good behavior is hard!
“Replica” servers are hard to keep identical.
Replication improves availability, but costs performance.
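A toy illustration (not any particular protocol) of why replicas drift apart: if two replicas apply the same concurrent writes in different orders, their states diverge, which is exactly what a replicated log like Raft's prevents by making every replica agree on one order.

```go
package main

import "fmt"

type op struct {
	key, value string
}

// apply runs a sequence of Put operations against a fresh replica.
func apply(ops []op) map[string]string {
	state := map[string]string{}
	for _, o := range ops {
		state[o.key] = o.value
	}
	return state
}

func main() {
	// Two clients issue concurrent writes to the same key.
	a := op{"x", "from client A"}
	b := op{"x", "from client B"}

	// Without an agreed-upon order, replicas may apply them differently.
	replica1 := apply([]op{a, b})
	replica2 := apply([]op{b, a})

	fmt.Println("replica1 x =", replica1["x"]) // "from client B"
	fmt.Println("replica2 x =", replica2["x"]) // "from client A" -- diverged!
}
```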
This material comes up a lot in the real world.
All big web sites and cloud providers are experts at distributed systems.
Many big open source projects are built around these ideas.
We’ll read multiple papers from industry.
And industry has adopted many ideas from academia.