Goal: connect physically separated machines

--> sharing
--> increasing capacity through parallelism
--> tolerating faults
--> achieving security via isolation

Historical Context

  • Local area networks (1980s)
  • Data centers, big web sites (1990s)
    • e.g. web search, shopping
    • driving force: lots of data, lots of users
  Next era:
  • Cloud computing (2000s)
  • move computation & data into cloud servers
  • driving force: start websites easily and cheaply, machine-learning apps, high-performance computing
  Current state:
  • real problems remain
  • booming field


Challenges:

  • many concurrent parts
  • must deal with partial failure -> complexity
  • tricky to realize the performance benefits


Lab 1: distributed big-data framework (like MapReduce)
Lab 2: fault tolerance library using replication (Raft)
Lab 3: a simple fault-tolerant database
Lab 4: scalable database performance via sharding

Course focus and main topics

This is a course about infrastructure for applications.

  • Storage.
  • Communication.
  • Computation.

Main topics:

Fault tolerance

  • make systems highly available, despite failures
    • replication
  • recoverability: recover the previous state after a crash
    • logging/transactions
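The recoverability idea above can be sketched in a few lines. This is a toy illustration of my own (the class `LoggedKV` is not from the notes, and a real system would put the log on durable storage): every Put is appended to a log before being applied, so a restarted server can rebuild its previous state by replaying the log.

```python
class LoggedKV:
    """Toy key/value store that recovers its state by replaying a log."""

    def __init__(self, log=None):
        self.log = log if log is not None else []  # durable in a real system
        self.state = {}
        for k, v in self.log:                      # recovery: replay the log
            self.state[k] = v

    def put(self, k, v):
        self.log.append((k, v))  # write-ahead: append to the log first...
        self.state[k] = v        # ...then apply to in-memory state

    def get(self, k):
        return self.state.get(k)


# Simulate a crash: only the log survives, and a fresh server
# reconstructs the previous state from it.
server = LoggedKV()
server.put("x", 1)
server.put("x", 2)
recovered = LoggedKV(log=server.log)
print(recovered.get("x"))  # 2
```

Logging the operation before applying it is what makes the recovery replay safe: anything visible in the state is guaranteed to also be in the log.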


Consistency

General-purpose infrastructure needs well-defined behavior.
E.g. “Get(k) yields the value from the most recent Put(k,v).”
Achieving good behavior is hard!
“Replica” servers are hard to keep identical.
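A tiny demonstration (my own toy example, not from the notes) of why replicas are hard to keep identical: if two replicas apply the same concurrent Put(k,v) operations in different orders, their states diverge, and Get(k) no longer yields one well-defined “most recent” value.

```python
def apply_ops(ops):
    """Apply a sequence of Put(k, v) operations to an empty store."""
    state = {}
    for k, v in ops:
        state[k] = v
    return state


ops = [("k", "v1"), ("k", "v2")]      # two concurrent writes to the same key
replica_a = apply_ops(ops)            # replica A sees v1 then v2
replica_b = apply_ops(reversed(ops))  # network reorders: B sees v2 then v1

print(replica_a["k"], replica_b["k"])  # v2 v1 -- the replicas disagree
```

Protocols like Raft (Lab 2) exist precisely to force all replicas to apply operations in the same order.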


Scalable throughput


Replication improves availability, but costs performance.
Another performance goal: reduce latency.
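One standard route to scalable throughput is sharding, as in Lab 4. Below is a minimal sketch (names like `shard_for` are my own, not from the notes): keys are partitioned across servers by a stable hash, so N servers each handle roughly 1/N of the load.

```python
import hashlib

NSERVERS = 3
servers = [dict() for _ in range(NSERVERS)]  # one toy k/v store per server


def shard_for(key, nservers=NSERVERS):
    # Stable hash, so every client maps a given key to the same server.
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % nservers


def put(k, v):
    servers[shard_for(k)][k] = v


def get(k):
    return servers[shard_for(k)].get(k)


for i in range(100):
    put(f"key{i}", i)

print(get("key42"))               # 42
print([len(s) for s in servers])  # the 100 keys are spread over the servers
```

With independent keys, the servers can serve requests in parallel, which is where the throughput scaling comes from; hot keys and cross-shard operations are what make it tricky in practice.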


This material comes up a lot in the real world.
All big web sites and cloud providers are experts at distributed systems.
Many big open source projects are built around these ideas.
We’ll read multiple papers from industry.
And industry has adopted many ideas from academia.

Case Study: MapReduce
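Before diving in, here is a toy sequential sketch of the MapReduce programming model (the function names reflect the standard map/reduce roles, not any particular framework's API, and a real system would distribute the phases across machines): map emits (key, value) pairs, the framework groups pairs by key, and reduce combines each group.

```python
from collections import defaultdict


def map_fn(document):
    """Word count: emit (word, 1) for every word in the document."""
    return [(word, 1) for word in document.split()]


def reduce_fn(key, values):
    """Combine all the counts emitted for one word."""
    return sum(values)


def mapreduce(documents):
    groups = defaultdict(list)
    for doc in documents:          # map phase
        for k, v in map_fn(doc):
            groups[k].append(v)    # shuffle: group intermediate pairs by key
    # reduce phase: one reduce call per distinct key
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}


counts = mapreduce(["a b a", "b c"])
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

The point of the model is that map and reduce are pure functions over their inputs, so the framework is free to run them in parallel and to re-run them after failures.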

Chivier Humber
Posted on March 8, 2023