Goal: connect physically separated machines

--> sharing
--> increasing capacity through parallelism
--> tolerating faults
--> achieving security via isolation

Historical Context

  • Local area networks (1980s)
  • Data centers, big web sites (1990s)
    • e.g. web search, shopping
    • driving force: lots of data, lots of users
  Next era:
  • Cloud computing (2000s)
  • move computation & data into cloud servers
  • driving force: start websites easily and cheaply, machine-learning apps, high-performance computing
  Current state:
  • real problems remain
  • booming field


Challenges:

  • many concurrent parts
  • must deal with partial failure -> complexity
  • tricky to realize the performance benefits


Lab 1: distributed big-data framework (like MapReduce)
Lab 2: fault tolerance library using replication (Raft)
Lab 3: a simple fault-tolerant database
Lab 4: scalable database performance via sharding

Course focus and main topics

This is a course about infrastructure for applications.

  • Storage.
  • Communication.
  • Computation.

Main topics:

Fault tolerance

  • make systems highly available, despite failures
    • replication
  • recoverability: recover the previous state after a crash
    • logging/transactions
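The recoverability idea above can be sketched in a few lines. This is a toy illustration of my own (the class `LoggedKV` is not from the notes, and a real system would put the log on durable storage): every Put is appended to a log before being applied, so a restarted server can rebuild its previous state by replaying the log.

```python
class LoggedKV:
    """Toy key/value store that recovers its state by replaying a log."""

    def __init__(self, log=None):
        self.log = log if log is not None else []  # durable in a real system
        self.state = {}
        for k, v in self.log:                      # recovery: replay the log
            self.state[k] = v

    def put(self, k, v):
        self.log.append((k, v))  # write-ahead: append to the log first...
        self.state[k] = v        # ...then apply to in-memory state

    def get(self, k):
        return self.state.get(k)


# Simulate a crash: only the log survives, and a fresh server
# reconstructs the previous state from it.
server = LoggedKV()
server.put("x", 1)
server.put("x", 2)
recovered = LoggedKV(log=server.log)
print(recovered.get("x"))  # 2
```

Logging the operation before applying it is what makes the recovery replay safe: anything visible in the state is guaranteed to also be in the log.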


Consistency

General-purpose infrastructure needs well-defined behavior.
E.g. “Get(k) yields the value from the most recent Put(k,v).”
Achieving good behavior is hard!
“Replica” servers are hard to keep identical.
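A tiny demonstration (my own toy example, not from the notes) of why replicas are hard to keep identical: if two replicas apply the same concurrent Put(k,v) operations in different orders, their states diverge, and Get(k) no longer yields one well-defined “most recent” value.

```python
def apply_ops(ops):
    """Apply a sequence of Put(k, v) operations to an empty store."""
    state = {}
    for k, v in ops:
        state[k] = v
    return state


ops = [("k", "v1"), ("k", "v2")]      # two concurrent writes to the same key
replica_a = apply_ops(ops)            # replica A sees v1 then v2
replica_b = apply_ops(reversed(ops))  # network reorders: B sees v2 then v1

print(replica_a["k"], replica_b["k"])  # v2 v1 -- the replicas disagree
```

Protocols like Raft (Lab 2) exist precisely to force all replicas to apply operations in the same order.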


Scalable throughput


Replication improves availability, but costs performance.
Another performance goal: reduce latency.
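One standard route to scalable throughput is sharding, as in Lab 4. Below is a minimal sketch (names like `shard_for` are my own, not from the notes): keys are partitioned across servers by a stable hash, so N servers each handle roughly 1/N of the load.

```python
import hashlib

NSERVERS = 3
servers = [dict() for _ in range(NSERVERS)]  # one toy k/v store per server


def shard_for(key, nservers=NSERVERS):
    # Stable hash, so every client maps a given key to the same server.
    h = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return h % nservers


def put(k, v):
    servers[shard_for(k)][k] = v


def get(k):
    return servers[shard_for(k)].get(k)


for i in range(100):
    put(f"key{i}", i)

print(get("key42"))               # 42
print([len(s) for s in servers])  # the 100 keys are spread over the servers
```

With independent keys, the servers can serve requests in parallel, which is where the throughput scaling comes from; hot keys and cross-shard operations are what make it tricky in practice.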


This material comes up a lot in the real world.
All big web sites and cloud providers are experts at distributed systems.
Many big open source projects are built around these ideas.
We’ll read multiple papers from industry.
And industry has adopted many ideas from academia.

Case Study: MapReduce
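Before diving in, here is a toy sequential sketch of the MapReduce programming model (the function names reflect the standard map/reduce roles, not any particular framework's API, and a real system would distribute the phases across machines): map emits (key, value) pairs, the framework groups pairs by key, and reduce combines each group.

```python
from collections import defaultdict


def map_fn(document):
    """Word count: emit (word, 1) for every word in the document."""
    return [(word, 1) for word in document.split()]


def reduce_fn(key, values):
    """Combine all the counts emitted for one word."""
    return sum(values)


def mapreduce(documents):
    groups = defaultdict(list)
    for doc in documents:          # map phase
        for k, v in map_fn(doc):
            groups[k].append(v)    # shuffle: group intermediate pairs by key
    # reduce phase: one reduce call per distinct key
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}


counts = mapreduce(["a b a", "b c"])
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

The point of the model is that map and reduce are pure functions over their inputs, so the framework is free to run them in parallel and to re-run them after failures.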

Chivier Humber
Posted on March 8, 2023