6

Target: Connect physically separated machines

--> sharing resources
--> increasing capacity through parallelism
--> tolerating faults
--> achieving security via isolation

Historical Context

Happening first:

  • Local area networks (1980s)
    Then developing:
  • Data centers, big web sites (1990s)
  • e.g. web search, shopping
  • why: lots of data, lots of users
    Next era:
  • Cloud computing (2000s)
  • move computation & data to cloud servers
  • why: start web sites at will, machine-learning apps, large-scale data processing
    Current state:
  • lots of active problems
  • a booming field

Challenges

  • Many concurrent parts
  • must cope with partial failure -> complexity
  • tricky to realize the performance benefits

Labs

Lab 1: distributed big-data framework (like MapReduce)
Lab 2: fault tolerance library using replication (Raft)
Lab 3: a simple fault-tolerant database
Lab 4: scalable database performance via sharding
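The sharding idea behind Lab 4 can be sketched in a few lines: hash each key to a partition so that each replica group serves only a fraction of the keys, and total throughput scales with the number of groups. The hash function and shard count below are illustrative assumptions, not the lab's actual definitions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// NShards is a hypothetical shard count; the real lab fixes its own.
const NShards = 10

// key2shard deterministically maps a key to one of NShards
// partitions, so different replica groups can own disjoint
// subsets of the keyspace.
func key2shard(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % NShards
}

func main() {
	for _, k := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s -> shard %d\n", k, key2shard(k))
	}
}
```

The hard part the sketch omits is moving shards between groups when the configuration changes, which is most of the lab.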

Course focus and main topics

This is a course about infrastructure for applications.

  • Storage.
  • Communication.
  • Computation.

Main topics:

Fault tolerance

  • make systems highly available, despite failures
    • replication
  • recoverability: restore previous state after a crash
    • logging/transactions
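Recoverability via logging boils down to: persist every update to a log, then rebuild the state by replaying the log after a crash. A minimal sketch of the replay step (the `Op` type and key/value state are illustrative assumptions):

```go
package main

import "fmt"

// Op is one logged update; replaying the log in order
// reconstructs the store's state after a crash.
type Op struct {
	Key, Value string
}

// replay rebuilds key/value state from a persisted log.
// Because later entries overwrite earlier ones, the final
// state matches what the server held before it crashed.
func replay(log []Op) map[string]string {
	state := make(map[string]string)
	for _, op := range log {
		state[op.Key] = op.Value
	}
	return state
}

func main() {
	log := []Op{{"x", "1"}, {"y", "2"}, {"x", "3"}}
	state := replay(log)
	fmt.Println(state["x"], state["y"]) // 3 2
}
```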

Consistency

General-purpose infrastructure needs well-defined behavior.
E.g. “Get(k) yields the value from the most recent Put(k,v).”
Achieving good behavior is hard!
“Replica” servers are hard to keep identical.
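On a single server the Get/Put contract is easy to satisfy, which makes it a useful baseline before replication complicates things. A minimal sketch (this toy `KV` type is an assumption, not the course's code):

```go
package main

import (
	"fmt"
	"sync"
)

// KV is a single-server key/value store. It is trivially
// consistent: Get(k) returns the value from the most recent
// Put(k, v). The hard part begins when this map is replicated
// and the copies must be kept identical.
type KV struct {
	mu   sync.Mutex
	data map[string]string
}

func NewKV() *KV {
	return &KV{data: make(map[string]string)}
}

func (kv *KV) Put(k, v string) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	kv.data[k] = v
}

func (kv *KV) Get(k string) string {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	return kv.data[k]
}

func main() {
	kv := NewKV()
	kv.Put("x", "1")
	kv.Put("x", "2")
	fmt.Println(kv.Get("x")) // prints "2": the most recent Put wins
}
```

With two replicas, a Put that reaches one copy but not the other breaks exactly this guarantee.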

Performance

scalable throughput: Nx servers should give Nx total throughput

Tradeoff

Replication improves availability but costs performance.
Strong consistency requires communication, which adds latency.

Implementation

This material comes up a lot in the real world.
All big web sites and cloud providers are experts at distributed systems.
Many big open source projects are built around these ideas.
We’ll read multiple papers from industry.
And industry has adopted many ideas from academia.

Case Study: MapReduce
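The core of MapReduce is just two user-supplied functions: Map emits intermediate key/value pairs, and Reduce combines all values for a key. The classic word-count example, run sequentially in one process (an illustrative sketch that skips the distribution, partitioning, and fault handling that make the real framework and Lab 1 interesting):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KeyValue is an intermediate pair passed from the map
// phase to the reduce phase.
type KeyValue struct {
	Key   string
	Value string
}

// Map emits one ("word", "1") pair per word in the input.
func Map(contents string) []KeyValue {
	var kva []KeyValue
	for _, w := range strings.Fields(contents) {
		kva = append(kva, KeyValue{w, "1"})
	}
	return kva
}

// Reduce counts how many pairs were emitted for a word.
func Reduce(key string, values []string) string {
	return fmt.Sprint(len(values))
}

// sequentialMapReduce sorts the intermediate pairs to group
// values by key, then calls Reduce once per distinct key.
func sequentialMapReduce(contents string) map[string]string {
	intermediate := Map(contents)
	sort.Slice(intermediate, func(i, j int) bool {
		return intermediate[i].Key < intermediate[j].Key
	})
	out := make(map[string]string)
	i := 0
	for i < len(intermediate) {
		j := i
		var values []string
		for j < len(intermediate) && intermediate[j].Key == intermediate[i].Key {
			values = append(values, intermediate[j].Value)
			j++
		}
		out[intermediate[i].Key] = Reduce(intermediate[i].Key, values)
		i = j
	}
	return out
}

func main() {
	counts := sequentialMapReduce("a b a c b a")
	fmt.Println(counts["a"], counts["b"], counts["c"]) // 3 2 1
}
```

The real framework runs many Map and Reduce tasks in parallel across machines and must re-run tasks whose workers fail.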


https://chivier.github.io/2023/03/08/2023/6/
Author
Chivier Humber
Posted on
March 8, 2023