A distributed key-value store in C++ that backs a full cloud platform — webmail, file storage, and user accounts. Tablets are replicated across nodes with automatic failover, checkpoint-based recovery, and memory-bounded operation under a 10MB-per-node limit.
Four layers, each with a distinct responsibility:
- Load Balancer (port 9000) — stateless round-robin dispatch. After the initial redirect, clients talk directly to their assigned frontend.
- Frontend Servers (ports 8000–8002) — stateless HTTP servers for webmail, file storage, and accounts. Parse requests, issue KV ops against the backend, return responses.
- Backend KVS Nodes (ports 10000–10008) — the store itself. Data lives in tablets (range-partitioned, e.g. A–M, N–Z), each replicated with a primary/secondary model.
- Coordinator (port 9999) — heartbeat-based health monitor, primary elector, and routing authority. Single point of coordination (not a single point of failure for data).
Each tablet is a nested map: `map<row, unordered_map<column, value>>`. Four operations:

| Op | Semantics |
|---|---|
| `PUT row,col=val` | Insert or overwrite |
| `GET row,col` | Read |
| `CPUT row,col=old,new` | Atomic compare-and-swap |
| `DELETE row,col` | Remove column from row |
Row-level locking means concurrent writes to different rows in the same tablet don't block each other.
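The nested-map model with per-row locking can be sketched as follows. This is a minimal illustration, not the project's implementation: class and member names are invented, and `std::mutex` stands in for the project's pthread mutexes. The tablet-level lock here only guards the lock table itself, so writes to disjoint rows proceed in parallel.

```cpp
#include <map>
#include <mutex>
#include <optional>
#include <string>
#include <unordered_map>

// Illustrative tablet: nested map plus a per-row mutex so concurrent
// writes to different rows do not block each other.
class Tablet {
public:
    void put(const std::string& row, const std::string& col, const std::string& val) {
        std::lock_guard<std::mutex> g(rowLock(row));
        data_[row][col] = val;  // insert or overwrite
    }
    std::optional<std::string> get(const std::string& row, const std::string& col) {
        std::lock_guard<std::mutex> g(rowLock(row));
        auto r = data_.find(row);
        if (r == data_.end()) return std::nullopt;
        auto c = r->second.find(col);
        if (c == r->second.end()) return std::nullopt;
        return c->second;
    }
    // CPUT: write `nw` only if the current value equals `old`; atomic per row.
    bool cput(const std::string& row, const std::string& col,
              const std::string& old, const std::string& nw) {
        std::lock_guard<std::mutex> g(rowLock(row));
        auto r = data_.find(row);
        if (r == data_.end()) return false;
        auto c = r->second.find(col);
        if (c == r->second.end() || c->second != old) return false;
        c->second = nw;
        return true;
    }
    bool del(const std::string& row, const std::string& col) {
        std::lock_guard<std::mutex> g(rowLock(row));
        auto r = data_.find(row);
        return r != data_.end() && r->second.erase(col) > 0;
    }
private:
    std::mutex& rowLock(const std::string& row) {
        std::lock_guard<std::mutex> g(mapLock_);  // guards the lock table only
        return rowLocks_[row];                    // default-constructed on first use
    }
    std::map<std::string, std::unordered_map<std::string, std::string>> data_;
    std::unordered_map<std::string, std::mutex> rowLocks_;
    std::mutex mapLock_;
};
```

References into `rowLocks_` stay valid across rehashing, which is why handing out `std::mutex&` after releasing `mapLock_` is safe here.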
All writes flow through the primary:
- If a client hits a secondary, the secondary forwards the write to the primary.
- The primary executes locally, then sends `REPLICA_WRITE` to every secondary.
- The primary blocks until all secondaries ACK.
- Only then does the client get `+OK`.
This is synchronous replication — no stale reads, no lost writes under normal operation. The tradeoff is write latency scales with the slowest secondary.
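The primary's write path can be sketched like this. It is a simplification under stated assumptions: secondaries are modeled as callables that return true on ACK, standing in for the real socket round-trips, and all names are illustrative.

```cpp
#include <functional>
#include <string>
#include <vector>

// Sketch of synchronous replication at the primary: execute locally, fan out
// REPLICA_WRITE to every secondary, and report +OK only when all have ACKed.
bool replicated_write(
    const std::string& cmd,
    const std::function<bool(const std::string&)>& apply_local,
    const std::vector<std::function<bool(const std::string&)>>& secondaries) {
    if (!apply_local(cmd)) return false;      // 1. primary executes first
    size_t acks = 0;
    for (const auto& send : secondaries)      // 2. REPLICA_WRITE to each secondary
        if (send(cmd)) ++acks;
    return acks == secondaries.size();        // 3. +OK only if ALL ACKed
}
```

The `acks == secondaries.size()` check is what makes the write latency track the slowest secondary: one missing ACK and the client never sees `+OK`.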
The coordinator pings each node every second. If a heartbeat is missed for 2+ seconds:
- Node is marked DOWN.
- First available secondary is promoted to primary.
- `PRIMARY_CHANGE` is broadcast to all replicas in the group.
- Frontends are redirected away from the dead node.
When the failed node comes back, it loads its latest binary checkpoint, replays the operation log, and rejoins as a secondary.
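The coordinator's failure-detection rule above can be sketched as a pure function over a replica group. Timestamps are plain seconds for clarity (the real coordinator pings over sockets once per second), and the struct and function names are invented for illustration.

```cpp
#include <string>
#include <vector>

// One replica as seen by the coordinator.
struct Replica {
    std::string addr;
    bool primary = false;
    bool down = false;
    long lastHeartbeat = 0;  // seconds
};

// Apply the 2-second rule, promote if needed, and return the current primary.
std::string check_group(std::vector<Replica>& group, long now, long timeout = 2) {
    for (auto& r : group)                       // mark silent nodes DOWN
        if (now - r.lastHeartbeat >= timeout) { r.down = true; r.primary = false; }
    for (auto& r : group)                       // keep a surviving primary
        if (r.primary && !r.down) return r.addr;
    for (auto& r : group)                       // else promote first live secondary
        if (!r.down) { r.primary = true; return r.addr; }  // PRIMARY_CHANGE broadcast here
    return "";                                  // entire group is down
}
```
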
Each node operates under a hard 10MB memory ceiling, 8MB per tablet. Three mechanisms keep it bounded:
- LRU eviction — least-recently-used tablets are checkpointed to disk and evicted when the node hits its limit.
- Tablet splitting — when a tablet exceeds 8MB, it splits at the 50% memory mark (not key count), producing two smaller tablets with updated ranges.
- Precise tracking — every PUT/DELETE accounts for the exact memory delta, including string overhead.
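The precise-tracking mechanism can be sketched as a running byte counter adjusted on every write. The overhead constant below (`sizeof(std::string)` per string object) and the split threshold are illustrative assumptions, not the project's exact accounting.

```cpp
#include <cstddef>
#include <string>

constexpr std::size_t kTabletLimit = 8 * 1024 * 1024;  // 8MB split threshold

// Bytes charged for one (row, col, val) entry: character data plus an assumed
// fixed per-string object overhead.
std::size_t entry_size(const std::string& row, const std::string& col,
                       const std::string& val) {
    return row.size() + col.size() + val.size() + 3 * sizeof(std::string);
}

struct MemTracker {
    std::size_t used = 0;
    // PUT: an overwrite charges only the value-size delta; a fresh insert
    // charges the whole entry.
    void on_put(bool existed, std::size_t oldValBytes, const std::string& row,
                const std::string& col, const std::string& val) {
        if (existed) used += val.size() - oldValBytes;  // modular arithmetic: shrink is safe
        else         used += entry_size(row, col, val);
    }
    void on_delete(const std::string& row, const std::string& col,
                   const std::string& val) {
        used -= entry_size(row, col, val);
    }
    bool needs_split() const { return used > kTabletLimit; }
};
```

Because `used` reflects real bytes rather than key count, both LRU eviction and the 50%-by-memory split decision can key off the same counter.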
Two-tier durability:
- Operation log (text) — every write is appended to a per-tablet log. Cleared on checkpoint.
- Checkpoints (binary) — versioned, full tablet snapshots with size-prefixed serialization. Triggered every 5 writes.
Recovery replays the log from the last checkpoint to reconstruct pre-crash state.
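The recovery path can be sketched as checkpoint-then-replay. The tokenized log format here (`PUT row col val` / `DEL row col`) is an assumption for illustration; the real system uses its own text log and binary, size-prefixed checkpoints on disk.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

using Table = std::map<std::string, std::unordered_map<std::string, std::string>>;

// Rebuild pre-crash state: start from the last checkpoint snapshot, then
// replay every logged write in order.
Table recover(const Table& checkpoint,
              const std::vector<std::vector<std::string>>& log) {
    Table t = checkpoint;                 // 1. load latest binary checkpoint
    for (const auto& op : log) {          // 2. replay the operation log
        if (op[0] == "PUT")
            t[op[1]][op[2]] = op[3];
        else if (op[0] == "DEL")
            t[op[1]].erase(op[2]);
    }
    return t;                             // 3. pre-crash state reconstructed
}
```

Clearing the log on every checkpoint (as described above) keeps this replay short: at most the writes since the last snapshot, i.e. fewer than the 5-write checkpoint interval.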
- Webmail — compose, reply, delete, send to local and external addresses via SMTP.
- File storage — upload/download any file type with chunked transfers and nested folders.
- Admin console — live server health, tablet assignments, remote shutdown/restart.
- Three-level locking without deadlocks — node mutex, tablet mutex, per-row mutex, always acquired in that order. Allows high concurrency on disjoint rows while protecting structural changes.
- Memory-aware splitting — splitting by accumulated byte size rather than key count prevents pathological cases where a few large values dominate a tablet.
- Routing during failover — the window between a node dying and the coordinator electing a new primary is the danger zone. The coordinator gates all routing decisions so frontends never get stale endpoints.
- Synchronous replication over unreliable sockets — retries, connection tracking, and per-command ACK counters ensure writes either fully propagate or cleanly fail.
C++17, POSIX sockets, pthreads, Make.
