pond

Your own small-scale data lake.

A unified storage and retrieval layer for sessions produced by any agentic client (Claude Code, Codex, OpenCode, Cursor, aider, ChatGPT, Gemini CLI, ...). One Rust binary, two deployments: a personal pond on your laptop, or a multi-tenant backend for hosted agent infrastructure. Lance file format on object storage. No SQL.

This repository is currently design-only. Implementation has not started.

Status

Pre-implementation. The repository contains:

docs/design.md - the locked-in v1 design (sections 1-4 are the source of truth; section 5 is empty).
docs/references/ - frozen snapshots of the upstream schemas pond's design draws from, plus real session samples from eight source harnesses (docs/references/session-samples/).
docs/archive/ - historical design notes and the resolved open-questions log.

Implementation begins next.

Background

Every agentic CLI ships its own session format and its own search surface. Switching tools means losing history. Replaying a Claude Code session in another provider's tooling means re-translating the wire shape by hand. Hosted multi-tenant deployments rebuild the same storage layer from scratch.

Pond is one Rust binary that ingests sessions from any source, stores them losslessly in a canonical Part union (modeled on Effect v4's Prompt-side types), and serves them via HTTP+JSON or MCP. Storage, hybrid search (BM25 + vector + RRF), and provider-agnostic replay all sit on a single Lance-on-object-storage foundation.
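The RRF step of the hybrid search named above can be illustrated with a minimal sketch of reciprocal rank fusion. It is not pond's implementation (none exists yet). The function name `rrf_fuse`, the message ids, and the constant k = 60 are assumptions chosen for illustration. Each ranked list (for example one from BM25 and one from vector search) contributes 1 / (k + rank) to a document's fused score.

```rust
use std::collections::HashMap;

/// Reciprocal rank fusion (hypothetical helper, not pond's API).
/// Each ranked list contributes 1 / (k + rank) per document id,
/// with rank starting at 1. A higher fused score ranks first.
pub fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (i, id) in ranking.iter().enumerate() {
            let rank = (i + 1) as f64;
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Sort by descending score; break ties by id so the output is deterministic.
    fused.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    fused
}
```

A document that appears near the top of both lists outranks one that tops only a single list, which is why the fusion needs no score normalization between BM25 and vector distances.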

Two day-1 use cases:

Personal: replace a per-tool knowledge base. Ingest local Claude Code sessions, hybrid-search them, retrieve them for replay.
Hosted: storage and search backend for multi-tenant agent deployments. Each namespace is an opaque-string isolation boundary; the integrator owns identity, access, and routing.
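As a rough sketch of the namespace boundary, the Design section's "bucket prefix per namespace" choice could look like the helper below. The name `namespace_prefix`, the root argument, and the validation rule (non-empty, no `/`) are assumptions for illustration; the design doc may define namespaces differently.

```rust
/// Hypothetical helper: map an opaque namespace string to an object-store
/// key prefix under a shared root. Not part of pond's actual API.
pub fn namespace_prefix(root: &str, namespace: &str) -> Option<String> {
    // Assumption: a namespace is non-empty and contains no '/', so one
    // namespace can never address keys under another namespace's prefix.
    if namespace.is_empty() || namespace.contains('/') {
        return None;
    }
    // The trailing '/' keeps "tenant-a" from being a string prefix of
    // "tenant-ab" when listing objects by prefix.
    Some(format!("{}/{}/", root.trim_end_matches('/'), namespace))
}
```

Under these assumptions, `namespace_prefix("s3://bucket/pond", "tenant-a")` yields `s3://bucket/pond/tenant-a/`. Pond treats the string as opaque, so any meaning attached to it (user, team, project) stays with the integrator.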

See docs/design.md for the full rationale.

Design

The design doc lives at docs/design.md. Sections 1-4 are the source of truth.

Key choices:

Rust + tokio, single static binary.
The lance-format/lance crates, used directly, are the only storage and search engine. No lancedb wrapper, no SQL, no additional database.
object_store (via Lance) as the storage substrate: S3 / GCS / Azure / local filesystem.
Canonical session types owned in pond, in the shape of Effect v4's Prompt-side Part union. This is the moat. Response-side metadata is projected into per-Message Lance columns, not stored as Parts.
Four Lance datasets: sessions, messages, parts, embeddings. Hot filter columns are denormalized onto search rows for single-stage filter pushdown (messages and embeddings carry source_agent / project / role / timestamp for prefilter on hybrid search).
One adapter trait, SourceAdapter, with a deterministic event-ordering contract. Everything else (storage, indexing, optimistic concurrency control (OCC), time-travel, namespaces, manifest versioning, blob storage) uses Lance directly, with no extra "seam" abstractions.
Append-only writes. Replay (cross-provider re-projection) is deferred to section 4.
v1 surface: two transports - HTTP+JSON (POST /v1/<op> plus SSE) and MCP (rmcp), wrapping the same handlers. Operations: pond_search, pond_get, pond_ingest, pond_session_events. CLI verbs out of band: pond ingest, pond serve, pond status, pond embed-worker, pond maintenance.
Default embeddings: Qwen3-Embedding-0.6B via fastembed-rs (local, Matryoshka 32-1024, 32K context, Apache 2.0). Embedding registry is config-driven.
Multi-tenancy via opaque namespace strings; bucket prefix per namespace; separate buckets when KMS isolation matters.
Encryption is operational (bucket SSE + filesystem encryption), not application-level.
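The Matryoshka range mentioned for the default embeddings (32-1024 dimensions) is usually exploited by keeping a prefix of the full vector and rescaling it to unit length. The sketch below shows that general technique only. The function name `truncate_embedding` is hypothetical, and whether pond truncates at write time, at query time, or not at all is not stated here.

```rust
/// Keep the first `dim` components of a Matryoshka-style embedding and
/// rescale the result to unit L2 norm (hypothetical helper).
/// Returns None if `dim` is 0, exceeds the input length, or the kept
/// prefix is all zeros and therefore cannot be normalized.
pub fn truncate_embedding(full: &[f32], dim: usize) -> Option<Vec<f32>> {
    if dim == 0 || dim > full.len() {
        return None;
    }
    let prefix = &full[..dim];
    let norm = prefix.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return None;
    }
    Some(prefix.iter().map(|x| x / norm).collect())
}
```

Shorter vectors trade some retrieval quality for smaller storage and faster search, which is why the dimension is a natural candidate for a config-driven registry entry.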

References

docs/references/ holds frozen snapshots of upstream schemas and real session samples. Each subdirectory's README pins the source URL, the upstream commit, and the snapshot date.

| Path | Source | Why kept |
| --- | --- | --- |
| `docs/references/effect/` | github.com/Effect-TS/effect | Effect v4 Prompt/Response Part unions. Pond's canonical types copy this shape. |
| `docs/references/opencode/` | github.com/sst/opencode | Effect Schema canonical Part union; SDK types; storage schema. |
| `docs/references/kilocode/` | github.com/kilo-org/kilocode | OpenCode fork. Adds `editorContext`, plan-followup, kilocode-specific events. |
| `docs/references/pi-mono/` | github.com/badlogic/pi-mono | Leaf-cursor branching and cross-provider conformance test matrix. |
| `docs/references/lancedb/` | github.com/lancedb + github.com/lance-format | Capability snapshot and evolution timeline for Lance + LanceDB. |
| `docs/references/otel-genai-semconv.md` | github.com/open-telemetry/semantic-conventions-genai | GenAI semantic conventions. Inspiration for shape overlap; pond does not derive from OTel. |
| `docs/references/anthropic-managed-agents.pdf` | Anthropic | Session-as-event-log framing for managed agents. |
| `docs/references/session-samples/` | local captures | Real session captures for eight source harnesses (claude-code, claude-app, claude-managed-agents, codex, opencode, openclaw, nanoclaw, pi). Drives adapter design and stress-tests the schema. |

To refresh a snapshot, see the maintenance instructions in docs/references/README.md.

Contributing

Issues and pull requests are welcome. Because the project is pre-implementation, the most useful contributions right now are:

Design feedback on docs/design.md.
Pointers to additional reference schemas or session samples worth snapshotting under docs/references/.
Corrections to the design doc.

For larger changes, please open an issue first to discuss the direction.
