The multi-agent system that refuses to make things up.
Every claim the system emits is either tied to a verifiable primary source or marked as unsupported. Two domains run in this codebase: a code auditor that checks factual claims in technical artefacts against the actual codebase, and a thesis assistant that audits literature claims against academic sources. Same DNA — planner → researcher → checker → critic, structured JSON everywhere, a hard trust gate enforced in code.
```shell
git clone https://github.com/DavidHavoc/openworkers.git
cd openworkers
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
```

Copy `.env.example` to `.env`, add at least one API key, and pick a provider:
```shell
DEEPSEEK_API_KEY=sk-...
THESIS_QUALITY_PROVIDER=deepseek
THESIS_QUALITY_MODEL=deepseek-chat
THESIS_BALANCED_PROVIDER=deepseek
THESIS_BALANCED_MODEL=deepseek-chat
THESIS_CHEAP_PROVIDER=deepseek
THESIS_CHEAP_MODEL=deepseek-chat
DRY_RUN=false
```

Audit factual claims in READMEs and pull requests against the codebase. Every claim gets one of four verdicts:
| Verdict | Meaning |
|---|---|
| `verified` | Code clearly demonstrates the claim is true |
| `drifted` | A related but divergent implementation exists (renamed flag, changed default, etc.) |
| `contradicted` | Code directly disproves the claim |
| `unsupported` | No evidence in the repo — enforced in Python, not delegated to the LLM |
```shell
openworkers audit readme /path/to/any/repo
```

Example output (synthetic `widgetlib`):

```json
{
  "claims": [
    {
      "text": "WidgetLib supports both synchronous and asynchronous widget creation.",
      "quote": "Create widgets via WidgetFactory synchronously or with WidgetFactoryAsync.",
      "claim_type": "feature",
      "verdict": "verified",
      "confidence": 0.92,
      "evidence_paths": ["widgetlib/factory.py:30-42"]
    },
    {
      "text": "The default widget timeout is 30 seconds.",
      "quote": "Time out after 30 seconds.",
      "claim_type": "feature",
      "verdict": "drifted",
      "confidence": 0.78,
      "evidence_paths": ["widgetlib/config.py:12"],
      "notes": "Default is 45 seconds in code; 30 was the v0.1 value."
    },
    {
      "text": "Built-in PostgreSQL support.",
      "quote": "WidgetLib includes native PostgreSQL support.",
      "claim_type": "feature",
      "verdict": "unsupported",
      "confidence": 0.0,
      "evidence_paths": [],
      "notes": "No supporting evidence found in the repository."
    }
  ],
  "critique": {
    "weak_verdicts": ["timeout claim — grep for '30' only; the 45-second constant uses a different syntax"],
    "missed_claims": ["README mentions 'rate limiting' but planner didn't extract it"],
    "suggestions": ["Re-run with expanded search hints for timeout-related constants"]
  }
}
```

The audited README is excluded from its own evidence pool — claims cannot verify themselves against the text that makes them.
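That self-evidence exclusion is simple to state in code. A minimal sketch of the idea — function and field names here are illustrative, not the project's actual API:

```python
from pathlib import Path

def build_evidence_pool(repo_files: list[str], audited_file: str) -> list[str]:
    """Return candidate evidence files, excluding the audited artefact itself.

    A README claim must be backed by code or config elsewhere in the repo;
    the README's own text never counts as evidence for its own claims.
    """
    audited = Path(audited_file).resolve()
    return [f for f in repo_files if Path(f).resolve() != audited]

pool = build_evidence_pool(
    ["README.md", "widgetlib/factory.py", "widgetlib/config.py"],
    "README.md",
)
assert "README.md" not in pool
```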
```shell
openworkers audit pr https://github.com/owner/repo/pull/42
```

Uses `GITHUB_TOKEN` or `GH_TOKEN` for higher rate limits (anonymous works for public repos at 60 req/hour). The PR description is extracted by the planner; the unified diff is the evidence pool. Claims have PR-specific types: add, remove, fix, refactor, test, behavior, doc, other. See `tests/fixtures/sample_pr/` for a canned example.
Both README and PR auditors share the same four-stage shape, parameterised by source adapter and prompts:
```
Planner (LLM)       extracts atomic claims from the artefact
        ↓
Researcher (Python) deterministic grep over the codebase via SourceAdapter
        ↓
Checker (LLM + trust gate) judges each (claim, evidence) pair; trust gate overwrites
                           any verdict where evidence is empty — in code, not prompts
        ↓
Critic (LLM)        adversarial pass: weak verdicts, missed claims, suggestions
```
The trust gate (providers/code_audit_agents.py::_enforce_trust_gate) is the invariant. A confidently hallucinating checker that says verified for a claim with zero evidence gets corrected before the user ever sees the report. See AGENTS.md for the contributor recipe and ROADMAP.md for upcoming slices (compliance auditor, architecture auditor, layered source adapters).
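The invariant itself is small. A simplified sketch of what a gate like this enforces — the real logic lives in `providers/code_audit_agents.py`; the function and fields below follow the example report format but are not the exact implementation:

```python
def enforce_trust_gate(claims: list[dict]) -> list[dict]:
    """Downgrade any verdict that cites no evidence.

    An LLM checker may confidently answer "verified" even when the
    researcher found nothing; the gate overwrites that in plain Python,
    so the invariant holds regardless of what the model says.
    """
    for claim in claims:
        if not claim.get("evidence_paths"):
            claim["verdict"] = "unsupported"
            claim["confidence"] = 0.0
    return claims

# A hallucinated "verified" with an empty evidence list gets corrected:
report = enforce_trust_gate([
    {"text": "Built-in PostgreSQL support.", "verdict": "verified",
     "confidence": 0.9, "evidence_paths": []},
])
assert report[0]["verdict"] == "unsupported"
```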
Audits literature claims against arXiv, Semantic Scholar, and CrossRef. Produces structured JSON — writing prose is explicitly out of scope. This pipeline is stable and maintained, but code audit is the new flagship.
```shell
thesis research "Can light replace electrons in CPUs?" --discipline computer_science
thesis critique "Social media causes depression because teens spend too much time online"
thesis verify "10.1038/nature14539"
thesis papers "transformer attention" --source arxiv --limit 5
thesis corpus thesis.pdf --title "My Thesis" --discipline cs --year 2024
thesis ingest add paper.pdf --collection my_papers   # RAG over your own PDFs
thesis sessions
thesis resume <session-id>
```

Every command accepts `--format json` and `--output path/to/file.json`. Output examples in docs/examples.md.
| # | Capability | Description |
|---|---|---|
| 1 | Literature map | arXiv + Semantic Scholar; classified as supporting / challenging / adjacent |
| 2 | Citation audit | Flags missing, weak, contested citations across the lit set |
| 3 | Synthesis report | Methods, datasets, metrics; cross-paper comparisons |
| 4 | Structured critique | Strengths, weaknesses, gaps, counterarguments — JSON, never prose |
| 5 | Corpus benchmarks | Compare section length and citation density to reference corpus from your PDFs |
| 6 | Idea/draft critique | Standalone critique without running the full pipeline |
| 7 | Citation verification | DOI lookup via CrossRef; returns metadata or reports it does not exist |
| 8 | Quick paper search | arXiv / Semantic Scholar by keyword — no LLM, no token cost |
| 9 | Session persistence | Resume past sessions; list and filter by discipline/status (Redis or Postgres) |
| 10 | User RAG over PDFs | Ingest your own PDFs; researcher retrieves from them alongside arXiv/SS |
UnifiedLLM routes to Anthropic, OpenAI, and DeepSeek across three tiers:
| Tier | Used by | Suggested model |
|---|---|---|
| `quality` | HEAD planner, HEAD supervisor, critic | strongest |
| `balanced` | checker, synthesizer | mid |
| `cheap` | researcher | cheap / fast |
Per-provider circuit breakers (pybreaker), tenacity retries with exponential backoff and jitter, and a hard budget guard (a contextvars-scoped per-session ceiling) keep the system resilient. DRY_RUN=true runs the full pipeline without any API keys — useful for CI and wiring tests.
```mermaid
flowchart TB
    User([CLI / MCP / FastAPI])
    subgraph Audit["Code Audit (flagship)"]
        direction LR
        AuditCLI["audit readme / audit pr"]
        AuditOrch[AuditOrchestrator]
        AuditCLI --> AuditOrch
    end
    subgraph Thesis["Thesis Assistant (legacy)"]
        direction LR
        ThesisCLI[thesis research / critique / verify]
        ThesisOrch[ThesisOrchestrator]
        ThesisCLI --> ThesisOrch
    end
    User --> Audit
    User --> Thesis
    subgraph Pipeline["Shared Pipeline"]
        Planner[Planner<br/>LLM] --> Researcher[Researcher<br/>Python]
        Researcher --> Checker[Checker<br/>LLM + Trust Gate]
        Checker --> Critic[Critic<br/>LLM]
    end
    subgraph Sources["Evidence Sources"]
        arXiv[arXiv]
        SS[Semantic Scholar]
        CrossRef[CrossRef]
        LocalRepo["LocalRepoAdapter<br/>grep over repo"]
        GitHub["GitHubAdapter<br/>grep over PR diff"]
    end
    AuditOrch --> Pipeline
    ThesisOrch --> Pipeline
    Researcher --> LocalRepo & GitHub & arXiv & SS
    Checker --> CrossRef & arXiv & LocalRepo
    subgraph Infra["Infrastructure"]
        Router["UnifiedLLM<br/>Anthropic / OpenAI / DeepSeek"]
        BB[(Blackboard<br/>Redis)]
        Mem[(Episodic Memory<br/>Qdrant)]
    end
    Pipeline --> Router
    ThesisOrch --> BB & Mem
```
Full detail in docs/architecture.md.
Exposes four thesis tools over stdio for Claude Code, OpenCode, and any MCP-aware client:
| Tool | Description |
|---|---|
| `thesis_research` | Run the full research pipeline |
| `thesis_critique` | Critique an idea or draft |
| `thesis_verify_citation` | Look up a DOI via CrossRef |
| `thesis_search_papers` | Quick arXiv / Semantic Scholar keyword search |
Claude Code — add to ~/.claude/mcp.json or a project-level .mcp.json:
```json
{
  "mcpServers": {
    "thesis-assistant": {
      "command": "/absolute/path/to/.venv/bin/python",
      "args": ["-m", "apps.mcp_server.main"]
    }
  }
}
```

OpenCode — add to ~/.config/opencode/opencode.json:
```json
{
  "mcp": {
    "thesis-assistant": {
      "type": "local",
      "command": [
        "bash", "-lc",
        "cd /absolute/path/to/openworkers && docker compose run --rm -i mcp"
      ]
    }
  }
}
```

Replace /absolute/path/to/openworkers with your local checkout path.
```shell
docker compose build
docker compose up -d redis qdrant
docker compose run --rm cli python -m apps.cli.main research "your question"
```

The `cli` and `mcp` services live behind the `tools` profile and start on demand. `.env` is mounted automatically.
Start the async HTTP interface:

```shell
uvicorn apps.api.main:app --reload
```

Per-IP rate limiting is enabled by default (60 req/min, 1000 req/hour). Configure via API_RATE_LIMIT_* env vars or set API_RATE_LIMIT_ENABLED=false to disable.
Interactive docs at http://localhost:8000/docs.
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check |
| POST | `/tasks/` | Submit a research task → `{task_id, status: "queued"}` |
| GET | `/tasks/` | List all tasks |
| GET | `/tasks/{task_id}` | Poll task status and result |
| DELETE | `/tasks/{task_id}` | Remove a completed or failed task |
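The submit-then-poll flow is easy to wrap in a small client helper. A sketch — only the `"queued"` status is documented above, so the `"running"` and `"completed"` values are assumptions, and `fetch` stands in for a real HTTP GET against `/tasks/{task_id}`:

```python
import time
from typing import Callable

def poll_until_done(fetch: Callable[[], dict],
                    interval: float = 1.0,
                    timeout: float = 60.0) -> dict:
    """Call `fetch` until the task leaves its in-flight states.

    `fetch` is any callable returning the task JSON, e.g. a thin wrapper
    around an HTTP GET of /tasks/{task_id}.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch()
        if task.get("status") not in ("queued", "running"):
            return task
        time.sleep(interval)
    raise TimeoutError("task did not finish in time")

# Demo with a stubbed fetch that finishes on the third call:
responses = iter([
    {"status": "queued"},
    {"status": "running"},
    {"status": "completed", "result": {"claims": []}},
])
final = poll_until_done(lambda: next(responses), interval=0.0)
assert final["status"] == "completed"
```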
```shell
# Submit
curl -s -X POST http://localhost:8000/tasks/ \
  -H "Content-Type: application/json" \
  -d '{"query": "Does retrieval-augmented generation improve factuality?", "discipline": "computer_science"}'

# Poll
curl -s http://localhost:8000/tasks/abc-123 | jq '.status'
```

```shell
pytest tests/ -v
ruff check . && black --check .
mypy core/ providers/ --strict --ignore-missing-imports
```

Read AGENTS.md before touching code. See CONTRIBUTING.md for the full workflow.