
Commit 4aeffc8

merge: resolve origin/main conflicts and apply audit fixes
Merges aa817e5 (one-click local Tangle harness) + prior commits from origin/main. The new `cargo tangle harness` is complementary to `cargo tangle dev`: `dev` is a zero-config workspace bootstrapper, `harness` is the multi-blueprint + router orchestrator. Both coexist.

Audit fixes addressed in this merge (from the critical review of #1376):

BLOCKERS:
- Replace `std::mem::forget(Child)` with proper session detachment via `CommandExt::pre_exec(setsid)`. Child is now drop()'d cleanly (Unix: drop ≠ kill) while surviving SIGHUP on terminal close.
- Replace the `workspace_is_dev()` heuristic with a positive marker `# managed-by = "cargo-tangle-dev"` written into the `.tangle.toml` header. User-authored files are never deleted, and `dev up --force` refuses to overwrite unmarked files.
- Atomic write uses a unique sibling name `<fname>.tmp.<pid>.<nonce>` (the old `with_extension("tmp")` collapsed `.tangle.toml` to `.tangle.tmp`, shadowing a real file) and sweeps stale tmp siblings on write.
- `acquire_dev_lock` via `nix::fcntl::Flock` (non-blocking exclusive flock at `.tangle/dev/.lock`) serialises concurrent `dev up` calls in the same directory.

MAJOR:
- 3s timeout on `ensure_anvil_on_path`, defending against hanging PATH shims / FUSE binaries.
- `SNAPSHOT_MIN_BLOCK` named const replaces the magic `>= 200`.
- Dead `--anvil-logs` flag removed; logging is now unconditional to `.tangle/dev/anvil.log`.
- `TangleClientArgs::resolve()` memoised via `OnceCell` — one `.tangle.toml` read per command instance instead of 4+.
- `TangleWorkspace::discover()` canonicalises the CWD so walks resolve through symlinks into the real project tree.
- Hetzner removed from the GPU candidate list: the adapter uses the Cloud API (no GPU SKUs); Hetzner's GPU-matrix dedicated servers are Robot-API only. Hetzner stays in the CPU list.

MINOR:
- Drop the dead, empty `DevnetStack::Drop` impl.
- Error messages no longer imply an env-var workflow (we're workspace-first).
- Add workspace tests for malformed TOML, $TANGLE_CONFIG override, header round-trip, and stale-tmp cleanup (7 total, all passing).

End-to-end validated: `dev up` → `jobs submit` (5² = 25) → `dev down` in a fresh directory, < 8s total.
2 parents 862dbc1 + aa817e5 commit 4aeffc8

53 files changed

Lines changed: 5531 additions & 203 deletions

Lines changed: 113 additions & 0 deletions
# Reflect: GPU Providers + Inference Core Session

Date: 2026-04-06

## Run Grade: 8/10

| Dimension | Score | Evidence |
|---|---|---|
| **Goal achievement** | 9/10 | All stated goals met: 17 providers, inference core extraction, 7 blueprint migrations, vllm→llm rename, audit fixes, PR merged. Only gap: 5 blueprints at 7-22% LOC reduction vs the 50% target. |
| **Code quality** | 8/10 | 33 core tests, 250 provider tests, clippy clean, feature-gated. Audit found and fixed a CRITICAL nonce TOCTOU, credential exposure, and an integer overflow. Remaining: no live API integration tests. |
| **Efficiency** | 7/10 | ~40 background agents dispatched. Many hit rate limits and delivered partial work. Shared file edits were lost 2-3 times during the session (cargo fmt overwrites, stash accidents). Significant rework re-applying wiring. |
| **Self-correction** | 9/10 | Caught and corrected: "backends should be separate blueprints" → "backends are operator config within service-type blueprints". Caught that RFP-GPU-SUPPORT.md was outdated. Caught that the manager GPU flow was scaffolded but not wired. Critical audit findings fixed same-session. |
| **Learning** | 8/10 | Produced 3 pursuit specs, MIGRATION.md, and architecture docs. But .evolve state was inconsistent — current.json was rewritten multiple times without stable baselines. |
| **Overall** | 8/10 | Massive scope delivered. PR merged. Production-ready path established. Efficiency drag from file-loss incidents and rate-limited agents. |

## Session Flow Analysis

### Flow 1: Audit → Discover gap → Fix

```
TRIGGER: User asks "check this, is it complete?"
STEPS: Read existing code → find gap → propose fix → user refines intent → implement
OUTCOME: RFP rewritten, PLAN updated, GPU flow documented
Frequency: 3x (RFP, PLAN, GPU-SUPPORT.md)
Automation: None needed — this is discovery work
```

### Flow 2: Parallel agent dispatch → compile → fix stragglers

```
TRIGGER: Large implementation scope ("do all of it")
STEPS: Write shared skeleton → dispatch 5-7 background agents → wait → fix compile errors from agent output → verify
OUTCOME: Code lands but with inconsistencies (some agents use retry, others don't; some handle 404, others don't)
Frequency: 5x (Gen 1 adapters, Gen 2 decentralized, Gen 2 hardening, Gen 3 migrations, audit fixes)
Automation potential: HIGH — a "dispatch + verify + fix" meta-skill would prevent the inconsistency problem
```

### Flow 3: Shared file collision

```
TRIGGER: Multiple agents or cargo fmt modifying the same tracked file
STEPS: Agent A writes to config.rs → cargo fmt runs → Agent B writes to config.rs → one version wins, the other is lost
OUTCOME: Wiring had to be re-applied 3 times (providers/mod.rs, pricing-engine enum, config.rs)
Frequency: 3x
LESSON: Pre-populate ALL shared file edits before dispatching agents. Agents should ONLY create new files in their own directories.
```

### Flow 4: Architecture pivot

```
TRIGGER: User corrects a wrong assumption
STEPS: I propose X → user says "that's not the intent, it's Y" → I adjust
OUTCOME: Better architecture (blueprints = services, backends = operator choice, manager = infrastructure)
Frequency: 3x (backend-as-blueprint → backend-as-config, manager should handle provisioning, Vllm types are correctly named)
LESSON: Ask clarifying questions before proposing architecture changes. The user's existing design was more thoughtful than I initially assumed.
```

## Project Health

### blueprint SDK (~/code/blueprint)
- **Trajectory**: Improving — 17 providers merged, hardened, documented
- **Test coverage**: 250 provider tests (JSON parsing + retry helpers). Zero e2e API tests. ~60% meaningful coverage.
- **Architecture**: Clean — the adapter pattern scales well. Config sprawl improved with helpers. Enum exhaustiveness is the right tradeoff.
- **Next action**: Live integration test with one real provider (RunPod or Lambda Labs — cheapest to validate)

### tangle-inference-core (~/code/tangle-inference-core)
- **Trajectory**: Healthy — 33 tests, feature-gated, all 7 consumers adopted
- **Test coverage**: Good for billing/cost math. Weak for server helpers (validate_spend_auth tests added but no HTTP-level integration test).
- **Architecture**: Clean — AppStateBuilder + type-erased backend is the right call. Feature gates prevent unnecessary dep weight.
- **Next action**: CI pipeline (GitHub Actions) — currently no automated testing on push

### Inference blueprints (7 repos)
- **Trajectory**: Converging — all on tangle-inference-core, all compile clean
- **Test coverage**: Varies wildly (llm: 5+26 tests, distributed: 11, modal: 0 lib tests)
- **Architecture**: 2 fully migrated (llm, voice at 50%), 5 partially migrated (7-22%)
- **Next action**: Deep server.rs rewrites for the 5 partial migrations

## Key Learnings

### 1. Shared file edits are the #1 source of rework
Every time multiple agents or tools modify the same file, one version wins. The session lost ~2 hours re-applying CloudProvider enum additions, config structs, and factory registrations. **Rule: do all shared-file edits yourself; dispatch agents only for new-file creation.**

### 2. Agent consistency requires explicit reference implementations
The 5 "correct" adapters (akash, io_net, prime_intellect, render, bittensor_lium) all used retry_with_backoff because their prompts were written later and pointed to the earlier adapters as references. The 6 "incorrect" adapters (lambda_labs, runpod, vast_ai, paperspace, fluidstack, tensordock) were written first without a reference and missed retry. **Rule: always include a reference implementation in agent prompts.**

### 3. The user's design intent is load-bearing
Three times I proposed architectural changes (backends-as-blueprints, manager-level backend selection, the vllm rename being "not worth it") that the user corrected. Each correction revealed that the existing design had more thought behind it than I assumed. **Rule: audit before proposing. Ask why before suggesting what.**

### 4. Audit-driven development produces the highest-quality improvements
The /critical-audit skill found the nonce TOCTOU race (CRITICAL), credential exposure (HIGH), integer overflow (MEDIUM), and the settle_billing fire-and-forget (HIGH) — none of which were visible during normal development. The audit→fix cycle produced the most impactful quality improvements of the entire session. **Rule: audit after every major build phase, not just at the end.**

### 5. Git dep vs path dep matters for production
An agent changed a path dep to a git dep mid-session, which was actually the right call — local paths break for anyone else cloning the repo. But it surfaced that the remote needed to be pushed first. **Rule: always use git deps for cross-repo references. Push before depending.**

## Product Signals

### 1. GPU Cloud Marketplace Abstraction Layer
**Who would pay**: Operators who want to serve GPU workloads without vendor lock-in.
**Evidence**: 17 providers integrated, all following the same trait. The value is the abstraction, not any single provider.
**Signal strength**: Strong — this is the core value proposition of the Blueprint Manager's remote-providers system.

### 2. Shared Inference Operator Infrastructure
**Who would pay**: Blueprint developers building inference services.
**Evidence**: 7 blueprints adopted tangle-inference-core, each deleting 500-1800 LOC of duplicated billing/metrics/auth code.
**Signal strength**: Strong — the duplication was real and growing.

### 3. Settlement Recovery Queue
**Who would pay**: Operators who can't afford to serve free inference when on-chain settlement fails.
**Evidence**: settle_billing was silently dropping errors. The recovery queue ensures failed settlements are retried.
**Signal strength**: Medium — important for production, but not a standalone product.

## Action Items (ordered by impact)

1. **CI pipeline for tangle-inference-core** — no automated testing on push. One regression breaks all 7 blueprints.
2. **Live integration test** — prove one provider's REST payloads actually work (RunPod: cheapest, simplest API).
3. **Deep server.rs rewrites** for embedding/modal/image-gen/video-gen/distributed — use core's `billing_gate` and `from_config` to reach the 50% reduction.
4. **Memory: save the shared-file-edit rule** — this session's biggest efficiency loss. Future sessions should avoid it.
5. **Ops board tasks** for each remaining item so nothing falls through the cracks.
Lines changed: 128 additions & 0 deletions
---
type: rfp
status: review
date: 2026-04-06
author: claude
---

# RFP: Avatar Inference Blueprint

## Context

6 inference blueprints exist: LLM, voice, image, video-gen, embedding, distributed. None produce talking-head/avatar content. The video-gen blueprint does generative video (Hunyuan/LTX via ComfyUI), which is text/image-to-cinematic-video, not face animation.

An avatar blueprint would complete the "full UGC pipeline on Tangle" story:
```
LLM (script) → Voice (narration) → Image (scenes) → Avatar (talking head) → Video-Gen (composite)
```

## What an Avatar Blueprint Does

Accepts: audio file + face image (or avatar preset)
Returns: video of the face speaking the audio with lip-sync

This is what HeyGen, D-ID, and Hedra do. The blueprint wraps this capability for Tangle operators.

## Architecture Options

### Option A: API Proxy Blueprint (ship fast)

Operator wraps a commercial API (HeyGen, D-ID, or Hedra).

```
Client → Tangle Router → Avatar Blueprint Operator → HeyGen API → video returned

Operator pays HeyGen upstream
Client pays operator via x402
```

**Pros:** Ships in days. Best-in-class quality (HeyGen Avatar IV). Same pattern as existing blueprints.
**Cons:** Operator needs their own HeyGen API key. Margin depends on HeyGen pricing. Not truly decentralized.

### Option B: Self-Hosted Open Source (ship slower, true decentralization)

Operator runs open-source lip-sync models on GPU.

SOTA open-source options (April 2026):
- **ByteDance OmniHuman-1** — best open-source avatar quality, any reference image, but ~48GB VRAM
- **SadTalker** — mature, lower VRAM (~8GB), audio-to-face animation
- **Wav2Lip** — oldest, most stable, lowest quality
- **MuseTalk** — real-time lip-sync, ~12GB VRAM
- **LivePortrait** — expression transfer, Hugging Face spaces available

Could run via ComfyUI (the same backend as video-gen-blueprint) with custom nodes for lip-sync.

**Pros:** No API dependency. Truly decentralized. Operators keep full margin.
**Cons:** Quality gap vs HeyGen. Higher VRAM requirements. More complex operator setup.

### Option C: Dual Mode (recommended, matches video-gen-blueprint pattern)

The video-gen-blueprint already supports both ComfyUI (self-hosted) and API (Modal/Replicate). Same pattern:

```rust
enum AvatarBackend {
    ComfyUI { workflow: String },    // Self-hosted: SadTalker/MuseTalk via ComfyUI
    HeyGen { api_key: String },      // Commercial proxy
    DID { api_key: String },         // Commercial proxy
    Replicate { api_token: String }, // Hosted open-source models
}
```

Operator chooses their backend. Clients don't care which — they get the same API.
## Endpoints

Following the pattern of existing inference blueprints:

```
POST /v1/avatar/generate
{
  "audio_url": "https://...",  // narration audio
  "image_url": "https://...",  // face image (or avatar_id for presets)
  "avatar_id": "preset-1",     // optional: use a preset avatar
  "duration_seconds": 30,
  "output_format": "mp4",
  "resolution": "1080p"
}

→ 202 Accepted (async job)
{
  "job_id": "...",
  "status": "processing",
  "poll_url": "/v1/avatar/jobs/{job_id}"
}

GET /v1/avatar/jobs/{job_id}
→ { "status": "completed", "video_url": "https://...", "duration": 28.5 }
```

## Contract (VideoGenBSM pattern)

Same pattern as `video-gen-inference-blueprint/contracts/`:
- VRAM validation for self-hosted operators
- Per-second pricing
- Duration limits
- Result hash verification
## Relationship to Router

The Router's model list would include avatar models:
```
heygen/avatar-iv    (via operator running HeyGen proxy)
sadtalker/v2        (via operator running ComfyUI self-hosted)
musetalk/realtime   (via operator running self-hosted)
```

Clients call `POST router.tangle.tools/v1/avatar/generate`, and the Router selects the best operator.

## Effort Estimate

- Option A (API proxy only): ~2-3 days (copy the video-gen-blueprint pattern, swap endpoints)
- Option B (self-hosted only): ~1-2 weeks (ComfyUI workflow authoring, VRAM testing)
- Option C (dual mode): ~1 week for the proxy, +1 week for self-hosted

## Decision Needed

1. Start with Option A (fastest) or Option C (most complete)?
2. Which commercial API to target first: HeyGen (best quality) or D-ID (cheapest, most API-native)?
3. Priority relative to other blueprint work?
Lines changed: 27 additions & 0 deletions
name: Live Integration Tests

on:
  workflow_dispatch:
    inputs:
      providers:
        description: 'Providers to test (comma-separated: lambda_labs,runpod,tensordock)'
        default: 'lambda_labs'

jobs:
  live-test:
    runs-on: ubuntu-latest
    timeout-minutes: 10
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
      - uses: Swatinem/rust-cache@v2
        with:
          workspaces: crates/blueprint-remote-providers
      - name: Run live tests
        env:
          LAMBDA_LABS_API_KEY: ${{ secrets.LAMBDA_LABS_API_KEY }}
          RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }}
          TENSORDOCK_API_KEY: ${{ secrets.TENSORDOCK_API_KEY }}
          TENSORDOCK_API_TOKEN: ${{ secrets.TENSORDOCK_API_TOKEN }}
        run: |
          cargo test -p blueprint-remote-providers --test live_integration -- --ignored --nocapture
