Build an agent that monitors, evaluates, and rewrites its own tools at runtime.
This repository contains the full working code from the blog post *The Self-Extending AI Agent Part 2: Build a Self-Improving Agent That Rewrites Its Own Tools*.
This is the continuation of Part 1: The Self-Extending Agent (GitHub repo).
In Part 1, the agent learned to generate new tools when it encounters a capability gap. In Part 2, the agent goes further — it monitors tool performance, detects degradation, rewrites underperforming tools using an LLM, and validates rewrites against a regression test suite before promoting them.
The improvement loop:
- Execute — Every tool call is wrapped with metrics capture (latency, success/failure, output).
- Record — Metrics are stored in a SQLite-backed Performance Memory.
- Evaluate — A weighted scoring function computes tool quality and flags underperformers.
- Rewrite — An LLM receives the old code + failure context and produces an improved version.
- Test — A Regression Runner validates the candidate against stored test cases.
- Promote — Only if all tests pass, the new version replaces the old one.
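The Execute and Record steps above can be sketched in miniature. This is an illustrative decorator, not the actual code from `orchestrator.py` or `performance_memory.py`; the tool name `flaky_search` and the in-memory `metrics` list are stand-ins for the real registry and SQLite store.

```python
import time

def with_metrics(tool_fn, record):
    """Wrap a tool so every call records its name, latency, and success."""
    def wrapped(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = tool_fn(*args, **kwargs)
            record(tool_fn.__name__, time.perf_counter() - start, True)
            return result
        except Exception:
            record(tool_fn.__name__, time.perf_counter() - start, False)
            raise  # failures are recorded, then propagated to the caller
    return wrapped

metrics = []  # stand-in for the SQLite-backed Performance Memory

def record(name, latency, ok):
    metrics.append((name, latency, ok))

def flaky_search(query):
    """Hypothetical tool: fails on empty input."""
    if not query:
        raise ValueError("empty query")
    return f"results for {query}"

search = with_metrics(flaky_search, record)
search("llm agents")       # success row recorded
try:
    search("")             # failure row recorded, exception re-raised
except ValueError:
    pass
# metrics now holds one success and one failure record
```

The same wrapper applies uniformly to generated tools, so rewritten versions are measured with no extra instrumentation.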
| File | Description |
|---|---|
| `main.py` | Entry point — seeds test cases and runs the agent |
| `orchestrator.py` | Extended orchestrator with metrics capture and improvement loop |
| `performance_memory.py` | SQLite-backed storage for per-tool invocation metrics |
| `evaluator.py` | Weighted quality scoring with configurable thresholds |
| `rewriter.py` | LLM-powered tool rewriter with performance context |
| `regression_runner.py` | Test suite runner with type checking and latency gates |
| `registry.py` | Tool registry with persistence (from Part 1) |
| `generator.py` | LLM tool generator (from Part 1) |
| `validator.py` | AST + sandbox validator (from Part 1) |
```bash
python -m venv venv
source venv/bin/activate   # Linux/macOS
venv\Scripts\activate      # Windows
```

```bash
pip install -r requirements.txt
```

```bash
export OPENAI_API_KEY="your-key-here"
```

Or edit `main.py` directly (not recommended for production).

```bash
python main.py
```

The agent will:
- Generate tools it needs (if not already in the registry).
- Execute tasks while recording performance metrics.
- Automatically evaluate and rewrite tools that score below the threshold (0.6).
- Print a final performance report showing tool versions and metrics.
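The 0.6 threshold can be pictured with a toy scoring function. The weights and latency cap below are illustrative assumptions, not the values configured in `evaluator.py`:

```python
def quality_score(success_rate, avg_latency_ms,
                  max_latency_ms=2000.0, w_success=0.7, w_latency=0.3):
    """Illustrative weighted score in [0, 1].
    Weights and the latency cap are assumptions for this sketch,
    not the configured values in evaluator.py."""
    latency_factor = max(0.0, 1.0 - avg_latency_ms / max_latency_ms)
    return w_success * success_rate + w_latency * latency_factor

THRESHOLD = 0.6  # the rewrite trigger described above

# A tool failing 40% of calls at 1.5 s average latency falls below the bar:
score = quality_score(success_rate=0.6, avg_latency_ms=1500)
print(score < THRESHOLD)  # True — this tool would be queued for rewriting
```

A weighted sum like this makes the trigger tunable: raising `w_latency` (hypothetical name) would flag slow-but-reliable tools sooner.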
- Python 3.11+
- OpenAI API key (GPT-4o recommended)
Read the full walkthrough with architecture diagrams, live demo output, and production deployment guidance:
- Part 2 (this repo): The Self-Improving Agent
- Part 1: The Self-Extending Agent