OneManCrew/self-improving-agent
The Self-Improving AI Agent (Part 2)

Build an agent that monitors, evaluates, and rewrites its own tools at runtime.

This repository contains the full working code from the blog post: The Self-Extending AI Agent Part 2: Build a Self-Improving Agent That Rewrites Its Own Tools

This is the continuation of Part 1: The Self-Extending Agent (GitHub repo).

What This Does

In Part 1, the agent learned to generate new tools when it encounters a capability gap. In Part 2, the agent goes further — it monitors tool performance, detects degradation, rewrites underperforming tools using an LLM, and validates rewrites against a regression test suite before promoting them.

The improvement loop:

  1. Execute — Every tool call is wrapped with metrics capture (latency, success/failure, output).
  2. Record — Metrics are stored in a SQLite-backed Performance Memory.
  3. Evaluate — A weighted scoring function computes tool quality and flags underperformers.
  4. Rewrite — An LLM receives the old code + failure context and produces an improved version.
  5. Test — A Regression Runner validates the candidate against stored test cases.
  6. Promote — Only if all tests pass, the new version replaces the old one.
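The six steps above can be condensed into a single wrapped tool call. The sketch below is a minimal, self-contained illustration with a toy in-memory history and a toy success-rate score; the real logic (weighted scoring, SQLite storage, LLM rewriting) lives in the files listed below, and none of the names here are the repo's actual API.

```python
# Minimal sketch of the improvement loop (steps 1-3 concretely, 4-6 as stubs).
# All names are illustrative, not the repo's actual classes or functions.
import time

history = []  # step 2: in the repo this is the SQLite-backed Performance Memory

def double(x):
    return x * 2

def execute(tool, arg):
    # Step 1: wrap every call with metrics capture (latency, success/failure)
    start = time.perf_counter()
    try:
        out, ok = tool(arg), True
    except Exception as exc:
        out, ok = repr(exc), False
    history.append({"ok": ok, "latency": time.perf_counter() - start})
    return out

def score():
    # Step 3: a toy quality score (plain success rate);
    # the repo uses a weighted scoring function instead
    if not history:
        return 1.0
    return sum(h["ok"] for h in history) / len(history)

out = execute(double, 21)
if score() < 0.6:
    # Steps 4-6: rewrite via LLM with failure context, run the regression
    # suite, and promote the candidate only if every test passes
    pass
print(out)  # 42
```

In the repository, steps 4 through 6 are handled by rewriter.py and regression_runner.py; the stub above only marks where they plug in.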

Files

File                    Description
main.py                 Entry point — seeds test cases and runs the agent
orchestrator.py         Extended orchestrator with metrics capture and improvement loop
performance_memory.py   SQLite-backed storage for per-tool invocation metrics
evaluator.py            Weighted quality scoring with configurable thresholds
rewriter.py             LLM-powered tool rewriter with performance context
regression_runner.py    Test suite runner with type checking and latency gates
registry.py             Tool registry with persistence (from Part 1)
generator.py            LLM tool generator (from Part 1)
validator.py            AST + sandbox validator (from Part 1)
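To make the Performance Memory concrete, a minimal SQLite-backed version could look like the sketch below. The table schema and method names are illustrative assumptions, not the actual contents of performance_memory.py:

```python
# Hypothetical sketch of a SQLite-backed Performance Memory.
# Schema and method names are assumptions for illustration only.
import sqlite3

class PerformanceMemory:
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            """CREATE TABLE IF NOT EXISTS invocations (
                   tool TEXT, success INTEGER, latency REAL, output TEXT
               )"""
        )

    def record(self, tool, success, latency, output=""):
        # Store one invocation's metrics (step 2 of the loop)
        self.db.execute(
            "INSERT INTO invocations VALUES (?, ?, ?, ?)",
            (tool, int(success), latency, str(output)),
        )
        self.db.commit()

    def stats(self, tool):
        # Aggregate per-tool metrics for the evaluator
        row = self.db.execute(
            "SELECT COUNT(*), AVG(success), AVG(latency) "
            "FROM invocations WHERE tool = ?",
            (tool,),
        ).fetchone()
        return {"calls": row[0], "success_rate": row[1], "avg_latency": row[2]}

mem = PerformanceMemory()
mem.record("summarize", success=True, latency=0.12)
mem.record("summarize", success=False, latency=0.90)
print(mem.stats("summarize"))
```

Keeping raw per-invocation rows (rather than running aggregates) lets the evaluator change its weights or time window later without losing history.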

Setup

1. Create a virtual environment

python -m venv venv
source venv/bin/activate  # Linux/macOS
venv\Scripts\activate     # Windows

2. Install dependencies

pip install -r requirements.txt

3. Set your OpenAI API key

export OPENAI_API_KEY="your-key-here"

Or edit main.py directly (not recommended for production).

Usage

python main.py

The agent will:

  1. Generate tools it needs (if not already in the registry).
  2. Execute tasks while recording performance metrics.
  3. Automatically evaluate and rewrite tools that score below the threshold (0.6).
  4. Print a final performance report showing tool versions and metrics.
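A weighted quality score in the spirit of step 3 might combine success rate and latency as below. The 0.6 threshold matches the one stated above, but the weights and latency budget are illustrative assumptions, not the values in evaluator.py:

```python
# Illustrative weighted quality score; weights and latency budget are assumed.
def quality_score(success_rate, avg_latency, latency_budget=2.0,
                  w_success=0.7, w_latency=0.3):
    # Latency contributes fully when near zero, nothing when over budget
    latency_term = max(0.0, 1.0 - avg_latency / latency_budget)
    return w_success * success_rate + w_latency * latency_term

THRESHOLD = 0.6  # tools scoring below this are rewritten

score = quality_score(success_rate=0.5, avg_latency=1.0)
print(score, score < THRESHOLD)  # ~0.5, below 0.6, so this tool is flagged
```

Weighting success above latency reflects that a slow-but-correct tool is usually preferable to a fast-but-flaky one; the exact balance is configurable via the thresholds mentioned in the file list.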

Requirements

  • Python 3.11+
  • OpenAI API key (GPT-4o recommended)

Blog Post

Read the full walkthrough with architecture diagrams, live demo output, and production deployment guidance:
