a0-langextract

Structured extraction for Agent Zero — invoices, recipes, prep lists, anything.

Hand it a messy invoice scan. Get back clean JSON with every field traced to its source.

Why This Exists

Restaurant kitchens run on paper — invoices from Sysco, scribbled prep lists, photocopied recipes. Getting that data into a system usually means someone typing it in by hand.

a0-langextract gives your Agent Zero a structured extraction engine. Point it at a PDF, raw text, or a URL and get back typed JSON with source grounding: each extraction points to the exact characters it came from in the original document, so any value can be checked against the source instead of taken on faith.

Built for CarabinerOS by an exec chef who got tired of manually entering invoices.

Install

One-liner (inside A0 container)

cd /a0
git clone https://github.com/notabotchef/a0-langextract usr/plugins/langextract
cd usr/plugins/langextract && bash install.sh

The install script auto-detects A0's Python venv (/opt/venv-a0/) and installs langextract there.

Manual

git clone https://github.com/notabotchef/a0-langextract usr/plugins/langextract
/opt/venv-a0/bin/pip install langextract
# Restart A0 to pick up the new plugin

After install, enable LangExtract in the A0 Plugin Hub and restart A0.

Configuration

Settings UI

Open the LangExtract panel in A0's Plugins sidebar to configure:

API Key — Gemini or OpenAI API key (required unless using Ollama)
Model ID — extraction model (gemini-2.5-flash, gpt-4o, etc.)
Extraction Passes — higher = better recall, more cost (1-5)
Max Workers — parallel workers for chunked documents
Chunk Size — character buffer for splitting large documents
Save Visualization — generate interactive HTML alongside JSON
Output Directory — where results are saved
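Chunk Size is the per-chunk character cap for long documents (max_char_buffer in the config file). The sketch below illustrates the idea. It is a simplified greedy packer written for illustration, not langextract's actual chunking logic, and chunk_spans is a hypothetical name. It returns (start, end) offsets into the original text instead of copied strings, so positions found inside a chunk can still be mapped back to the full document.

```python
import re
from typing import List, Optional, Tuple


def chunk_spans(text: str, max_char_buffer: int = 2000) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of chunks no longer than max_char_buffer.

    Simplified sketch: packs whitespace-delimited tokens greedily and
    hard-splits any single token longer than the buffer.
    """
    if max_char_buffer < 1:
        raise ValueError("max_char_buffer must be >= 1")
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    end = 0
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        # A token that can never fit is cut into buffer-sized pieces.
        while tok_end - tok_start > max_char_buffer:
            if start is not None:
                spans.append((start, end))
                start = None
            spans.append((tok_start, tok_start + max_char_buffer))
            tok_start += max_char_buffer
        if start is None:
            start, end = tok_start, tok_end      # open a new chunk
        elif tok_end - start <= max_char_buffer:
            end = tok_end                        # token fits: extend chunk
        else:
            spans.append((start, end))           # close chunk, start a new one
            start, end = tok_start, tok_end
    if start is not None:
        spans.append((start, end))
    return spans
```

A position p found inside chunk i sits at spans[i][0] + p in the full document, which is what keeps character offsets meaningful across chunks.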

API Key Setup

LangExtract needs its own API key to call the extraction model. This is separate from Agent Zero's chat model.

Provider	How to set
Gemini	Settings UI, or `LANGEXTRACT_API_KEY` / `GOOGLE_API_KEY` env var
OpenAI	Settings UI, or `OPENAI_API_KEY` env var
Ollama	No key needed (set model_id to an Ollama model)

Priority: Settings UI > LANGEXTRACT_API_KEY env var > GOOGLE_API_KEY env var.
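The priority order above amounts to a first-non-empty lookup. A minimal sketch, assuming the Settings UI value arrives as a plain string. resolve_api_key is a hypothetical helper, not the plugin's code, and it only covers the three sources in the priority list (OpenAI key handling isn't shown).

```python
import os
from typing import Mapping, Optional


def resolve_api_key(settings_key: Optional[str] = "",
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-blank key: Settings UI, LANGEXTRACT_API_KEY, GOOGLE_API_KEY."""
    if env is None:
        env = os.environ
    candidates = (
        settings_key,                          # 1. Settings UI value
        env.get("LANGEXTRACT_API_KEY", ""),    # 2. plugin-specific env var
        env.get("GOOGLE_API_KEY", ""),         # 3. generic Gemini env var
    )
    for key in candidates:
        if key and key.strip():
            return key.strip()
    return None  # no key found: fine for Ollama, an error for Gemini
```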

Config File

Settings are also in default_config.yaml:

api_key: ""               # API key for the extraction model
model_id: ""              # Leave empty for gemini-2.5-flash default
extraction_passes: 1      # 1-5
max_workers: 5            # Parallel workers
max_char_buffer: 2000     # Chunk size in characters
fence_output: null        # Auto-detected for OpenAI models
save_visualization: true  # Generate HTML visualization
output_dir: "extractions" # Relative to A0 work dir
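Roughly how those defaults could be merged with user overrides and sanity-checked, assuming the YAML has already been parsed into a dict (for example with PyYAML's yaml.safe_load). effective_config is hypothetical. The fence_output rule (treat model IDs starting with gpt as OpenAI models) is an assumption standing in for the plugin's real auto-detection.

```python
from typing import Any, Dict

# Mirrors default_config.yaml above.
DEFAULTS: Dict[str, Any] = {
    "api_key": "",
    "model_id": "",
    "extraction_passes": 1,
    "max_workers": 5,
    "max_char_buffer": 2000,
    "fence_output": None,
    "save_visualization": True,
    "output_dir": "extractions",
}
DEFAULT_MODEL = "gemini-2.5-flash"


def effective_config(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overrides onto DEFAULTS and validate the result (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    cfg = {**DEFAULTS, **overrides}
    cfg["model_id"] = cfg["model_id"] or DEFAULT_MODEL  # empty means default
    passes = int(cfg["extraction_passes"])
    if not 1 <= passes <= 5:
        raise ValueError("extraction_passes must be between 1 and 5")
    cfg["extraction_passes"] = passes
    for key in ("max_workers", "max_char_buffer"):
        if int(cfg[key]) < 1:
            raise ValueError(f"{key} must be a positive integer")
    if cfg["fence_output"] is None:
        # Assumption: model IDs starting with "gpt" are OpenAI models.
        cfg["fence_output"] = cfg["model_id"].startswith("gpt")
    return cfg
```

Rejecting unknown keys catches typos like extraction_pass that would otherwise be silently ignored. The gpt prefix test would miss OpenAI models with other naming schemes, which is why an explicit fence_output value always wins.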

Usage

Input Types

The text argument accepts:

Input	Example
File path	`/a0/usr/uploads/invoice.pdf`
URL	`https://example.com/recipe.txt`
Raw text	`SYSCO FOODS\nInvoice #: ...`

PDFs are read using PyMuPDFLoader (same as A0's document_query). No need to pre-process documents — pass the file path directly.
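How the plugin decides which of the three input types it was given isn't documented here. A hypothetical routing sketch (classify_input is not the plugin's function): check for an http(s) URL first, then an existing file on disk, and otherwise treat the argument as raw text. The actual PDF loading via PyMuPDFLoader is left out.

```python
from pathlib import Path
from urllib.parse import urlparse


def _is_existing_file(candidate: str) -> bool:
    """Path.is_file() that treats over-long or invalid names as 'not a file'."""
    try:
        return Path(candidate).is_file()
    except (OSError, ValueError):  # e.g. name too long, embedded NUL byte
        return False


def classify_input(text: str) -> str:
    """Return 'url', 'pdf', 'file' or 'raw' for a tool 'text' argument."""
    candidate = text.strip()
    parsed = urlparse(candidate)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Multi-line input is always content, never a path.
    if "\n" not in candidate and _is_existing_file(candidate):
        return "pdf" if candidate.lower().endswith(".pdf") else "file"
    return "raw"
```

One design choice worth flagging: a mistyped path falls through to raw and would be sent to the model as literal text. A stricter version could raise when a single-line argument starting with / doesn't exist.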

Tools

Tool	Purpose
`langextract:extract`	General extraction with custom prompt + examples
`langextract:extract_invoice`	Invoices, delivery tickets
`langextract:extract_recipe`	Recipes with ingredients, steps, temps
`langextract:extract_prep`	Prep lists, production sheets
`langextract:schemas`	List available schemas

Example Prompts

Using langextract, parse /a0/usr/uploads/invoice.pdf

Using langextract:extract_recipe, parse /a0/usr/uploads/recipes.pdf

Extract all menu items with prices from this text: ...

Output

Every extraction produces:

JSON — structured data in extractions/ directory
HTML — interactive visualization with source highlighting (optional)
Agent response — formatted summary in chat

{
  "extractions": [
    {
      "class": "line_item",
      "text": "Choice Striploin 180D",
      "attributes": {
        "quantity": "8",
        "unit": "LB",
        "unit_price": "14.75",
        "total": "118.00"
      },
      "char_interval": [56, 77]
    }
  ]
}

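The char_interval gives the start and end character offsets of the extracted text in the source document, so every extraction can be traced back to where it came from. Attribute values such as quantity and unit_price are generated by the model alongside that span rather than located separately, so they are worth spot-checking against the highlighted source.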
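Because the interval is plain offsets, grounding is easy to verify yourself. A stdlib sketch (check_extraction is hypothetical, not part of the plugin) that assumes char_interval is a half-open [start, end) pair, which matches the 21-character span in the example above. For line items it also checks quantity x unit_price against total. That arithmetic check is an invoice-specific assumption, and the example's 8 x 14.75 = 118.00 passes it.

```python
from decimal import ROUND_HALF_UP, Decimal, InvalidOperation
from typing import Any, Dict, List

CENT = Decimal("0.01")


def check_extraction(source: str, extraction: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with one extraction (empty list = OK)."""
    problems: List[str] = []
    interval = extraction.get("char_interval")
    if interval is None:
        problems.append("no char_interval: text was not located in the source")
    else:
        start, end = interval
        if source[start:end] != extraction.get("text"):
            problems.append(f"source[{start}:{end}] does not match the extracted text")
    attrs = extraction.get("attributes") or {}
    # Assumption: line items carry quantity, unit_price and total as numeric strings.
    if all(k in attrs for k in ("quantity", "unit_price", "total")):
        try:
            expected = (Decimal(attrs["quantity"]) * Decimal(attrs["unit_price"])
                        ).quantize(CENT, rounding=ROUND_HALF_UP)
            if expected != Decimal(attrs["total"]):
                problems.append(
                    f"quantity x unit_price = {expected}, but total = {attrs['total']}")
        except InvalidOperation:
            problems.append("non-numeric quantity, unit_price or total")
    return problems
```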

Built-in Schemas

Schema	Classes Extracted	Use Case
`invoice`	`vendor`, `invoice_meta`, `line_item`, `invoice_total`	Sysco/US Foods invoices, delivery tickets
`recipe`	`recipe_meta`, `ingredient`, `step`	Recipes with ingredients, steps, techniques
`prep_list`	`prep_header`, `station`, `prep_item`	Kitchen prep lists, production sheets
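The table above maps naturally onto a small registry. A hypothetical sketch: the real helpers/schemas.py may be structured differently, and the tool-method-to-schema mapping is inferred from the Tools table rather than taken from the plugin's code.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical registry mirroring the Built-in Schemas table.
SCHEMAS: Dict[str, Tuple[str, ...]] = {
    "invoice": ("vendor", "invoice_meta", "line_item", "invoice_total"),
    "recipe": ("recipe_meta", "ingredient", "step"),
    "prep_list": ("prep_header", "station", "prep_item"),
}

# Assumed mapping from tool methods in the Tools table to schema names.
TOOL_SCHEMAS: Dict[str, str] = {
    "extract_invoice": "invoice",
    "extract_recipe": "recipe",
    "extract_prep": "prep_list",
}


def schema_for_tool(tool_name: str) -> Optional[str]:
    """'langextract:extract_invoice' -> 'invoice'; None for the generic extract."""
    _, _, method = tool_name.partition(":")
    return TOOL_SCHEMAS.get(method)


def unexpected_classes(schema: str, extractions: Iterable[dict]) -> Set[str]:
    """Classes present in the output that the schema doesn't declare."""
    return {e["class"] for e in extractions} - set(SCHEMAS[schema])
```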

Custom Schemas

Pass your own prompt + examples for any extraction task:

{
  "tool_name": "langextract:extract",
  "tool_args": {
    "text": "/a0/usr/uploads/menu.pdf",
    "prompt": "Extract all menu items with prices and descriptions",
    "examples": [
      {
        "text": "Wagyu Tartare  24\nhand-cut, smoked yolk, caper berries",
        "extractions": [
          {
            "extraction_class": "menu_item",
            "extraction_text": "Wagyu Tartare",
            "attributes": {
              "price": "24",
              "description": "hand-cut, smoked yolk, caper berries"
            }
          }
        ]
      }
    ]
  }
}
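Grounding works by locating extraction_text in the source, so examples whose extraction_text is paraphrased rather than copied from their own text nudge the model toward output that can't be mapped back to the document. A hypothetical pre-flight check for the examples argument (validate_examples is not part of the plugin) that catches that, along with missing fields.

```python
from typing import Any, Dict, List


def validate_examples(examples: List[Dict[str, Any]]) -> List[str]:
    """Pre-flight check for the 'examples' tool argument; returns problems found."""
    problems: List[str] = []
    for i, example in enumerate(examples):
        text = example.get("text") or ""
        if not text:
            problems.append(f"example {i}: missing 'text'")
        for j, ext in enumerate(example.get("extractions") or []):
            where = f"example {i}, extraction {j}"
            for field in ("extraction_class", "extraction_text"):
                if not ext.get(field):
                    problems.append(f"{where}: missing '{field}'")
            snippet = ext.get("extraction_text") or ""
            # Grounding locates extracted text in the source, so examples
            # should copy it verbatim rather than paraphrase it.
            if snippet and snippet not in text:
                problems.append(f"{where}: {snippet!r} does not appear verbatim in the example text")
            if not isinstance(ext.get("attributes", {}), dict):
                problems.append(f"{where}: 'attributes' must be an object")
    return problems
```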

Project Structure

a0-langextract/
├── plugin.yaml                 # A0 plugin manifest
├── default_config.yaml         # Default configuration
├── install.sh                  # Installer (targets A0 venv)
├── dev-sync.sh                 # Dev: rsync to A0 plugin dir
├── Makefile                    # setup / test / clean
├── helpers/
│   ├── extractor.py            # Core extraction engine
│   └── schemas.py              # Built-in schema registry
├── tools/
│   └── langextract.py          # A0 tool interface
├── prompts/
│   ├── agent.system.tool.langextract.md   # Tool registration prompt
│   ├── fw.langextract.extract_ok.md       # Success template
│   └── fw.langextract.extract_error.md    # Error template
├── api/
│   └── config_api.py           # Settings API endpoint
├── webui/
│   ├── main.html               # Settings panel UI
│   └── langextract-store.js    # Alpine.js store
├── tests/
│   ├── test_extractor.py       # Unit tests
│   └── fixtures/               # Sample documents
└── docs/
    └── examples.md             # Detailed usage guide

Development

# Clone
git clone https://github.com/notabotchef/a0-langextract
cd a0-langextract

# Sync to local A0 for testing
bash dev-sync.sh

# Run tests
make test

Tests

make test
# or
python -m pytest tests/ -v

Tests validate schema structure and output formatting without making LLM calls.

Contributing

Fork it
Create your feature branch (git checkout -b feat/my-schema)
Add tests for new schemas
Run make test
Open a PR

Schema contributions welcome — wine lists, HACCP logs, equipment maintenance, or any food service document.

License

MIT — see LICENSE.

Built by Esteban Nunez for CarabinerOS.

Powered by google/langextract — structured extraction with source grounding.
