Structured extraction for Agent Zero — invoices, recipes, prep lists, anything.
Hand it a messy invoice scan. Get back clean JSON with every field traced to its source.
Restaurant kitchens run on paper — invoices from Sysco, scribbled prep lists, photocopied recipes. Getting that data into a system usually means someone typing it in by hand.
a0-langextract gives your Agent Zero a structured extraction engine. Point it at a PDF, raw text, or URL — get back typed JSON with source grounding. Every extracted field maps back to the exact characters in the original document. No hallucinated data.
Built for CarabinerOS by an exec chef who got tired of manually entering invoices.
```bash
cd /a0
git clone https://github.com/notabotchef/a0-langextract usr/plugins/langextract
cd usr/plugins/langextract && bash install.sh
```

The install script auto-detects A0's Python venv (`/opt/venv-a0/`) and installs langextract there.
```bash
git clone https://github.com/notabotchef/a0-langextract usr/plugins/langextract
/opt/venv-a0/bin/pip install langextract
# Restart A0 to pick up the new plugin
```

After install, enable LangExtract in the A0 Plugin Hub and restart A0.
Open the LangExtract panel in A0's Plugins sidebar to configure:
- API Key — Gemini or OpenAI API key (required unless using Ollama)
- Model ID — extraction model (`gemini-2.5-flash`, `gpt-4o`, etc.)
- Extraction Passes — higher = better recall, more cost (1-5)
- Max Workers — parallel workers for chunked documents
- Chunk Size — character buffer for splitting large documents
- Save Visualization — generate interactive HTML alongside JSON
- Output Directory — where results are saved
LangExtract needs its own API key to call the extraction model. This is separate from Agent Zero's chat model.
| Provider | How to set |
|---|---|
| Gemini | Settings UI, or LANGEXTRACT_API_KEY / GOOGLE_API_KEY env var |
| OpenAI | Settings UI, or OPENAI_API_KEY env var |
| Ollama | No key needed (set model_id to an ollama model) |
Priority: Settings UI > LANGEXTRACT_API_KEY env var > GOOGLE_API_KEY env var.
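For the env-var route, a minimal sketch (the key values below are placeholders, not real keys):

```shell
# Placeholder values; set whichever variable matches your provider.
export LANGEXTRACT_API_KEY="your-gemini-key"   # takes priority over GOOGLE_API_KEY
export OPENAI_API_KEY="your-openai-key"        # only needed for OpenAI models
```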
Settings are also in `default_config.yaml`:
```yaml
api_key: ""                 # API key for the extraction model
model_id: ""                # Leave empty for gemini-2.5-flash default
extraction_passes: 1        # 1-5
max_workers: 5              # Parallel workers
max_char_buffer: 2000       # Chunk size in characters
fence_output: null          # Auto-detected for OpenAI models
save_visualization: true    # Generate HTML visualization
output_dir: "extractions"   # Relative to A0 work dir
```

The `text` argument accepts:
| Input | Example |
|---|---|
| File path | `/a0/usr/uploads/invoice.pdf` |
| URL | `https://example.com/recipe.txt` |
| Raw text | `SYSCO FOODS\nInvoice #: ...` |
PDFs are read using PyMuPDFLoader (same as A0's document_query). No need to pre-process documents — pass the file path directly.
| Tool | Purpose |
|---|---|
| `langextract:extract` | General extraction with custom prompt + examples |
| `langextract:extract_invoice` | Invoices, delivery tickets |
| `langextract:extract_recipe` | Recipes with ingredients, steps, temps |
| `langextract:extract_prep` | Prep lists, production sheets |
| `langextract:schemas` | List available schemas |
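A call to one of the built-in schema tools needs no prompt or examples. A sketch (assuming the same `text` argument shown for `langextract:extract` later in this README):

```json
{
  "tool_name": "langextract:extract_invoice",
  "tool_args": {
    "text": "/a0/usr/uploads/invoice.pdf"
  }
}
```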
```
Using langextract, parse /a0/usr/uploads/invoice.pdf
Using langextract:extract_recipe, parse /a0/usr/uploads/recipes.pdf
Extract all menu items with prices from this text: ...
```
Every extraction produces:
- JSON — structured data in the `extractions/` directory
- HTML — interactive visualization with source highlighting (optional)
- Agent response — formatted summary in chat
```json
{
  "extractions": [
    {
      "class": "line_item",
      "text": "Choice Striploin 180D",
      "attributes": {
        "quantity": "8",
        "unit": "LB",
        "unit_price": "14.75",
        "total": "118.00"
      },
      "char_interval": [56, 77]
    }
  ]
}
```

The `char_interval` maps every extraction to exact character positions in the source. No hallucinations — full traceability.
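As a quick sanity check, grounding can be verified locally by comparing each extraction's text against the source slice its `char_interval` points at. A minimal Python sketch (the source string and result dict are invented; the field names follow the example output above):

```python
# Invented source text and result; field names mirror the example output.
source = "SYSCO FOODS  8 LB  Choice Striploin 180D  14.75  118.00"

start = source.index("Choice Striploin 180D")
result = {
    "extractions": [
        {
            "class": "line_item",
            "text": "Choice Striploin 180D",
            "char_interval": [start, start + len("Choice Striploin 180D")],
        }
    ]
}

# Grounding check: every extraction's text must equal the exact source slice.
for ex in result["extractions"]:
    lo, hi = ex["char_interval"]
    assert source[lo:hi] == ex["text"]
print("all extractions grounded")
```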
| Schema | Classes Extracted | Use Case |
|---|---|---|
| `invoice` | vendor, invoice_meta, line_item, invoice_total | Sysco/US Foods invoices, delivery tickets |
| `recipe` | recipe_meta, ingredient, step | Recipes with ingredients, steps, techniques |
| `prep_list` | prep_header, station, prep_item | Kitchen prep lists, production sheets |
Pass your own prompt + examples for any extraction task:
```json
{
  "tool_name": "langextract:extract",
  "tool_args": {
    "text": "/a0/usr/uploads/menu.pdf",
    "prompt": "Extract all menu items with prices and descriptions",
    "examples": [
      {
        "text": "Wagyu Tartare 24\nhand-cut, smoked yolk, caper berries",
        "extractions": [
          {
            "extraction_class": "menu_item",
            "extraction_text": "Wagyu Tartare",
            "attributes": {
              "price": "24",
              "description": "hand-cut, smoked yolk, caper berries"
            }
          }
        ]
      }
    ]
  }
}
```

```
a0-langextract/
├── plugin.yaml                        # A0 plugin manifest
├── default_config.yaml                # Default configuration
├── install.sh                         # Installer (targets A0 venv)
├── dev-sync.sh                        # Dev: rsync to A0 plugin dir
├── Makefile                           # setup / test / clean
├── helpers/
│   ├── extractor.py                   # Core extraction engine
│   └── schemas.py                     # Built-in schema registry
├── tools/
│   └── langextract.py                 # A0 tool interface
├── prompts/
│   ├── agent.system.tool.langextract.md   # Tool registration prompt
│   ├── fw.langextract.extract_ok.md       # Success template
│   └── fw.langextract.extract_error.md    # Error template
├── api/
│   └── config_api.py                  # Settings API endpoint
├── webui/
│   ├── main.html                      # Settings panel UI
│   └── langextract-store.js           # Alpine.js store
├── tests/
│   ├── test_extractor.py              # Unit tests
│   └── fixtures/                      # Sample documents
└── docs/
    └── examples.md                    # Detailed usage guide
```
```bash
# Clone
git clone https://github.com/notabotchef/a0-langextract
cd a0-langextract

# Sync to local A0 for testing
bash dev-sync.sh

# Run tests
make test
```

```bash
make test
# or
python -m pytest tests/ -v
```

Tests validate schema structure and output formatting without making LLM calls.
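The kind of offline check the suite performs can be sketched in plain Python. The dict below is illustrative only; the real registry lives in `helpers/schemas.py` and may be structured differently:

```python
# Illustrative offline check: no LLM call, just schema-shape validation.
# The dict mirrors the built-in schemas table above; real registry may differ.
SCHEMAS = {
    "invoice": ["vendor", "invoice_meta", "line_item", "invoice_total"],
    "recipe": ["recipe_meta", "ingredient", "step"],
    "prep_list": ["prep_header", "station", "prep_item"],
}

def validate(name: str) -> bool:
    """A schema must be registered and declare at least one extraction class."""
    classes = SCHEMAS.get(name)
    return isinstance(classes, list) and len(classes) > 0

assert all(validate(n) for n in SCHEMAS)
assert not validate("wine_list")  # not registered (yet)
print("schemas ok")
```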
- Fork it
- Create your feature branch (`git checkout -b feat/my-schema`)
- Add tests for new schemas
- Run `make test`
- Open a PR
Schema contributions welcome — wine lists, HACCP logs, equipment maintenance, or any food service document.
MIT — see LICENSE.
Built by Esteban Nunez for CarabinerOS.
Powered by google/langextract — structured extraction with source grounding.