Synthetic Dialogue Generation, Orchestration, Evaluation, Interpretability.
Focus for agents:
- Reproducible environment & dependency handling
- Safe model / API usage & configuration overrides
- Standard commands (build, test, lint, docs, packaging)
- Dataset & artifacts locations
- Extension points (personas, orchestrators, evaluators, inspectors)
- Performance & caching knobs
- Contribution / PR hygiene
- Security / privacy considerations
If an instruction here conflicts with user chat input, defer to the user. For file‑local changes prefer editing minimal regions.
- `sdialog/`: Core library (packaged via `pyproject.toml`)
- `requirements.txt`: Runtime + dev dependencies (dynamic in `pyproject.toml`)
- `src/` or package root: Package modules live directly under `sdialog/`
- `tutorials/`: Example notebooks & advanced usage
- `docs/`: Sphinx docs (ReadTheDocs)
- `AGENTS.md`: This file
- `README.md`: Human overview
- `CONTRIBUTING.md`: Contribution guidelines
- `LICENSE`: MIT
- `datasets/` | `Datasets/`: External dialogue / STAR dataset snapshots (read-only)
- `AutoTOD/`, `AgenTOD/`, `task-oriented-dialogue/`: Related research / comparative tooling (not installed by default)
- `JSALT/`: Workshop materials & experiments
Only the sdialog Python package is published to PyPI. Other folders are experimental/supporting and may have separate requirements.
Supported Python: >=3.9 (see `pyproject.toml`). A fresh virtual environment is recommended.
```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .[dev]  # if an extras block is later added; otherwise:
pip install -r sdialog/requirements.txt
pip install -e sdialog

# Reinstall after structural changes:
pip uninstall -y sdialog || true
pip install -e sdialog
```
`pyproject.toml` uses `setuptools` dynamic metadata to read `requirements.txt`.
- Pin additions: modify `sdialog/requirements.txt`, then reference the change in the PR.
- Avoid silently upgrading core ML libs (`torch`, `transformers`) without noting compatibility.
- If adding an optional backend (OpenAI / Ollama / AWS / Google), ensure minimal import-time cost; guard imports.
- Do not auto-install CUDA wheels; leave that to the user environment.
- If a test requires a GPU, skip gracefully when `torch.cuda.is_available()` is False.
Central API:
```python
import sdialog

sdialog.config.llm("provider:model", temperature=0.7, top_p=0.9)
sdialog.config.llm_params(max_tokens=512)
```
Providers use prefix naming:
- `openai:MODEL`
- `huggingface:REPO`
- `ollama:MODEL`
- `amazon:bedrock-model-id`
- `google:genai-model-id`
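The prefix convention can be illustrated with a small helper. This is a hypothetical sketch, not part of sdialog's public API:

```python
# Hypothetical helper (not part of sdialog's API): split a "provider:model"
# spec at the first colon into its provider and model parts.
def split_model_spec(spec: str) -> tuple:
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model
```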
When modifying code that instantiates models:
- Always allow an explicit `model=` override in class constructors.
- Keep the default fallback: `config["llm"]["model"]`.
- Avoid hard-coding API keys; rely on environment variables (e.g., `OPENAI_API_KEY`, `GOOGLE_API_KEY`). Never commit keys.
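The override-plus-fallback pattern can be sketched as follows. `MyComponent` and `DEFAULT_CONFIG` are illustrative names standing in for sdialog's global config, not its actual internals:

```python
import os

# Illustrative stand-in for the global config; not sdialog's real object.
DEFAULT_CONFIG = {"llm": {"model": "openai:gpt-4o-mini"}}

class MyComponent:
    """Hypothetical component showing the constructor override pattern."""

    def __init__(self, model=None):
        # Explicit override wins; otherwise fall back to the global config.
        self.model = model or DEFAULT_CONFIG["llm"]["model"]
        # Read credentials from the environment; never hard-code keys.
        self.api_key = os.environ.get("OPENAI_API_KEY")
```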
Think segments: if `think=True`, internal prompts may contain `<think>...</think>` sections; the pattern is customizable via `thinking_pattern`.
Tools: Agents accept simple Python callables; treat them as pure (side‑effect‑light) unless clearly documented.
| Task | Command |
|---|---|
| Install dev deps | pip install -r sdialog/requirements.txt |
| Editable install | pip install -e sdialog |
| Lint (flake8) | flake8 sdialog |
| Run tests (pytest) | pytest -q |
| Coverage | pytest --cov=sdialog --cov-report=term-missing |
| Build docs | pip install -r sdialog/docs/requirements.txt && (cd sdialog && make -C docs html) |
| Package sdist/wheel | python -m build (if build backend tooling added) |
| Update version | Edit sdialog/util/__init__.py (where __version__ lives) |
| Format tables | Use sdialog.util.dict_to_table(..., markdown=True) |
Tests assume `sdialog` is importable; ensure an editable install before running.
STAR dataset utilities live under sdialog/datasets. External raw STAR data likely mirrored under Datasets/STAR or datasets/STAR.
Agent operations MUST NOT mutate dataset source files. For synthetic generation:
- Write outputs (dialog JSON) into a new folder (e.g., `outputs/` or `results/`).
- Use `Dialog.to_file()` for serialization; prefer the `.json` extension.
Large artifacts (embeddings, cached evaluations) should be placed in a git-ignored path (e.g., `cache/`, configurable via `sdialog.config.set_cache(path, enable=True)`).
Subclassing patterns:
- Personas: inherit from `BaseAttributeModel` (see `sdialog.personas`). Keep field names snake_case.
- Orchestrators: inherit `BaseOrchestrator` or `BasePersistentOrchestrator`; implement `instruct(self, dialog, utterance)`.
- Generators: extend `BaseAttributeModelGenerator` for new structured generation flows.
- Evaluators / Judges: inherit `BaseDialogScore`, `BaseDialogFlowScore`, `BaseLLMJudge`, or dataset evaluator bases.
- Interpretability: implement a new `Steerer` or extend `Inspector` logic carefully; avoid heavy work inside hook functions.
Composition syntax: `agent = agent | orchestrator_or_inspector` returns a cloned agent with the component attached.
When adding new components:
- Provide docstring with Example section.
- Ensure the `.json()` method returns a serializable config.
- Respect persona/context immutability unless explicitly cloning.
```python
from sdialog.personas import Persona
from sdialog.agents import Agent
from sdialog import Context

user = Agent(persona=Persona(name="User", role="seeker"), first_utterance="Hi")
bot = Agent(persona=Persona(name="Bot", role="helper"))
ctx = Context(location="lab", topics=["safety"])

dialog = user.dialog_with(bot, context=ctx, max_turns=20)
dialog.print(orchestration=True)
dialog.to_file("example_dialog.json")
```
Common variations:
- Use `PersonaGenerator` / `ContextGenerator` for automatic diversification.
- Apply `Paraphraser` to augment textual style while preserving semantics.
- Prepend orchestrators for flow constraints (length, reflex, suggestion, change-of-mind).
Scoring guidelines:
- For batch evaluation, prefer building a list of `Dialog` objects, then pass them into `DatasetComparator`.
- When creating new metrics: implement `.score(dialog)` returning a primitive numeric or structured result; keep it deterministic for identical input.
- LLM judges should expose a `reason` toggle to control cost.
Edge cases to handle in metrics:
- Empty dialogue (return None or 0 with documented behavior)
- Single turn (avoid division by zero in transition metrics)
- Non-ASCII text (ensure `.lower()` is safe)
Inspector usage requires model objects exposing accessible layer names. When attaching hooks:
- Use precise target strings (e.g., `model.layers.5.post_attention_layernorm`).
- Avoid capturing every layer unless necessary: memory blow-up risk.
- Steering intervals `(start, end)` reduce overhead.
- Remove or clear inspectors after experiments to avoid stale references (`agent.clear_inspectors()`).
Persisted analyses: Store extracted activations / summaries under a non‑tracked directory.
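The `(start, end)` interval idea can be sketched with a plain closure. This is illustrative only; sdialog's actual hook machinery works on model layers, and `make_interval_hook` is a hypothetical name:

```python
# Interval-guarded hook sketch: record activations only for steps inside
# [start, end), keeping memory bounded instead of capturing every step.
def make_interval_hook(start, end, store):
    state = {"step": 0}

    def hook(activation):
        i = state["step"]
        state["step"] += 1
        if start <= i < end:
            store.append(activation)

    return hook

captured = []
hook = make_interval_hook(2, 5, captured)
for step_output in range(10):  # integers stand in for per-step activations
    hook(step_output)
# captured now holds only steps 2, 3, 4
```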
Enable caching explicitly:
```python
import sdialog

sdialog.config.set_cache("./cache", enable=True)
```
- Clear the cache between benchmark runs: `sdialog.config.clear_cache()`.
- Set seeds when generating: pass `seed=` to generators / dialog methods.
- Persona / context generators support rule templates; always log the chosen rules for reproducibility.
- Max line length: 120 (flake8 config)
- Prefer explicit imports (`from sdialog.personas import Persona`) over wildcards.
- Type hints: add them when the public API surface is extended; avoid breaking backward compatibility.
- Keep pure data classes (pydantic) side-effect free in `__init__`.
- Avoid heavy network I/O during module import; delay it until method invocation.
- Use pytest naming: `test_*.py`.
- Scope: unit tests for persona cloning, orchestration logic, and evaluation metrics; integration tests for end-to-end generation with a mock / lightweight model.
- Avoid live external API calls in the default test suite; mock LLM responses or gate them behind an environment variable (e.g., `SDIALOG_RUN_LIVE=1`).
- For stochastic processes, pass a fixed `seed` and assert structural properties instead of exact text.
- Ensure new orchestrators have at least one deterministic trigger test.
Before opening PR:
- Run `flake8 sdialog`.
- Run `pytest -q` (or a subset if the suite is large).
- Update / add tests for any new public method or class.
- Update docs / README / tutorials if API additions are user‑visible.
- If adding dependency: justify in PR description; prefer lightweight alternatives.
- If modifying persona or dialog schema fields: ensure backward compatibility, update serialization tests.
Commit message format (recommendation):
```
[sdialog] <short imperative summary>

Optional body explaining rationale, tradeoffs, migration notes.
```
Version bumps: maintainers update `__version__`; do not bundle unrelated refactors with release commits.
- Never log raw API keys or secrets.
- Avoid persisting full model outputs that may contain user-provided PII unless purposefully anonymized.
- When steering / hooking, do not serialize raw activation tensors in public artifacts; aggregate or hash if sharing.
- Validate file paths from user input to prevent directory traversal in any future file‑loading utilities.
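Path validation against traversal can be sketched like this. `safe_resolve` is a hypothetical helper (not a sdialog utility), using `Path.is_relative_to`, available on the supported Python >= 3.9:

```python
from pathlib import Path

# Hypothetical helper: resolve a user-supplied path and require that it
# stays inside an allowed root directory.
def safe_resolve(root, user_path):
    base = Path(root).resolve()
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):  # requires Python >= 3.9
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return target
```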
- Minimize repeated tokenizer/model loads: reuse configured global model when possible.
- For large batch generation: lower `temperature`, set `max_tokens`, and disable `think` unless needed.
- Use `Paraphraser(turn_by_turn=True)` only for small dialogs; otherwise batch.
- For downstream embedding evaluations, reuse a single `SentenceTransformerDialogEmbedder` instance.
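The reuse pattern can be sketched with a cached loader. `load_embedder` is a hypothetical wrapper; in practice it would construct a `SentenceTransformerDialogEmbedder` once and hand back the same instance on repeated calls:

```python
from functools import lru_cache

# Hypothetical cached loader: construct the (expensive) embedder once per
# model name and reuse the same instance afterwards.
@lru_cache(maxsize=None)
def load_embedder(model_name):
    # Placeholder object standing in for the heavy embedder instance.
    return {"model": model_name}

first = load_embedder("all-MiniLM-L6-v2")
second = load_embedder("all-MiniLM-L6-v2")
assert first is second  # one shared instance, no repeated model loads
```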
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Empty orchestrator output | Condition never true | Add logging / test condition with sample utterance |
| High memory usage | Too many inspector hooks | Narrow layer targets or detach inspector |
| Non‑deterministic tests | Missing seed or cached LLM randomness | Pass seed= and disable cache |
| Import error for backend | Optional dependency missing | Add to requirements.txt or guard import |
| Slow first call | Model warmup / remote latency | Document one warmup call in benchmark scripts |
```python
from sdialog.orchestrators import BaseOrchestrator

class MyKeywordOrchestrator(BaseOrchestrator):
    def __init__(self, keyword: str, instruction: str):
        super().__init__()
        self.keyword = keyword.lower()
        self.instruction = instruction

    def instruct(self, dialog, utterance: str):
        if utterance and self.keyword in utterance.lower():
            return self.instruction
        return None
```
- Add test verifying instruction appears when keyword present.
- Document usage in docstring.
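A deterministic trigger test for an orchestrator like the one above could look as follows. A tiny stub replaces `BaseOrchestrator` so this sketch runs standalone; a real test would import the orchestrator from the package:

```python
# Stub base class standing in for sdialog's BaseOrchestrator, so the
# sketch is self-contained; the real test would import the actual class.
class _StubBaseOrchestrator:
    def __init__(self):
        pass

class MyKeywordOrchestrator(_StubBaseOrchestrator):
    def __init__(self, keyword, instruction):
        super().__init__()
        self.keyword = keyword.lower()
        self.instruction = instruction

    def instruct(self, dialog, utterance):
        if utterance and self.keyword in utterance.lower():
            return self.instruction
        return None

def test_keyword_triggers_instruction():
    orch = MyKeywordOrchestrator("refund", "Offer the refund policy.")
    # Case-insensitive match triggers the instruction deterministically.
    assert orch.instruct(None, "I want a REFUND now") == "Offer the refund policy."
    # No keyword (or no utterance) yields no instruction.
    assert orch.instruct(None, "unrelated message") is None
    assert orch.instruct(None, None) is None
```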
Heuristics you may apply:
- For generation tasks, search for `dialog_with(` or `PersonaGenerator(` usage examples to scaffold new scripts.
- For evaluation additions, locate subclasses of `BaseDialogScore` to pattern-match.
- Use the `attributes()` static method on persona classes to programmatically list fields.
- This project provides an `llm.txt` file at https://sdialog.readthedocs.io/en/latest/llm.txt, following the llms.txt specification.
- Agents can fetch this file for structured, curated information about the project with: `#fetch https://sdialog.readthedocs.io/en/latest/llm.txt`
- The llm.txt contains the project summary, key documentation links, and API references optimized for LLM consumption.
- API docs auto-built from docstrings via Sphinx. Keep docstrings concise, with Example blocks.
- When adding config keys, update narrative docs (section: Configuration & Control) if present.
- If adding a new tutorial notebook: place it under `sdialog/tutorials/` and keep outputs cleared.
When removing an experimental module:
- Mark as deprecated in previous release (docstring + warnings).
- Remove only after one minor version or if broken beyond trivial repair.
- Provide migration hint in changelog.
- Keep `CHANGELOG.md` (if present) or GitHub Releases notes updated for: breaking changes, new components, deprecations.
- Use semantic-ish grouping: Added / Changed / Fixed / Deprecated / Removed / Security.
- Prefer minimal diffs: isolate logical change sets.
- Run tests after modifying orchestration, evaluators, or persona schema.
- If uncertain about a field meaning, inspect docstring (source of truth) rather than guessing.
- Do not auto‑reformat entire files unless style violations present.
End of AGENTS.md