Synthetic Dialogue Generation, Orchestration, Evaluation, Interpretability.
Focus for agents:
- Reproducible environment & dependency handling
- Safe model / API usage & configuration overrides
- Standard commands (build, test, lint, docs, packaging)
- Dataset & artifacts locations
- Extension points (personas, orchestrators, evaluators, inspectors)
- Performance & caching knobs
- Contribution / PR hygiene
- Security / privacy considerations
If an instruction here conflicts with user chat input, defer to the user. For file‑local changes prefer editing minimal regions.
- `sdialog/`: Core library (packaged via `pyproject.toml`)
- `requirements.txt`: Runtime + dev dependencies (dynamic in `pyproject.toml`)
- `src/` or package root: Package modules live directly under `sdialog/`
- `tutorials/`: Example notebooks & advanced usage
- `docs/`: Sphinx docs (ReadTheDocs)
- `AGENTS.md`: This file
- `README.md`: Human overview
- `CONTRIBUTING.md`: Contribution guidelines
- `LICENSE`: MIT
- `datasets/` | `Datasets/`: External dialogue / STAR dataset snapshots (read-only)
- `AutoTOD/`, `AgenTOD/`, `task-oriented-dialogue/`: Related research / comparative tooling (not installed by default)
- `JSALT/`: Workshop materials & experiments
Only the sdialog Python package is published to PyPI. Other folders are experimental/supporting and may have separate requirements.
Supported Python: >=3.9 (see `pyproject.toml`). A fresh virtual environment is recommended.
```bash
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .[dev]  # if an extras block is later added; otherwise:
pip install -r sdialog/requirements.txt
pip install -e sdialog

# Reinstall after structural changes:
pip uninstall -y sdialog || true
pip install -e sdialog
```
`pyproject.toml` uses `setuptools` dynamic metadata to read `requirements.txt`.
- Pin additions: modify `sdialog/requirements.txt`, then reference the change in the PR.
- Avoid silently upgrading core ML libs (`torch`, `transformers`) without noting compatibility.
- If adding an optional backend (OpenAI / Ollama / AWS / Google), ensure minimal import-time cost; guard imports.
- Do not auto-install CUDA wheels; leave that to the user environment.
- If a test requires a GPU, skip gracefully when `torch.cuda.is_available()` is False.
Central API:
```python
import sdialog

sdialog.config.llm("provider:model", temperature=0.7, top_p=0.9)
sdialog.config.llm_params(max_tokens=512)
```
Providers use prefix naming:
- `openai:MODEL`
- `huggingface:REPO`
- `ollama:MODEL`
- `amazon:bedrock-model-id`
- `google:genai-model-id`
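The prefix convention can be illustrated with a small helper. This is a hypothetical sketch, not part of sdialog's public API:

```python
# Hypothetical helper (not part of sdialog's API): split a "provider:model"
# spec at the first colon into its provider and model parts.
def split_model_spec(spec: str) -> tuple:
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model
```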
When modifying code that instantiates models:
- Always allow an explicit `model=` override in class constructors.
- Keep the default fallback: `config["llm"]["model"]`.
- Avoid hard-coding API keys; rely on environment variables (e.g., `OPENAI_API_KEY`, `GOOGLE_API_KEY`). Never commit keys.
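The override-plus-fallback pattern can be sketched as follows. `MyComponent` and `DEFAULT_CONFIG` are illustrative names standing in for sdialog's global config, not its actual internals:

```python
import os

# Illustrative stand-in for the global config; not sdialog's real object.
DEFAULT_CONFIG = {"llm": {"model": "openai:gpt-4o-mini"}}

class MyComponent:
    """Hypothetical component showing the constructor override pattern."""

    def __init__(self, model=None):
        # Explicit override wins; otherwise fall back to the global config.
        self.model = model or DEFAULT_CONFIG["llm"]["model"]
        # Read credentials from the environment; never hard-code keys.
        self.api_key = os.environ.get("OPENAI_API_KEY")
```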
Think segments: if `think=True`, internal prompts may contain `<think>...</think>` sections; the pattern is customizable via `thinking_pattern`.
Tools: Agents accept simple Python callables; treat them as pure (side‑effect‑light) unless clearly documented.
| Task | Command |
|---|---|
| Install dev deps | pip install -r sdialog/requirements.txt |
| Editable install | pip install -e sdialog |
| Lint (flake8) | flake8 sdialog |
| Run tests (pytest) | pytest -q |
| Coverage | pytest --cov=sdialog --cov-report=term-missing |
| Build docs | pip install -r sdialog/docs/requirements.txt && (cd sdialog && make -C docs html) |
| Package sdist/wheel | python -m build (if build backend tooling added) |
| Update version | Edit sdialog/util/__init__.py (where __version__ lives) |
| Format tables | Use sdialog.util.dict_to_table(..., markdown=True) |
Tests assume `sdialog` is importable; ensure an editable install before running.
STAR dataset utilities live under sdialog/datasets. External raw STAR data likely mirrored under Datasets/STAR or datasets/STAR.
Agent operations MUST NOT mutate dataset source files. For synthetic generation:
- Write outputs (dialog JSON) into a new folder (e.g., `outputs/` or `results/`).
- Use `Dialog.to_file()` for serialization; prefer the `.json` extension.
Large artifacts (embeddings, cached evaluations) should be placed in a git-ignored path (e.g., `cache/`, configurable via `sdialog.config.set_cache(path, enable=True)`).
Subclassing patterns:
- Personas: inherit from `BaseAttributeModel` (see `sdialog.personas`). Keep field names snake_case.
- Orchestrators: inherit `BaseOrchestrator` or `BasePersistentOrchestrator`; implement `instruct(self, dialog, utterance)`.
- Generators: extend `BaseAttributeModelGenerator` for new structured generation flows.
- Evaluators / Judges: inherit `BaseDialogScore`, `BaseDialogFlowScore`, `BaseLLMJudge`, or dataset evaluator bases.
- Interpretability: implement a new `Steerer` or extend `Inspector` logic carefully; avoid heavy work inside hook functions.
Composition syntax: `agent = agent | orchestrator_or_inspector` returns a cloned agent with the component attached.
When adding new components:
- Provide docstring with Example section.
- Ensure the `.json()` method returns a serializable config.
- Respect persona/context immutability unless explicitly cloning.
```python
from sdialog.personas import Persona
from sdialog.agents import Agent
from sdialog import Context

user = Agent(persona=Persona(name="User", role="seeker"), first_utterance="Hi")
bot = Agent(persona=Persona(name="Bot", role="helper"))
ctx = Context(location="lab", topics=["safety"])

dialog = user.dialog_with(bot, context=ctx, max_turns=20)
dialog.print(orchestration=True)
dialog.to_file("example_dialog.json")
```
Common variations:
- Use `PersonaGenerator` / `ContextGenerator` for automatic diversification.
- Apply `Paraphraser` to augment textual style while preserving semantics.
- Prepend orchestrators for flow constraints (length, reflex, suggestion, change-of-mind).
Scoring guidelines:
- For batch evaluation, prefer building a list of `Dialog` objects, then pass them into `DatasetComparator`.
- When creating new metrics: implement `.score(dialog)` returning a primitive numeric or structured result; keep it deterministic for identical input.
- LLM judges should expose a `reason` toggle to control cost.
Edge cases to handle in metrics:
- Empty dialogue (return None or 0 with documented behavior)
- Single turn (avoid division by zero in transition metrics)
- Non-ASCII text (ensure `.lower()` is safe)
Inspector usage requires model objects exposing accessible layer names. When attaching hooks:
- Use precise target strings (e.g., `model.layers.5.post_attention_layernorm`).
- Avoid capturing every layer unless necessary: memory blow-up risk.
- Steering intervals `(start, end)` reduce overhead.
- Remove or clear inspectors after experiments to avoid stale references (`agent.clear_inspectors()`).
Persisted analyses: Store extracted activations / summaries under a non‑tracked directory.
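The `(start, end)` interval idea can be sketched with a plain closure. This is illustrative only; sdialog's actual hook machinery works on model layers, and `make_interval_hook` is a hypothetical name:

```python
# Interval-guarded hook sketch: record activations only for steps inside
# [start, end), keeping memory bounded instead of capturing every step.
def make_interval_hook(start, end, store):
    state = {"step": 0}

    def hook(activation):
        i = state["step"]
        state["step"] += 1
        if start <= i < end:
            store.append(activation)

    return hook

captured = []
hook = make_interval_hook(2, 5, captured)
for step_output in range(10):  # integers stand in for per-step activations
    hook(step_output)
# captured now holds only steps 2, 3, 4
```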
Enable caching explicitly:
```python
import sdialog

sdialog.config.set_cache("./cache", enable=True)
```
- Clear the cache between benchmark runs: `sdialog.config.clear_cache()`.
- Set seeds when generating: pass `seed=` to generators / dialog methods.
- Persona / context generators support rule templates; always log the chosen rules for reproducibility.
- Max line length: 120 (flake8 config)
- Prefer explicit imports (`from sdialog.personas import Persona`) over wildcards.
- Type hints: add them when the public API surface is extended; avoid breaking backward compatibility.
- Keep pure data classes (pydantic) side-effect free in `__init__`.
- Avoid heavy network I/O during module import; delay it until method invocation.
- Use pytest naming: `test_*.py`.
- Scope: unit tests for persona cloning, orchestration logic, and evaluation metrics; integration tests for end-to-end generation with a mock / lightweight model.
- Avoid live external API calls in the default test suite; mock LLM responses or gate them behind an environment variable (e.g., `SDIALOG_RUN_LIVE=1`).
- For stochastic processes, pass a fixed `seed` and assert structural properties instead of exact text.
- Ensure new orchestrators have at least one deterministic trigger test.
Before opening PR:
- Run `flake8 sdialog`.
- Run `pytest -q` (or a subset if the suite is large).
- Update / add tests for any new public method or class.
- Update docs / README / tutorials if API additions are user‑visible.
- If adding dependency: justify in PR description; prefer lightweight alternatives.
- If modifying persona or dialog schema fields: ensure backward compatibility, update serialization tests.
Commit message format (recommendation):
```
[sdialog] <short imperative summary>

Optional body explaining rationale, tradeoffs, migration notes.
```
Version bumps: maintainers update `__version__`; do not bundle unrelated refactors with release commits.
- Never log raw API keys or secrets.
- Avoid persisting full model outputs that may contain user-provided PII unless purposefully anonymized.
- When steering / hooking, do not serialize raw activation tensors in public artifacts; aggregate or hash if sharing.
- Validate file paths from user input to prevent directory traversal in any future file‑loading utilities.
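Path validation against traversal can be sketched like this. `safe_resolve` is a hypothetical helper (not a sdialog utility), using `Path.is_relative_to`, available on the supported Python >= 3.9:

```python
from pathlib import Path

# Hypothetical helper: resolve a user-supplied path and require that it
# stays inside an allowed root directory.
def safe_resolve(root, user_path):
    base = Path(root).resolve()
    target = (base / user_path).resolve()
    if not target.is_relative_to(base):  # requires Python >= 3.9
        raise ValueError(f"path escapes {base}: {user_path!r}")
    return target
```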
- Minimize repeated tokenizer/model loads: reuse configured global model when possible.
- For large batch generation: lower `temperature`, set `max_tokens`, and disable `think` unless needed.
- Use `Paraphraser(turn_by_turn=True)` only for small dialogs; otherwise batch.
- For downstream embedding evaluations, reuse a single `SentenceTransformerDialogEmbedder` instance.
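The reuse pattern can be sketched with a cached loader. `load_embedder` is a hypothetical wrapper; in practice it would construct a `SentenceTransformerDialogEmbedder` once and hand back the same instance on repeated calls:

```python
from functools import lru_cache

# Hypothetical cached loader: construct the (expensive) embedder once per
# model name and reuse the same instance afterwards.
@lru_cache(maxsize=None)
def load_embedder(model_name):
    # Placeholder object standing in for the heavy embedder instance.
    return {"model": model_name}

first = load_embedder("all-MiniLM-L6-v2")
second = load_embedder("all-MiniLM-L6-v2")
assert first is second  # one shared instance, no repeated model loads
```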
| Symptom | Likely Cause | Resolution |
|---|---|---|
| Empty orchestrator output | Condition never true | Add logging / test condition with sample utterance |
| High memory usage | Too many inspector hooks | Narrow layer targets or detach inspector |
| Non‑deterministic tests | Missing seed or cached LLM randomness | Pass seed= and disable cache |
| Import error for backend | Optional dependency missing | Add to requirements.txt or guard import |
| Slow first call | Model warmup / remote latency | Document one warmup call in benchmark scripts |
```python
from sdialog.orchestrators import BaseOrchestrator

class MyKeywordOrchestrator(BaseOrchestrator):
    def __init__(self, keyword: str, instruction: str):
        super().__init__()
        self.keyword = keyword.lower()
        self.instruction = instruction

    def instruct(self, dialog, utterance: str):
        if utterance and self.keyword in utterance.lower():
            return self.instruction
        return None
```
- Add test verifying instruction appears when keyword present.
- Document usage in docstring.
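A deterministic trigger test for an orchestrator like the one above could look as follows. A tiny stub replaces `BaseOrchestrator` so this sketch runs standalone; a real test would import the orchestrator from the package:

```python
# Stub base class standing in for sdialog's BaseOrchestrator, so the
# sketch is self-contained; the real test would import the actual class.
class _StubBaseOrchestrator:
    def __init__(self):
        pass

class MyKeywordOrchestrator(_StubBaseOrchestrator):
    def __init__(self, keyword, instruction):
        super().__init__()
        self.keyword = keyword.lower()
        self.instruction = instruction

    def instruct(self, dialog, utterance):
        if utterance and self.keyword in utterance.lower():
            return self.instruction
        return None

def test_keyword_triggers_instruction():
    orch = MyKeywordOrchestrator("refund", "Offer the refund policy.")
    # Case-insensitive match triggers the instruction deterministically.
    assert orch.instruct(None, "I want a REFUND now") == "Offer the refund policy."
    # No keyword (or no utterance) yields no instruction.
    assert orch.instruct(None, "unrelated message") is None
    assert orch.instruct(None, None) is None
```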
Heuristics you may apply:
- For generation tasks, search for `dialog_with(` or `PersonaGenerator(` usage examples to scaffold new scripts.
- For evaluation additions, locate subclasses of `BaseDialogScore` to pattern-match.
- Use the `attributes()` static method on persona classes to programmatically list fields.
- This project provides an `llm.txt` file at https://sdialog.readthedocs.io/en/latest/llm.txt, following the llms.txt specification.
- Agents can fetch this file for structured, curated information about the project with: `#fetch https://sdialog.readthedocs.io/en/latest/llm.txt`
- The llm.txt contains the project summary, key documentation links, and API references optimized for LLM consumption.
- API docs auto-built from docstrings via Sphinx. Keep docstrings concise, with Example blocks.
- When adding config keys, update narrative docs (section: Configuration & Control) if present.
- If adding a new tutorial notebook: place it under `sdialog/tutorials/` and keep outputs cleared.
When removing an experimental module:
- Mark as deprecated in previous release (docstring + warnings).
- Remove only after one minor version or if broken beyond trivial repair.
- Provide migration hint in changelog.
- Keep `CHANGELOG.md` (if present) or GitHub Releases notes updated for: breaking changes, new components, deprecations.
- Use semantic-ish grouping: Added / Changed / Fixed / Deprecated / Removed / Security.
- Prefer minimal diffs: isolate logical change sets.
- Run tests after modifying orchestration, evaluators, or persona schema.
- If uncertain about a field meaning, inspect docstring (source of truth) rather than guessing.
- Do not auto‑reformat entire files unless style violations present.
End of AGENTS.md