EmailTriage OpenEnv — Hackathon Submission

A production-grade OpenEnv environment that simulates real-world email triage — the daily task of processing, prioritizing, and responding to a mixed work inbox. Built for the OpenEnv Hackathon with 3 difficulty-graded tasks, continuous partial rewards, and dynamic mid-episode events.

Why Email Triage?

Email triage is a task professionals perform daily: scanning an inbox, deciding what to archive, what needs a reply, coordinating calendar availability, and handling urgent escalations. This makes it an ideal testbed for evaluating agent decision-making, prioritization, and multi-step planning under changing conditions.

Tasks

The environment defines 3 benchmark tasks with increasing difficulty:

| Task ID | Name | Emails | Max Steps | Dynamic Events | Description |
|---|---|---|---|---|---|
| easy | Quick Sort | 3 | 6 | No | Archive 3 spam/newsletter emails. Tests basic categorization. |
| medium | Priority Triage | 5 | 10 | No | Triage 5 mixed-priority emails with calendar scheduling. Tests reading, drafting, and archiving decisions. |
| hard | Dynamic Crisis | 7–10 | 12 | Yes | Handle a full inbox with mid-episode urgent emails and calendar changes. Tests adaptation and escalation handling. |

In addition, the root submission manifest defines 3 deterministic validator tasks used for task/grader compliance checks:

| Task ID | Module | Grader |
|---|---|---|
| email_classification | tasks.email_classification:solve | graders.email_classification_grader:grade |
| priority_detection | tasks.priority_detection:solve | graders.priority_detection_grader:grade |
| response_generation | tasks.response_generation:solve | graders.response_generation_grader:grade |

All grader scores are normalized to the range 0.0-1.0.
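The `module:function` strings in the table follow a common entrypoint convention and can be resolved generically. A minimal loader sketch — `resolve_entrypoint` is an illustrative helper, not part of the submission; only the `module:function` shape comes from the manifest:

```python
import importlib

def resolve_entrypoint(spec: str):
    """Split a 'module:function' spec and import the named callable."""
    module_name, func_name = spec.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)

# Demonstrated with a standard-library spec of the same shape:
json_dumps = resolve_entrypoint("json:dumps")
```

The same call would load `tasks.email_classification:solve` or any grader entry, provided those packages are on the import path.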

Action Space

The agent sends an EmailtriageAction with these fields:

| Field | Type | Description |
|---|---|---|
| action_type | "read" \| "archive" \| "query_calendar" \| "draft_email" | The tool/action to execute |
| target_email_id | int | Email ID to act on (-1 for query_calendar) |
| draft_content | str | Reply text for draft_email actions |
| proposed_slot | str | Calendar slot for scheduling drafts |
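For orientation, the fields above can be mirrored in a plain dataclass. This `EmailTriageActionSketch` is purely illustrative — the real `EmailtriageAction` is a Pydantic model defined in `models.py`:

```python
from dataclasses import dataclass

@dataclass
class EmailTriageActionSketch:
    action_type: str           # "read" | "archive" | "query_calendar" | "draft_email"
    target_email_id: int = -1  # -1 when no email is targeted (query_calendar)
    draft_content: str = ""    # only meaningful for draft_email actions
    proposed_slot: str = ""    # only meaningful for scheduling drafts

# Archive the email with ID 3:
action = EmailTriageActionSketch(action_type="archive", target_email_id=3)
```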

Observation Space

After each step the agent receives an EmailtriageObservation:

| Field | Type | Description |
|---|---|---|
| inbox_preview | List[Dict] | Metadata for up to 5 unread emails (id, sender, subject, priority, status) |
| returned_emails | List[str] | Full email text from read actions |
| calendar_slots | List[str] | Available calendar slots |
| last_action_result | str | Grader feedback for the most recent action |
| inbox_remaining | int | Count of unread emails |
| conversation_history | List[str] | Recent action/feedback trace |
| reward | float | Step reward in [0, 1] |
| done | bool | Whether the episode has ended |
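To show how these fields drive a decision, here is a toy policy over a dict-shaped observation. The `choose_action` helper and the specific priority/status values ("low", "unread") are illustrative assumptions; only the field names come from the table above:

```python
def choose_action(obs: dict) -> dict:
    """Toy policy: archive low-priority mail first, then read, else check calendar."""
    for email in obs["inbox_preview"]:
        if email["priority"] == "low" and email["status"] == "unread":
            return {"action_type": "archive", "target_email_id": email["id"]}
    if obs["inbox_remaining"] > 0 and obs["inbox_preview"]:
        return {"action_type": "read",
                "target_email_id": obs["inbox_preview"][0]["id"]}
    return {"action_type": "query_calendar", "target_email_id": -1}

obs = {"inbox_preview": [{"id": 1, "sender": "news@example.com",
                          "subject": "Weekly Digest",
                          "priority": "low", "status": "unread"}],
       "inbox_remaining": 1}
```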

Reward Function

Rewards are continuous and partially informative (not binary pass/fail):

  • Archive spam/newsletters: 0.62–0.80 per correct archive
  • Read emails: 0.09–0.25 depending on priority (higher for critical emails)
  • Query calendar: 0.10–0.46 based on pending scheduling workload
  • Draft replies: Multi-factor scoring based on:
    • Task appropriateness (is this email worth drafting?)
    • Draft quality (length, professionalism, keyword relevance)
    • Calendar awareness (did you check availability first?)
    • Valid proposed slot
    • Urgency handling for escalations
  • Progress bonus: +0.12 for each email successfully processed
  • Completion bonus: +0.10 when all inbox items are triaged
  • Penalties: Archiving important emails scores 0.03–0.08 (not zero)
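The progress and completion bonuses can be sketched as a clamped sum. `shaped_reward` is an illustrative approximation, not the environment's actual formula (which lives in `EmailTriage_environment.py`); only the 0.12/0.10 bonuses and the [0, 1] clamp come from the list above:

```python
def shaped_reward(base: float, emails_processed: int, inbox_cleared: bool) -> float:
    """Illustrative shaping: base action score + progress and completion bonuses."""
    reward = base
    reward += 0.12 * emails_processed   # +0.12 per successfully processed email
    if inbox_cleared:
        reward += 0.10                  # completion bonus once the inbox is triaged
    return max(0.0, min(1.0, reward))   # step rewards stay in [0, 1]
```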

Setup Instructions

Prerequisites

  • Python 3.10+
  • Docker (for containerized deployment)
  • openenv-core and uv installed

Install Dependencies

```bash
# Root-level (for inference script)
pip install -r requirements.txt

# Environment (using uv)
cd EmailTriage
uv sync
```

Run Locally

```bash
# Start the environment server
cd EmailTriage
uvicorn server.app:app --host 0.0.0.0 --port 8000 --reload
```

Run Inference

```bash
# Set required environment variables
export API_BASE_URL="https://router.huggingface.co/v1"
export MODEL_NAME="Qwen/Qwen2.5-72B-Instruct"
export HF_TOKEN="your-hf-token"
export LOCAL_IMAGE_NAME="emailtriage-env:latest"

# Run all 3 tasks
python inference.py
```

Docker Build

```bash
docker build -t emailtriage-env:latest .
```

Validate

```bash
# Environment validation (inner package)
cd EmailTriage
openenv validate

# Return to root and run deterministic task/grader sanity check
cd ..
python -c "from tasks import list_tasks; print([t['id'] for t in list_tasks()])"
```

Deploy to Hugging Face Spaces

```bash
cd EmailTriage
openenv push --repo-id OMCHOKSI108/Emailopenenvrl
```

Project Structure

```text
Galcogens-OpenEnv/
├── inference.py              # Hackathon inference script (runs 3 tasks)
├── openenv.yaml              # Root submission manifest (entrypoint/endpoints/tasks/graders)
├── Dockerfile                # Root container definition
├── requirements.txt          # Inference-only dependencies
├── tasks/                    # Deterministic validator task definitions
├── graders/                  # Deterministic validator graders (score in [0.0, 1.0])
├── README.md                 # This file
└── EmailTriage/
    ├── __init__.py            # Package exports
    ├── client.py              # EnvClient implementation
    ├── models.py              # Pydantic Action/Observation/State models
    ├── openenv.yaml           # Inner OpenEnv manifest
    ├── pyproject.toml         # Package configuration
    ├── README.md              # HF Space README
    └── server/
        ├── app.py             # FastAPI server
        ├── EmailTriage_environment.py  # Core environment + 3 task graders
        └── Dockerfile         # Server container definition
```

Hackathon Checklist

  • Real-world task simulation (email triage)
  • Full OpenEnv spec: typed models, step()/reset()/state(), openenv.yaml
  • 3 benchmark tasks (easy → medium → hard) with continuous grading
  • 3 deterministic submission validator tasks with matching graders (scores 0.0-1.0)
  • Meaningful reward function with partial progress signals
  • Baseline inference script with reproducible scores
  • Dockerfile builds
  • README with environment description, action/observation spaces, setup instructions

Baseline Scores

Use this command to reproduce baseline scores:

```bash
python inference.py
```

Environment used for reproducible runs:

  • API_BASE_URL=https://router.huggingface.co/v1
  • MODEL_NAME=Qwen/Qwen2.5-72B-Instruct
  • HF_TOKEN=<your-token>
  • ENV_BASE_URL=http://localhost:8000 (or your deployed Space URL)

Recorded scores from a successful local containerized run (emailtriage-env:local-check):

| Task | Score | Notes |
|---|---|---|
| easy | 0.72 | 6 steps |
| medium | 0.60 | 10 steps |
| hard | 0.62 | 12 steps |

Aggregate baseline (mean across tasks): 0.65

The inference logger prints scores in [0.00, 1.00] and emits strict [START], [STEP], and [END] stdout lines for evaluator parsing.
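Since the evaluator keys off strict `[START]`/`[STEP]`/`[END]` line prefixes, filtering them out of mixed stdout is straightforward. `extract_eval_lines` is a hypothetical helper; only the tag prefixes come from the text above:

```python
def extract_eval_lines(stdout: str) -> list:
    """Keep only the lines that begin with the strict evaluator tags."""
    tags = ("[START]", "[STEP]", "[END]")
    return [line for line in stdout.splitlines() if line.startswith(tags)]
```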

Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| API_BASE_URL | No | https://router.huggingface.co/v1 | LLM API endpoint |
| MODEL_NAME | No | Qwen/Qwen2.5-72B-Instruct | Model identifier |
| HF_TOKEN | Yes | — | Hugging Face API key |
| LOCAL_IMAGE_NAME | No | emailtriage-env:latest | Docker image name |
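Reading these variables with the documented defaults might look like the following sketch (`load_config` is a hypothetical helper; the names, defaults, and the required HF_TOKEN come from the table above):

```python
import os

def load_config() -> dict:
    """Read the documented variables, applying the table's defaults."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is required")
    return {
        "api_base_url": os.environ.get("API_BASE_URL", "https://router.huggingface.co/v1"),
        "model_name": os.environ.get("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct"),
        "hf_token": token,
        "local_image_name": os.environ.get("LOCAL_IMAGE_NAME", "emailtriage-env:latest"),
    }
```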

About

A reinforcement learning agent built with OpenEnv and Stable-Baselines3 that learns to intelligently manage email workflows. The agent handles tasks ranging from spam filtering to drafting meeting invitations and resolving ambiguous client requests.
