
Colver

Lire en français

Fast Belote Contree game environment for reinforcement learning. Rust core with Python bindings.

Live demo: colver.net — running on a Raspberry Pi.

Features

  • ~1.4M rollouts/sec single-threaded (play phase), ~895K rollouts/sec on a full deal
  • 56-byte Copy game state for fast MCTS cloning
  • Six AI agents — DMC Q-network, IS-DD with belief network, DD oracle, Smart/Naive IS-MCTS, and heuristic
  • NN bidding — "Bid V5 IS-DD", a score-aware Dueling DQN (113→512³→43) trained 25M steps on real IS-DD points, used by all agents
  • ML interpretability — XGBoost distillation + hidden-layer probe reveal the NN's implicit scoring system, translated into human-usable rules (88-94% agreement)
  • Belief network — NN-based card location prediction for IS-DD search
  • Web interface — play against AI, spectate, analyze, and solve problems (FastAPI + WebSocket)
  • Python bindings via PyO3 — Env class with full type stubs, installable from PyPI
  • Zero dependencies in the core (only rand behind a feature flag)

Web Interface

Play against AI agents directly in your browser at colver.net, or run it locally:

uv run python -m colver.web
# Or: uv run colver-web
# Open http://localhost:8000

Humain vs IA — Play as South against AI opponents. Choose the agent for your opponents (East/West) and your partner (North) independently. The game follows official FFB Belote Contree rules: bidding with coinche/surcoinche, then 8 tricks. Cards are played instantly on click; the pause slider controls AI thinking delay.

Play tab

IA vs IA — Spectate AI vs AI matches with all hands visible. Assign a different agent to each of the 4 seats. Step through actions, play full tricks, or use auto-play. The stats panel shows Q-values, DD scores, or hand evaluations for each decision. Paste a CFN string to load a specific position.

Watch tab

Rejouer — Browse and replay past games (played or spectated). Click an entry to step through it with navigation controls.

Annonces — Compose an 8-card hand, choose your position in the bidding round, and see what Bid V5 IS-DD (the NN bidder) would bid — with Q-values for every legal action, plus an XGBoost-distilled "key factors" panel (which features drove the decision) and a DouDou50 simulation of how often the contract actually succeeds.

Annonces tab

Annoncer — Visual strategy guide derived from the bot via ML (per-card point weights, decision rules per position, mirror rule for defense). 88-94% agreement with the NN, memorizable in a few minutes.

Aide — Visual cheat sheet: card strength order and values (trump / non-trump), deal points, bidding rules.

Croyances — Visualize how the belief network and heuristic model predict card locations as a game progresses. Generate a random game, step through it, and see per-card probability bars with ground truth overlay and accuracy stats. Switch observer perspective (N/E/S/W) and compare NN vs heuristic predictions side by side.

Croyances tab

Problèmes d'annonce — Bidding practice problems. See a hand and the bidding history, then find the right bid. The AI evaluates your answer against the NN bidder's recommendation.

Problèmes de jeu — Card play practice problems. See a mid-game position and find the best card. Compare your choice to the DD solver's optimal play.

Build & Run

Requires Rust 1.70+ and Python 3.10+.

# Tests (357 tests)
cargo test -p colver-core

# Performance benchmark
cargo run -p colver-core --bin bench --release

# MCTS vs random demo
cargo run -p colver-core --bin mcts_demo --release -- 100

# Smart IS-MCTS vs random + vs naive demo
cargo run -p colver-core --bin smart_ismcts_demo --release -- 100

# Python bindings (via uv)
uv sync
uv run python3 -c "import colver; env = colver.Env(); print(env.reset())"

# Web interface (play against AI)
uv run python -m colver.web

# DMC training (Q-network)
PYTHONPATH=scripts/training uv run python scripts/training/train_dmc.py --num-envs 256 --steps 20000000

# DMC evaluation vs IS-MCTS
uv run python scripts/analysis/eval_dmc.py models/dmc_final.pt --baseline smart --time-ms 20 --both-sides

AI Agents

Oracle — DD Solver (solver.rs)

Perfect-information double-dummy solver that sees all 4 hands — it cheats. Alpha-beta with transposition tables, PVS, killer moves, and card equivalence pruning. Computes the exact optimal card in ~7ms (median). Useful as an upper bound.

Dede — IS-DD (is_dd.rs)

Information Set Double-Dummy search. Maintains a probabilistic belief model over hidden cards — updated after every action via hard constraints (voids, trump ceiling) and soft inference (bidding signals, play patterns). Optionally augmented with a belief network (NN-based card location prediction, 330→512→512→128, ~2MB). Samples plausible opponent hands weighted by these beliefs, then solves each world exactly with the alpha-beta DD solver. IS-DD sounds like "is Dede" — hence the name.
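The sample-then-solve loop can be sketched in a few lines of Python (a toy illustration, not the Rust implementation; `belief.sample_world` and `solve_dd` are hypothetical stand-ins for the belief model and the alpha-beta solver):

```python
import random
from collections import Counter

def is_dd_pick(legal_cards, belief, solve_dd, n_worlds=20, seed=0):
    """Toy IS-DD: sample plausible hidden-card worlds from the belief model,
    solve each world exactly, and play the card that wins the vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_worlds):
        world = belief.sample_world(rng)  # one full deal consistent with beliefs
        best = max(legal_cards, key=lambda c: solve_dd(world, c))
        votes[best] += 1
    return votes.most_common(1)[0][0]
```

More sampled worlds trade latency for robustness to belief error, which is why the belief network matters: better-weighted worlds need fewer samples.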

DouDou50 — DMC Q-Network (dmc_net.rs)

DouZero-style reinforcement learning agent. A Q-network picks card plays with a single forward pass — no search tree. Default play model, trained 50M steps with the NN bidder frozen (triforge play-only phase).

Architecture: ResNet Dueling DQN 411→1024→1024→1024→32 with LayerNorm and skip connections (~2.6M parameters). Uses canonical suit encoding (no augmentation needed). Inference in pure Rust (~1ms/decision, no PyTorch needed). Strongest overall agent.

The previous model, DouDou35 (415→1024³→32, legacy obs, 35M steps), remains supported for backward compatibility. The DouDou name is a nod to DouZero.

Older search agents

Smart IS-MCTS (smart_ismcts.rs) — Belief-weighted Information Set MCTS with heuristic card beliefs. Naive IS-MCTS (naive_ismcts.rs) — Ensemble determinization without beliefs. Both are configurable and documented in docs/SMART_ISMCTS.md.

Bid V5 IS-DD — NN Bidder (bid_net.rs)

Dueling DQN 113→512→512→512→43 with score-aware v2 observation (5 precomputed match-score features: my/opp normalized, win probability, leader distance, signed diff). Trained 25M steps on real IS-DD points (not DD oracle) with reward clipping, Polyak EMA (τ=0.005), and cosine LR decay. Best arena performance across DMC and IS-DD play (+11% DMC, +14.6% IS-DD winrate vs previous champion). BidNet::load auto-detects hidden size and obs_dim (108 / 110 / 113).
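The dueling head can be illustrated with a NumPy forward pass at the quoted sizes (a sketch with random weights only; the trained parameters and exact layer layout live in bid_net.rs):

```python
import numpy as np

def dueling_q(obs, params):
    """Dueling-DQN forward pass: 113 -> 512 -> 512 -> 512 trunk, then a
    scalar value head and a 43-way advantage head combined the standard way."""
    h = obs
    for W, b in params["trunk"]:
        h = np.maximum(W @ h + b, 0.0)      # ReLU MLP trunk
    v = params["Wv"] @ h + params["bv"]     # state value V(s), shape (1,)
    a = params["Wa"] @ h + params["ba"]     # advantages A(s, a), shape (43,)
    return v + a - a.mean()                 # Q(s, a) = V + A - mean(A)

def random_params(rng, sizes=(113, 512, 512, 512), n_actions=43):
    """Randomly initialized parameters at the sizes quoted above."""
    trunk = [(rng.standard_normal((o, i)) * 0.05, np.zeros(o))
             for o, i in zip(sizes[1:], sizes[:-1])]
    h = sizes[-1]
    return {"trunk": trunk,
            "Wv": rng.standard_normal((1, h)) * 0.05, "bv": np.zeros(1),
            "Wa": rng.standard_normal((n_actions, h)) * 0.05, "ba": np.zeros(n_actions)}
```

Subtracting the advantage mean is what makes the value/advantage split identifiable; it is the usual dueling-DQN construction, not something specific to Colver.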

Past versions still supported via auto-detect:

  • Bid V3 Max — 108-dim, trained on max(DMC, IS-DD) real points (20M steps)
  • Bid à Dédé (v2) — 108-dim, DD oracle reward
  • Bid à Doudou (v1) — 114→256² dueling, DouZero self-play

Interpretability: XGBoost distillation and a hidden-layer linear probe revealed that the NN's implicit scoring differs sharply from the classical hand evaluation (e.g., J atout = +11 effective, 9 = +4, A atout = +1, side A = 0 net; plus an anti-synergy J×9 = −2). Translated into a mnemonic 5-feature decision tree reaching 88-94% NN agreement — see docs/bid/strategies/bid_v5_human_guide.md and docs/bid/interpretability/probe_morning_report.md.
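As an illustration, the example weights quoted above fit in a few lines (a helper of my own covering only the cited weights; the full rule set is in the linked guide):

```python
def effective_trump_points(hand, trump):
    """Score a hand with the NN-derived effective weights quoted above:
    trump J = +11, trump 9 = +4, trump A = +1, side aces = 0 net, and a
    -2 anti-synergy when holding both trump J and trump 9. Cards are
    (rank, suit) pairs; weights beyond these cited examples are omitted."""
    weights = {"J": 11, "9": 4, "A": 1}
    trumps = {rank for rank, suit in hand if suit == trump}
    score = sum(weights.get(r, 0) for r in trumps)
    if {"J", "9"} <= trumps:
        score -= 2  # J x 9 anti-synergy
    return score
```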

Agent Comparison

| Agent | Type | Speed/move | Notes |
|---|---|---|---|
| Oracle (DD) | DD solver (cheats) | ~7 ms | Perfect-info upper bound |
| Dede (IS-DD) | DD solver + beliefs | ~20 ms | Strongest search-based |
| DouDou50 | Q-network (ResNet) | <1 ms | Strongest overall, no search |
| Smart IS-MCTS | Search + beliefs | ~9 ms | Configurable budget |
| Naive IS-MCTS | Search | ~8 ms | Configurable budget |

Note: Search-based agents get stronger with more time budget. The DMC agent uses no search — one forward pass per decision.

Architecture

Workspace: colver-core (pure Rust) + colver-py (PyO3/NumPy FFI) + colver-web (FastAPI/WebSocket)

Card Representation

Bitmask system: Card = u8 (0-31), CardSet = u32 (bitmask). Layout: Spades[0-7], Hearts[8-15], Diamonds[16-23], Clubs[24-31]. Within each suit: 7, 8, 9, J, Q, K, 10, A (plain strength order). Trump strength: J > 9 > A > 10 > K > Q > 8 > 7.
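A Python mirror of this layout (helper names are mine; the Rust types are Card and CardSet):

```python
# Card = 0-31: suit block of 8 (Spades 0-7, Hearts 8-15, Diamonds 16-23,
# Clubs 24-31), plain-strength rank order within each suit.
RANKS = ["7", "8", "9", "J", "Q", "K", "10", "A"]
SUITS = ["S", "H", "D", "C"]

def card(rank, suit):
    """Pack a (rank, suit) pair into the 0-31 card index."""
    return SUITS.index(suit) * 8 + RANKS.index(rank)

def card_name(c):
    """Unpack a card index back to 'rank + suit letter'."""
    return RANKS[c % 8] + SUITS[c // 8]

def cardset(cards):
    """CardSet = u32 bitmask: bit i is set iff card i is present."""
    mask = 0
    for c in cards:
        mask |= 1 << c
    return mask
```

With this layout, set union, intersection, and membership are single bitwise ops, which is what makes rollouts and DD search cheap.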

Game State

GameState is Copy and ≤96 bytes (compile-time enforced) for fast MCTS cloning. Contains hands, current trick, contract, points/tricks per team, bidding state, played cards bitmask, void tracking, and belote tracking.

Action Encoding

| Phase | Actions | Encoding |
|---|---|---|
| Bidding | 43 total | 0=PASS, 1-36=bids (9 values × 4 suits), 37-40=capot × 4, 41=COINCHE, 42=SURCOINCHE |
| Playing | 32 total | Card index 0-31 directly |
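A decoder for the bidding action space might look like this (the bid values 80-160 in steps of 10 and the value-major ordering of ids 1-36 are my assumptions; only the block boundaries come from the table above):

```python
SUIT_NAMES = ["Spades", "Hearts", "Diamonds", "Clubs"]

def decode_bid_action(a):
    """Decode a bidding-phase action id (0-42) into a readable bid.
    Assumes bids 1-36 are laid out value-major over 9 values x 4 suits,
    with values 80-160 in steps of 10; the real mapping is in colver-core."""
    if a == 0:
        return "PASS"
    if 1 <= a <= 36:
        value = 80 + ((a - 1) // 4) * 10   # assumed value-major layout
        return f"{value} {SUIT_NAMES[(a - 1) % 4]}"
    if 37 <= a <= 40:
        return f"CAPOT {SUIT_NAMES[a - 37]}"
    return {41: "COINCHE", 42: "SURCOINCHE"}[a]
```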

Game Flow

Bidding → Playing → Done. Bidding ends after 3 consecutive passes, a surcoinche, or 4 passes (void deal). Playing runs 8 tricks of 4 cards. Card point total = 152; with dix de der = 162 (normal) or 252 (capot).
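The 152-point total checks out against the standard FFB card values (assumed here; the engine's exact tables live in colver-core):

```python
# Standard Belote card points, trump vs non-trump. One trump suit plus
# three plain suits sums to the 152 card points quoted above;
# dix de der adds 10 for 162 on a normal deal.
TRUMP_POINTS = {"J": 20, "9": 14, "A": 11, "10": 10, "K": 4, "Q": 3, "8": 0, "7": 0}
PLAIN_POINTS = {"A": 11, "10": 10, "K": 4, "Q": 3, "J": 2, "9": 0, "8": 0, "7": 0}

total = sum(TRUMP_POINTS.values()) + 3 * sum(PLAIN_POINTS.values())
assert total == 152
```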

Python API

import colver

print(colver.__version__)  # "0.3.3"

# Single environment
env = colver.Env()
obs, legal_actions = env.reset()
obs, reward, done, legal_actions = env.step(action)

env.current_player()       # 0-3
env.phase()                # 0=Bidding, 1=Playing, 2=Done
env.legal_action_mask()    # numpy array (43,)
env.rewards()              # [NS_score, EW_score]
env.bid_improved()         # improved_bid action
env.deal_outcome()         # [NS_outcome, EW_outcome] binary
env.get_observation()      # 415-float observation vector
env.action_naive_ismcts(20)  # naive IS-MCTS action (20ms)
env.action_smart_ismcts(20)  # smart IS-MCTS action (20ms)

# DMC Q-network (if model weights downloaded)
model = colver.model_path()  # ~/.cache/colver/models/dmc_final.bin
if model:
    env.load_dmc_model(str(model))
    result = env.action_dmc_with_stats()  # {"best_action": 5, "q_values": [...]}

Performance

| Workload | Throughput | Latency |
|---|---|---|
| Play-phase rollout | 1.4M/sec | ~720 ns |
| Full-deal rollout | 895K/sec | ~1118 ns |
| MCTS game (1000 iter) vs random | | 8 ms |
| Smart IS-MCTS game (20×50) vs random | | 9 ms |
| DMC Q-network inference | | <1 ms |

Docker

The Docker image lets you deploy the web interface on any machine, including a Raspberry Pi (ARM64).

# Build and run
docker build -t colver .
docker run -p 8000:8000 colver

# Or with Docker Compose
docker compose up -d

# Cross-build for Raspberry Pi (ARM64)
docker buildx build --platform linux/arm64 -t colver .

The image is ~257 MB (no PyTorch dependency). All agents run in pure Rust and work on all architectures.

Rules

Implements Belote Contree with 4 suits (Spades, Hearts, Diamonds, Clubs). Scoring mode: "points faits + points demandés" (points made + points bid). See REGLES-DE-LA-BELOTE-CONTREE.pdf for the full FFB rulebook.

Acknowledgments

Thanks to Ronan Guillou, seasoned coinche player, for his advice on the game and for being the first tester — his good sense guided many UI decisions.