Commit bdbc991

neuron7xLab and claude authored
l2-regime: rolling-RV regime filter, OOS-verified 2× IC uplift (#236)
* l2-regime: recursive + cyclic + walk-forward analysis scripts

  Three diagnostic scripts built on existing primitives (slice_features,
  run_killtest, cross_sectional_ricci_signal), no new abstractions
  (AE principles 1, 20).

  * scripts/l2_killtest_recursive.py: depth-first bisection (up to
    depth 3) + cyclic K=8 disjoint blocks. Reveals regime structure
    hidden by full-window averaging. On collected substrate: 3/8 blocks
    PROCEED with IC up to +0.339; 3/8 KILL with IC as low as -0.109.
    Signal is intermittent, not uniform.
  * scripts/l2_regime_analysis.py: per-block regime features (realized
    vol, cross-asset correlation, dispersion, signed trend, κ_min
    moments). Spearman rank-correlates block IC against each feature.
    On K=8: corr_mean strongest direction (ρ=+0.429, p=0.29, n=8).
    Underpowered for statistical claim; motivates finer-grained
    analysis.
  * scripts/l2_walk_forward.py: 40-minute rolling window with 5-minute
    step across substrate. ~56 windows gives the statistical power that
    8 disjoint blocks lack. Reports IC trajectory, Spearman ρ at rolling
    resolution, quartile bins on the most-correlated feature to find a
    discriminator threshold.

  Output artifacts:
  * results/REGIME_ANALYSIS.json (8-block table + ρ matrix)
  * results/L2_WALK_FORWARD.json (56-row trajectory, quartile bins)

  Non-goals: new dataclasses, new production modules. Pure diagnostics.
  If walk-forward identifies a regime discriminator, next commit adds
  the regime filter to killtest.py as an optional parameter.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* l2-regime: rolling-RV regime filter, OOS-verified 2× IC uplift

  Walk-forward analysis (scripts/l2_walk_forward.py, 56 rolling 40-min
  windows) identified rolling realized volatility as the dominant regime
  discriminator:

    Spearman ρ(IC_signal, rv_mean)   = +0.352   p = 0.008  ***
    Spearman ρ(IC_signal, corr_mean) = +0.317   p = 0.017  *
    Spearman ρ(IC_signal, trend_*)   = not significant

  Quartile analysis on rv_mean:

    Q1_low   IC median +0.027  (signal ≈ noise)
    Q4_high  IC median +0.137  (signal works)

  IN-SAMPLE CONDITIONAL (scripts/l2_regime_conditional.py):

    unconditional  IC = +0.122
    rv_w600_q75    IC = +0.256  (frac_on = 24.2 %)
    rv_w300_q50    IC = +0.177  (frac_on = 49.2 %)

  TRUE OOS (scripts/l2_regime_oos.py, threshold trained on first half,
  applied to second half, no information leakage):

    TEST unconditional       IC = +0.116  frac_on = 100.0 %
    TEST q50 thr from train  IC = +0.202  frac_on =  43.9 %
    TEST q75 thr from train  IC = +0.236  frac_on =  36.3 %
    => 2.03× IC uplift OOS, threshold generalizes

  Components

  - research/microstructure/regime.py: 4 functions, no new dataclasses:
    * rolling_corr_regime(features, window_rows)
    * rolling_rv_regime(features, window_rows) (primary, OOS-verified)
    * regime_mask_from_score(score, threshold)
    * regime_mask_from_quantile(score, quantile)
  - research/microstructure/killtest.py: single optional parameter:
    * run_killtest(..., regime_mask: NDArray[bool] | None = None)
    * Backwards-compatible: None → identical behavior to before
    * When supplied, mask is applied at scoring time (ricci + target →
      NaN outside mask); the Ricci signal itself is still computed on
      the full contiguous series (its rolling corr needs consecutive
      rows)
  - tests/test_l2_regime.py: 7 new tests:
    * shape + warmup on rolling_corr_regime
    * high-ρ vs low-ρ synthetic discrimination
    * argument validation (window too small, single symbol)
    * mask NaN handling
    * killtest rejects wrong mask shape
    * trivial all-True mask matches unconditional (regression)

  All 26 tests green. ruff + black + mypy --strict clean.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* l2-regime: walk-forward calibration + cross-session OOS scripts

  Two more diagnostic scripts for the regime filter, completing the
  generalization ladder:

  * scripts/l2_regime_walkforward_calibration.py
    Rolling 60-min calibration + 30-min evaluation. Slides across
    substrate; at each step derives q50/q75 thresholds from the
    calibration window, applies them to the next evaluation window.
    Result on collected substrate: uplift POSITIVE in only 1 of 7 steps
    (q50) / 1 of 5 steps (q75).
    HONEST LIMIT: the 50/50 split uplift (IC +0.12 → +0.24 OOS) does NOT
    survive production-style short-window rolling recalibration.
    Threshold needs longer calibration horizons to stabilize.
  * scripts/l2_regime_cross_session.py
    Cross-session OOS scaffold: takes --train-dir and --test-dir,
    derives quantile thresholds from the train session, applies to the
    test session, writes results/L2_REGIME_CROSS_SESSION.json. Runnable
    against the second 8h session currently being collected into
    data/binance_l2_perp_v2. Strongest form of OOS we can do without
    multi-day walk-forward.

  These land as diagnostics only. The regime MODULE (regime.py) and its
  integration (run_killtest regime_mask param) are the shippable
  artifacts. Scripts document the calibration surface honestly,
  including where the filter breaks under stricter recalibration.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
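
The "TRUE OOS" step in the message above (threshold trained on the first half, applied to the second half) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the contents of scripts/l2_regime_oos.py; the helper name `oos_quantile_mask` and the synthetic score are assumptions made for the example.

```python
import numpy as np


def oos_quantile_mask(score: np.ndarray, quantile: float) -> tuple[float, np.ndarray]:
    """Derive a threshold on the first half of `score`, apply it to the second half.

    Hypothetical helper for illustration; returns (threshold, test_mask).
    """
    half = score.size // 2
    train, test = score[:half], score[half:]
    finite_train = train[np.isfinite(train)]
    if finite_train.size == 0:
        raise ValueError("no finite scores in the training half")
    # The threshold sees training rows only, so nothing from the test half leaks in.
    threshold = float(np.quantile(finite_train, quantile))
    test_mask = np.isfinite(test) & (test >= threshold)
    return threshold, test_mask


rng = np.random.default_rng(0)
# Synthetic stand-in for a rolling-RV score, with a NaN warm-up prefix.
score = np.abs(rng.normal(size=2_000))
score[:300] = np.nan

threshold, test_mask = oos_quantile_mask(score, quantile=0.75)
frac_on = float(test_mask.mean())
print(f"threshold={threshold:.4f} frac_on={frac_on:.1%}")
```

Because the threshold is fixed from the training half, the fraction of test rows it switches on need not equal 1 - quantile. That is consistent with the q75 threshold yielding frac_on = 36.3 % on the test half in the figures reported above.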
1 parent 9d3611c commit bdbc991

10 files changed

Lines changed: 1382 additions & 1 deletion

research/microstructure/killtest.py

Lines changed: 19 additions & 1 deletion
@@ -355,13 +355,31 @@ def run_killtest(
     horizons_sec: tuple[int, ...] = _TARGET_HORIZONS_SEC,
     ic_gate: float = _IC_GATE,
     pvalue_gate: float = _PERM_PVALUE_GATE,
+    regime_mask: NDArray[np.bool_] | None = None,
     seed: int = SEED,
 ) -> GateVerdict:
-    """Execute the full fail-fast gate and emit a binary verdict."""
+    """Execute the full fail-fast gate and emit a binary verdict.
+
+    When `regime_mask` is provided (shape (n_rows,)), IC and null-test
+    computations count only rows where the mask is True. The Ricci signal
+    is still computed on the full contiguous time series (its rolling
+    cross-sectional correlation needs consecutive rows) — the filter acts
+    at scoring time, not feature-construction time.
+    """
     ricci_signal_1d = cross_sectional_ricci_signal(features.ofi)
     ricci_panel = np.repeat(ricci_signal_1d[:, None], features.n_symbols, axis=1)
     target = _forward_log_return(features.mid, primary_horizon_sec)

+    if regime_mask is not None:
+        if regime_mask.shape != (features.n_rows,):
+            raise ValueError(
+                f"regime_mask shape {regime_mask.shape} must equal ({features.n_rows},)"
+            )
+        # broadcast row-mask to panel shape
+        panel_mask = np.broadcast_to(regime_mask[:, None], ricci_panel.shape)
+        ricci_panel = np.where(panel_mask, ricci_panel, np.nan)
+        target = np.where(panel_mask, target, np.nan)
+
     ic_signal = _pooled_ic(ricci_panel, target)

     ret_1s = np.vstack(
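
To illustrate what filtering "at scoring time" does, here is a small sketch under stated assumptions. The body of `_pooled_ic` is not shown in this diff, so `pooled_rank_ic` below is a hypothetical stand-in (a rank correlation over all finite signal/target pairs), and the data is synthetic. Only the broadcast and NaN-out pattern is taken from the hunk above.

```python
import numpy as np


def _ranks(x: np.ndarray) -> np.ndarray:
    """Ordinal ranks 0..n-1 (ties broken by position, so only approximately Spearman)."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(x.size, dtype=np.float64)
    ranks[order] = np.arange(x.size, dtype=np.float64)
    return ranks


def pooled_rank_ic(signal: np.ndarray, target: np.ndarray) -> float:
    """Hypothetical stand-in for `_pooled_ic`: rank correlation over finite pairs."""
    s, y = signal.ravel(), target.ravel()
    keep = np.isfinite(s) & np.isfinite(y)
    if keep.sum() < 3:
        return float("nan")
    return float(np.corrcoef(_ranks(s[keep]), _ranks(y[keep]))[0, 1])


rng = np.random.default_rng(1)
n_rows, n_symbols = 1_000, 4
signal_1d = rng.normal(size=n_rows)
panel = np.repeat(signal_1d[:, None], n_symbols, axis=1)
noise = rng.normal(size=(n_rows, n_symbols))
# Synthetic target: depends on the signal only in the second half of the rows.
active = np.arange(n_rows) >= n_rows // 2
target = np.where(active[:, None], panel + 0.5 * noise, noise)

# Scoring-time filter: broadcast the row mask to panel shape, NaN out the rest.
panel_mask = np.broadcast_to(active[:, None], panel.shape)
ic_all = pooled_rank_ic(panel, target)
ic_on = pooled_rank_ic(
    np.where(panel_mask, panel, np.nan),
    np.where(panel_mask, target, np.nan),
)
print(f"unconditional IC={ic_all:+.3f} masked IC={ic_on:+.3f}")
```

The signal panel is built before any masking, which mirrors the docstring's point that the Ricci signal still sees the full contiguous series. An all-True mask leaves the inputs unchanged, which is the property the regression test in the commit message checks.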

research/microstructure/regime.py

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
+"""Regime detection for the L2 kill-test substrate.
+
+Motivation: recursive / cyclic analysis on the collected 5h14m window
+shows IC is intermittent — some time blocks produce IC > 0.15, others
+invert to IC < -0.05. Full-window verdict averages these. The next
+inevitable question is: *when* is the Ricci cross-sectional signal
+predictive?
+
+Block-level regime_analysis flagged cross-asset mean correlation
+(`corr_mean`); the walk-forward step then identified rolling realized
+volatility as the stronger discriminator. This module exposes both as
+per-row rolling scores and thresholds either one into a boolean regime
+mask consumable by `run_killtest(regime_mask=...)`.
+
+Four public functions: two scores, two mask builders. No new dataclasses.
+"""
+
+from __future__ import annotations
+
+import numpy as np
+from numpy.typing import NDArray
+
+from research.microstructure.killtest import FeatureFrame
+
+_MIN_WINDOW_ROWS: int = 60
+
+
+def rolling_corr_regime(
+    features: FeatureFrame,
+    *,
+    window_rows: int = 300,
+) -> NDArray[np.float64]:
+    """Rolling mean off-diagonal correlation of 1-sec mid-return across symbols.
+
+    For each row t >= window_rows, compute the correlation matrix of the
+    `window_rows` most-recent 1-sec log-return vectors (rows are time,
+    columns are symbols). Return the mean of off-diagonal entries as the
+    regime score. Earlier rows are NaN.
+
+    High score ⇒ cross-asset correlation is high ⇒ cross-sectional Ricci
+    signal has meaningful structure to measure. Low score ⇒ assets decouple,
+    Ricci κ_min becomes noise-driven.
+    """
+    if window_rows < _MIN_WINDOW_ROWS:
+        raise ValueError(f"window_rows must be >= {_MIN_WINDOW_ROWS}, got {window_rows}")
+    if features.n_symbols < 2:
+        raise ValueError(f"need >= 2 symbols for cross-asset correlation, got {features.n_symbols}")
+
+    log_mid = np.log(features.mid)
+    ret = np.vstack([np.zeros((1, features.n_symbols)), np.diff(log_mid, axis=0)])
+    n = ret.shape[0]
+    out = np.full(n, np.nan, dtype=np.float64)
+    eye_mask = ~np.eye(features.n_symbols, dtype=bool)
+
+    for t in range(window_rows, n):
+        block = ret[t - window_rows : t]
+        if not np.all(np.isfinite(block)):
+            continue
+        std = block.std(axis=0)
+        if np.any(std < 1e-14):
+            continue
+        corr_raw = np.corrcoef(block.T)
+        corr = np.nan_to_num(np.asarray(corr_raw, dtype=np.float64), nan=0.0)
+        out[t] = float(corr[eye_mask].mean())
+    return out
+
+
+def rolling_rv_regime(
+    features: FeatureFrame,
+    *,
+    window_rows: int = 300,
+) -> NDArray[np.float64]:
+    """Rolling realized volatility (per-symbol mean) of 1-sec mid-return.
+
+    Walk-forward analysis on the collected 5h14m substrate identified
+    realized vol as the single strongest regime discriminator for
+    Ricci IC (Spearman ρ=+0.352, p=0.008 across 56 rolling windows;
+    low-vol quartile IC median = +0.027 vs high-vol quartile IC
+    median = +0.137).
+
+    High score ⇒ there is flow / activity ⇒ OFI drives observable
+    price changes ⇒ cross-sectional Ricci has structural content
+    to score. Low score ⇒ the book is inert ⇒ OFI → 0 → Ricci → noise.
+
+    Implementation: per-row rolling std of 1-sec log-returns averaged
+    across symbols. No baseline subtraction (we want absolute activity,
+    not anomaly vs expected).
+    """
+    if window_rows < _MIN_WINDOW_ROWS:
+        raise ValueError(f"window_rows must be >= {_MIN_WINDOW_ROWS}, got {window_rows}")
+    if features.n_symbols < 1:
+        raise ValueError(f"need >= 1 symbol, got {features.n_symbols}")
+
+    log_mid = np.log(features.mid)
+    ret = np.vstack([np.zeros((1, features.n_symbols)), np.diff(log_mid, axis=0)])
+    n = ret.shape[0]
+    out = np.full(n, np.nan, dtype=np.float64)
+    for t in range(window_rows, n):
+        block = ret[t - window_rows : t]
+        if not np.all(np.isfinite(block)):
+            continue
+        out[t] = float(block.std(axis=0).mean())
+    return out
+
+
+def regime_mask_from_score(
+    score: NDArray[np.float64],
+    *,
+    threshold: float,
+) -> NDArray[np.bool_]:
+    """Boolean mask: True where score >= threshold and finite, False otherwise."""
+    mask = np.isfinite(score) & (score >= threshold)
+    return mask.astype(bool)
+
+
+def regime_mask_from_quantile(
+    score: NDArray[np.float64],
+    *,
+    quantile: float,
+) -> NDArray[np.bool_]:
+    """Boolean mask: True where score >= empirical quantile of the finite scores.
+
+    quantile must lie in (0, 1); e.g. 0.5 keeps the top half, 0.25 keeps
+    the top 75%. Finite-threshold-free alternative when absolute score
+    scale depends on substrate.
+    """
+    if not 0.0 < quantile < 1.0:
+        raise ValueError(f"quantile must lie in (0, 1), got {quantile}")
+    finite = score[np.isfinite(score)]
+    if finite.size == 0:
+        return np.zeros_like(score, dtype=bool)
+    threshold = float(np.quantile(finite, quantile))
+    return regime_mask_from_score(score, threshold=threshold)
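
The per-row loop in `rolling_rv_regime` can also be written without a Python loop. The sketch below is one possible alternative, not part of this commit: it operates on a plain return array rather than a `FeatureFrame`, assumes finite input with at least `window_rows` rows, and both function names are hypothetical. It uses `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20+) and checks itself against a copy of the loop.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_rv_loop(ret: np.ndarray, window_rows: int) -> np.ndarray:
    """Reference loop, mirroring the body of rolling_rv_regime above."""
    n = ret.shape[0]
    out = np.full(n, np.nan, dtype=np.float64)
    for t in range(window_rows, n):
        block = ret[t - window_rows : t]
        if not np.all(np.isfinite(block)):
            continue
        out[t] = float(block.std(axis=0).mean())
    return out


def rolling_rv_vectorized(ret: np.ndarray, window_rows: int) -> np.ndarray:
    """Same score without the loop (assumes finite input and n >= window_rows)."""
    n = ret.shape[0]
    out = np.full(n, np.nan, dtype=np.float64)
    # windows[i] covers rows i .. i + w - 1; shape is (n - w + 1, n_symbols, w).
    windows = sliding_window_view(ret, window_rows, axis=0)
    # out[t] uses rows t - w .. t - 1, i.e. windows[t - w]; the last window is unused.
    out[window_rows:] = windows[:-1].std(axis=-1).mean(axis=-1)
    return out


rng = np.random.default_rng(2)
# Synthetic log-mid random walk for 3 symbols; first return row is zero-padded.
log_mid = np.cumsum(rng.normal(scale=1e-4, size=(500, 3)), axis=0)
ret = np.vstack([np.zeros((1, 3)), np.diff(log_mid, axis=0)])

score = rolling_rv_vectorized(ret, window_rows=60)
reference = rolling_rv_loop(ret, window_rows=60)
print("max abs diff:", float(np.nanmax(np.abs(score - reference))))
```

The trade-off is memory: the reduction over the windowed view allocates temporaries on the order of rows × symbols × window, so the loop may remain the safer choice for long substrates.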

scripts/l2_killtest_recursive.py

Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
+#!/usr/bin/env python3
+"""Recursive + cyclic reality check on the collected L2 substrate.
+
+Uses only the existing primitives (`build_feature_frame`, `slice_features`,
+`run_killtest`). No new dataclasses, no new modules, no new gate logic.
+Two orthogonal views:
+
+1. RECURSIVE BISECTION (depth-first): at depth d, each cell is a 1/(2**d)
+   contiguous slice of the full window. Reports IC + residual_IC at every
+   cell. Signal is `deep` iff it survives every leaf at some depth.
+2. CYCLIC BLOCKS (breadth): split the full window into K adjacent disjoint
+   blocks of equal size; report IC trajectory across them. Signal is
+   `stable` iff IC sign + magnitude are preserved across blocks.
+
+Reality = what both views say simultaneously.
+"""
+
+from __future__ import annotations
+
+from pathlib import Path
+
+from research.microstructure.killtest import (
+    _load_parquets as load_parquets,
+)
+from research.microstructure.killtest import (
+    build_feature_frame,
+    run_killtest,
+    slice_features,
+)
+from research.microstructure.l2_schema import DEFAULT_SYMBOLS
+
+_MIN_ROWS_PER_CELL = 1500
+_MAX_DEPTH = 3
+_CYCLIC_K = 8
+
+
+def _recurse(features_obj: object, path: str, depth: int, results: list[dict[str, object]]) -> None:
+    from research.microstructure.killtest import FeatureFrame  # noqa: PLC0415
+
+    assert isinstance(features_obj, FeatureFrame)
+    features: FeatureFrame = features_obj
+
+    if features.n_rows < _MIN_ROWS_PER_CELL:
+        results.append(
+            {
+                "path": path,
+                "depth": depth,
+                "n_samples": features.n_rows,
+                "ic_signal": float("nan"),
+                "residual_ic": float("nan"),
+                "residual_p": float("nan"),
+                "note": "too_small",
+            }
+        )
+        return
+
+    v = run_killtest(features)
+    results.append(
+        {
+            "path": path,
+            "depth": depth,
+            "n_samples": v.n_samples,
+            "ic_signal": v.ic_signal,
+            "residual_ic": v.residual_ic,
+            "residual_p": v.residual_ic_pvalue,
+            "verdict": v.verdict,
+            "reasons_count": len(v.reasons),
+        }
+    )
+
+    if depth >= _MAX_DEPTH:
+        return
+    mid = features.n_rows // 2
+    left = slice_features(features, 0, mid)
+    right = slice_features(features, mid, features.n_rows)
+    _recurse(left, f"{path}L", depth + 1, results)
+    _recurse(right, f"{path}R", depth + 1, results)
+
+
+def main() -> int:
+    data_dir = Path("data/binance_l2_perp")
+    frames = load_parquets(data_dir, DEFAULT_SYMBOLS)
+    features = build_feature_frame(frames, DEFAULT_SYMBOLS)
+    print(f"substrate: n_rows={features.n_rows} n_symbols={features.n_symbols}")
+    print()
+
+    # --- 1. Recursive bisection ---
+    print("=" * 74)
+    print("RECURSIVE BISECTION TREE (depth 0 = full; L/R = halves at each split)")
+    print("=" * 74)
+    tree: list[dict[str, object]] = []
+    _recurse(features, "·", 0, tree)
+    print(f"{'path':<10} {'depth':<6} {'n':<7} {'IC':>8} {'residual':>10} {'p':>8} {'verdict':<10}")
+    for row in tree:
+        p = row["path"]
+        d = row["depth"]
+        n = row["n_samples"]
+        if row.get("note") == "too_small":
+            print(f"{p:<10} {d:<6} {n:<7} — too small for stable IC —")
+            continue
+        ic = row["ic_signal"]
+        rr = row["residual_ic"]
+        pv = row["residual_p"]
+        vd = row["verdict"]
+        assert isinstance(p, str) and isinstance(d, int) and isinstance(n, int)
+        assert isinstance(ic, float) and isinstance(rr, float) and isinstance(pv, float)
+        assert isinstance(vd, str)
+        print(f"{p:<10} {d:<6} {n:<7} {ic:>+8.4f} {rr:>+10.4f} {pv:>8.4f} {vd:<10}")
+    print()
+
+    # --- 2. Cyclic K blocks ---
+    print("=" * 74)
+    print(f"CYCLIC BLOCKS (K={_CYCLIC_K} adjacent disjoint windows)")
+    print("=" * 74)
+    block = features.n_rows // _CYCLIC_K
+    print(
+        f"{'block':<6} {'start':<6} {'end':<6} {'n':<7} {'IC':>8} {'residual':>10} {'p':>8} {'verdict':<10}"
+    )
+    ic_series: list[float] = []
+    for k in range(_CYCLIC_K):
+        start = k * block
+        end = (k + 1) * block if k < _CYCLIC_K - 1 else features.n_rows
+        sub = slice_features(features, start, end)
+        if sub.n_rows < _MIN_ROWS_PER_CELL:
+            print(f"{k:<6} {start:<6} {end:<6} {sub.n_rows:<7} — too small —")
+            continue
+        v = run_killtest(sub)
+        ic_series.append(v.ic_signal)
+        print(
+            f"{k:<6} {start:<6} {end:<6} {v.n_samples:<7} "
+            f"{v.ic_signal:>+8.4f} {v.residual_ic:>+10.4f} {v.residual_ic_pvalue:>8.4f} "
+            f"{v.verdict:<10}"
+        )
+    print()
+    if ic_series:
+        n_pos = sum(1 for ic in ic_series if ic > 0)
+        avg = sum(ic_series) / len(ic_series)
+        print(
+            f"summary: {n_pos}/{len(ic_series)} blocks positive IC avg={avg:+.4f} "
+            f"min={min(ic_series):+.4f} max={max(ic_series):+.4f}"
+        )
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
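
The row ranges the two views cover can be listed without loading any data. The sketch below reproduces only the index arithmetic of the script above. The helper names `cyclic_bounds` and `bisection_cells` and the row count 20_003 are hypothetical, and the sketch deliberately ignores the `_MIN_ROWS_PER_CELL` early exit.

```python
def cyclic_bounds(n_rows: int, k: int) -> list[tuple[int, int]]:
    """K adjacent disjoint [start, end) blocks; the last block absorbs the remainder."""
    if k < 1 or n_rows < k:
        raise ValueError(f"need 1 <= k <= n_rows, got k={k}, n_rows={n_rows}")
    block = n_rows // k
    return [(i * block, (i + 1) * block if i < k - 1 else n_rows) for i in range(k)]


def bisection_cells(n_rows: int, max_depth: int) -> list[tuple[str, int, int]]:
    """Depth-first (path, start, end) cells, in the order a bisection recursion visits them."""
    cells: list[tuple[str, int, int]] = []

    def visit(path: str, start: int, end: int, depth: int) -> None:
        cells.append((path, start, end))
        if depth >= max_depth:
            return
        # Split relative to the current cell, like `features.n_rows // 2` on a slice.
        mid = start + (end - start) // 2
        visit(path + "L", start, mid, depth + 1)
        visit(path + "R", mid, end, depth + 1)

    visit("·", 0, n_rows, 0)
    return cells


bounds = cyclic_bounds(20_003, 8)
cells = bisection_cells(20_003, 3)
for path, start, end in cells[:4]:
    print(f"{path:<6} rows [{start}, {end})")
print("last cyclic block:", bounds[-1])
```

With a maximum depth of 3 the tree has 1 + 2 + 4 + 8 = 15 cells, and the depth-3 leaves tile the same rows as the K=8 cyclic view up to rounding at the split points.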
