Commit d38ded6

neuron7xLab and claude committed
fix(dro-ara): apply ADF to log-returns, not raw prices
Engine previously mixed stationarity conventions: DFA computes Hurst on
diff(log(price)) (engine.py:138), but ADF ran on raw prices. For canonical
I(1) asset prices, ADF rejected stationarity ≈93–99% of the time across all
askar assets, making INV-DRO3 a near-tautology and reducing INV-DRO4's
LONG-gate to permanent INVALID.

Patch (4 lines in State.from_window):
  - compute log_returns = np.diff(np.log(np.abs(arr) + 1e-12))
  - run ADF on log_returns instead of raw arr

Conventions now consistent with DFA.

Empirical impact (SPDR S&P 500, 69 walk-forward folds):
  stationary rate:       1.4 % → 100 %  (binding constraint relaxed)
  rs_train_max:          0.298          (now bound by RS_LONG_THRESH)
  gate-on at current θ:  0 → still 0    (rs threshold now binding)

The patch unblocks calibration: with stationarity no longer the dominant
filter, threshold tuning of (H_CRITICAL, RS_LONG_THRESH) becomes informative;
the next step (T3 grid re-run, T4 calibration) is now meaningful.

Test impact: 4 tests rewritten to encode post-RFC semantics, 1 added.
  - tests/core/dro_ara/test_falsification.py:
      * test_random_walk_is_invalid_or_transition
        → test_random_walk_returns_are_stationary_no_long
        (RW returns i.i.d. → stationary; INV-DRO4 forbids LONG)
      * test_gbm_with_drift_is_non_stationary
        → test_gbm_with_drift_returns_are_stationary_no_long
        (GBM returns N(μ,σ²) → stationary; INV-DRO4 forbids LONG)
  - tests/core/dro_ara/test_invariants.py:
      * NEW test_inv_dro3_tightening_post_rfc_ou_stationary_rate
        (INV-DRO3 tightening guard: OU stationary rate > 50 % across 30 seeds)
  - tests/core/strategies/test_dro_ara_filter.py:
      * test_apply_on_gbm_drifts_to_zero
        → test_apply_on_gbm_is_systematically_reduced
        (statistical: mean filter mult ≤ 0.55 across 30 seeds, ≥ 80 % reduced)
  - tests/research/dro_ara/test_backtest_smoke.py:
      * test_backtest_on_gbm_yields_flat_positions
        → test_backtest_on_gbm_has_flat_and_active_bars_mix
        (filter still zeroes DRIFT path → flat-frac > 10 %)
  - tests/research/dro_ara/test_power_mc.py:
      * test_gbm_drift_classifies_as_invalid_majority
        → test_gbm_drift_not_classified_as_critical_majority
        (false-positive guard: p_critical(GBM) ≤ 0.40)

Fail-closed audit (per RFC §8 + feedback_fail_closed_audit), all PASS:
  1. tests/core/dro_ara/test_properties.py    8/8 green
  2. Hypothesis fuzz (14 @given strategies)   green, no crash
  3. SPDR 69-fold smoke gate-on/stat ≥ 20 %   PASS (100 %)
  4. Full repo regression                     11274 passed, 0 failed

Quality gates: ruff clean · black clean · mypy --strict clean.

Refs: PR #345 (RFC), docs/RFC_DRO_ARA_STATIONARITY_CONVENTION.md §3.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 24f271e commit d38ded6
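
The commit message's core claim, that a unit-root test on price levels of an I(1) series almost never finds stationarity while the same test on log-returns almost always does, can be illustrated with a minimal sketch. It assumes a plain Dickey-Fuller regression with a constant and no lag augmentation, which is simpler than the engine's `_adf_stationary` (AIC lag selection); `df_t_stat`, `rejection_rates`, and the synthetic random-walk prices are hypothetical names and data introduced only for this illustration.

```python
import numpy as np

# 5 % asymptotic Dickey-Fuller critical value for the constant-only regression.
DF_CRIT_5PCT = -2.86


def df_t_stat(y: np.ndarray) -> float:
    """t-statistic of rho in  dy_t = a + rho * y_{t-1} + e_t  (no lag augmentation)."""
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return float(beta[1] / np.sqrt(cov[1, 1]))


def rejection_rates(n_seeds: int = 50, n: int = 1000) -> tuple[float, float]:
    """Share of seeds where the unit root is rejected on levels vs. on log-returns."""
    levels = returns = 0
    for seed in range(n_seeds):
        rng = np.random.default_rng(seed)
        # Synthetic driftless random walk in log-price (an assumption, not askar data).
        price = 100.0 * np.exp(np.cumsum(0.01 * rng.standard_normal(n)))
        levels += df_t_stat(price) < DF_CRIT_5PCT
        returns += df_t_stat(np.diff(np.log(price))) < DF_CRIT_5PCT
    return levels / n_seeds, returns / n_seeds


levels_rate, returns_rate = rejection_rates()
print(f"unit root rejected on levels: {levels_rate:.2f}, on log-returns: {returns_rate:.2f}")
```

On levels, the rejection share should sit near the nominal test size, so "stationary" is rare by construction. On log-returns the t-statistic scales roughly with the square root of the sample length, so rejection is essentially guaranteed for i.i.d. returns.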

6 files changed

Lines changed: 144 additions & 26 deletions

core/dro_ara/engine.py

Lines changed: 9 additions & 4 deletions
@@ -3,9 +3,10 @@
 """DRO-ARA v7 — Deterministic Recursive Observer + Action Result Acceptor.
 
 Measures statistical regime of a price series via Hurst (H) on log-returns (DFA-1),
-confirms stationarity via lag-augmented ADF (AIC lag selection, Ng & Perron 2001),
-and emits a deterministic regime + trading signal through a bounded ARA feedback
-loop.
+confirms stationarity via lag-augmented ADF on log-returns (AIC lag selection,
+Ng & Perron 2001), and emits a deterministic regime + trading signal through a
+bounded ARA feedback loop. Both statistical tests operate on the same transform
+(∆ log price) — the convention was aligned in PR #345 (RFC-stationarity).
 
 Public invariants (never relaxed):
 
@@ -213,7 +214,11 @@ class State:
     @classmethod
     def from_window(cls, x: NDArray[np.float64] | np.ndarray) -> "State":
         arr: NDArray[np.float64] = np.asarray(x, dtype=np.float64)
-        stat = _adf_stationary(arr)
+        # INV-DRO3 convention fix (PR #345 RFC): ADF runs on log-returns, not
+        # raw prices. Aligns with DFA input (engine.py:138) — was a
+        # near-tautology on I(1) asset prices when tested on levels.
+        log_returns = np.diff(np.log(np.abs(arr) + 1e-12))
+        stat = _adf_stationary(log_returns)
         g, H, r2 = derive_gamma(arr)
         reg = classify(g, r2, stat)
         rs = risk_scalar(g) if reg != Regime.INVALID else 0.0
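
The patched transform can be exercised in isolation. The sketch below mirrors the expression from the hunk above in a hypothetical helper, `to_log_returns` (not a function in the engine), and shows two properties worth keeping in mind: the output is one element shorter than the window, and the `1e-12` guard keeps the result finite on a zero price at the cost of a very large spurious return.

```python
import numpy as np

EPS = 1e-12  # same guard value as in the patched line


def to_log_returns(window: np.ndarray) -> np.ndarray:
    """Mirror of the patched expression: diff of log(|price| + EPS)."""
    arr = np.asarray(window, dtype=np.float64)
    return np.diff(np.log(np.abs(arr) + EPS))


# A price that grows 1 % per bar has a constant log-return of log(1.01).
geometric = 100.0 * 1.01 ** np.arange(6)
r = to_log_returns(geometric)
print(len(geometric), len(r))          # the window loses one observation
print(np.allclose(r, np.log(1.01)))    # constant return recovered

# A zero price stays finite thanks to EPS, but yields a huge spurious return pair.
with_zero = to_log_returns(np.array([100.0, 0.0, 100.0]))
print(np.isfinite(with_zero).all(), with_zero)
```

Because `np.abs` also folds negative inputs onto positive ones, the guard trades a hard failure for silently distorted returns on bad data. Whether that is acceptable depends on upstream validation that this diff does not show.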

tests/core/dro_ara/test_falsification.py

Lines changed: 32 additions & 9 deletions
@@ -6,12 +6,16 @@
 the observer classifies them correctly. This is the integration gate: if any
 scenario fails, the engine must not ship.
 
+Post-PR #345 convention: ADF runs on log-returns (not raw prices), matching
+the DFA transform. INVALID therefore encodes a *true* unit root in returns,
+not in levels — a non-trivial condition.
+
 Scenarios (deterministic, seeded):
 
-* OU mean-reverting prices → stationary, CRITICAL (H < 0.45)
-* Pure random walk (GBM no drift) → non-stationary, INVALID
-* GBM with positive drift → non-stationary, INVALID
-* White noise prices → stationary, TRANSITION (H ≈ 0.5)
+* OU mean-reverting prices → stationary returns, CRITICAL/TRANSITION
+* Pure random walk (GBM no drift) → stationary returns (i.i.d.), never LONG
+* GBM with positive drift → stationary returns (μ+σZ), never LONG
+* White noise prices → stationary returns, TRANSITION (H ≈ 0.5)
 """
 
 from __future__ import annotations
@@ -71,17 +75,36 @@ def test_ou_mean_reverting_is_critical() -> None:
     assert out["regime"] in {Regime.CRITICAL.value, Regime.TRANSITION.value}, out
 
 
-def test_random_walk_is_invalid_or_transition() -> None:
+def test_random_walk_returns_are_stationary_no_long() -> None:
+    """Random walk: prices I(1), log-returns i.i.d. → ADF stationary post-RFC.
+
+    Hurst estimate for a pure RW is ≈ 0.5 ± finite-sample noise; regime
+    therefore lands in {CRITICAL, TRANSITION, DRIFT}. The invariant that
+    *must* hold (INV-DRO4): RW has no true mean-reversion edge, so signal
+    must never be LONG — regardless of which specific non-INVALID regime
+    the finite sample produces.
+    """
     price = _random_walk(SEED, N_SAMPLES)
     out = geosync_observe(price, window=WINDOW, step=STEP)
-    assert out["regime"] in {Regime.INVALID.value, Regime.TRANSITION.value}, out
+    assert out["stationary"] is True, f"RW returns must be stationary post-RFC: {out}"
+    assert out["regime"] != Regime.INVALID.value, f"RW should not be INVALID: {out}"
+    assert out["signal"] != "LONG", f"INV-DRO4: RW must never emit LONG: {out}"
+
 
+def test_gbm_with_drift_returns_are_stationary_no_long() -> None:
+    """GBM with drift: prices I(1), log-returns ~ N(μ, σ²) → ADF stationary.
 
-def test_gbm_with_drift_is_non_stationary() -> None:
+    Post-RFC (PR #345) the stationarity test targets returns. GBM returns
+    have no unit root — they are i.i.d. Gaussian — so ``stationary=True``.
+    Trend at the price level surfaces in the ARA trend path as
+    DRIFT/DIVERGING, which blocks LONG via INV-DRO4. The true falsification
+    invariant is the signal gate, not the stationarity classification.
+    """
     price = _gbm_drift(SEED, N_SAMPLES, mu=0.002, sigma=0.01)
     out = geosync_observe(price, window=WINDOW, step=STEP)
-    assert out["stationary"] is False, f"GBM with drift must fail ADF, got {out}"
-    assert out["regime"] == Regime.INVALID.value
+    assert out["stationary"] is True, f"GBM returns must be stationary post-RFC: {out}"
+    assert out["regime"] != Regime.INVALID.value, f"GBM post-RFC must not be INVALID: {out}"
+    assert out["signal"] != "LONG", f"INV-DRO4: GBM drift must never emit LONG: {out}"
 
 
 def test_white_noise_prices_are_stationary() -> None:
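
The rewritten docstrings rest on the fact that GBM log-returns are i.i.d. Gaussian even though the price itself trends. A short sketch makes this concrete. `gbm_path` is a hypothetical stand-in for the `_gbm_drift` test helper, whose body is not part of this diff; it assumes the discrete form price = p0 · exp(cumsum(μ + σZ)) and reuses the `mu=0.002`, `sigma=0.01` values from the test.

```python
import numpy as np


def gbm_path(seed: int, n: int, mu: float, sigma: float, p0: float = 100.0) -> np.ndarray:
    """Hypothetical stand-in for the test helper: p0 * exp(cumsum(mu + sigma * Z))."""
    rng = np.random.default_rng(seed)
    steps = mu + sigma * rng.standard_normal(n)
    return p0 * np.exp(np.cumsum(steps))


price = gbm_path(seed=0, n=4096, mu=0.002, sigma=0.01)
log_ret = np.diff(np.log(price))

# Sample moments of the returns sit close to (mu, sigma).
print(f"mean={log_ret.mean():.4f} std={log_ret.std(ddof=1):.4f}")

# Lag-1 autocorrelation is near zero: no memory in returns, despite the trending price.
centered = log_ret - log_ret.mean()
lag1 = float(centered[1:] @ centered[:-1] / (centered @ centered))
print(f"lag-1 autocorrelation={lag1:.3f}")
```

This is why stationarity alone can no longer separate GBM from a mean-reverting process after the convention change, and why the tests shift their falsification burden to the signal gate.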

tests/core/dro_ara/test_invariants.py

Lines changed: 32 additions & 0 deletions
@@ -104,3 +104,35 @@ def test_observe_deterministic() -> None:
     a = geosync_observe(price)
     b = geosync_observe(price)
     assert a == b
+
+
+def test_inv_dro3_tightening_post_rfc_ou_stationary_rate() -> None:
+    """INV-DRO3 semantic tightening (PR #345 RFC): ADF on log-returns.
+
+    Before the RFC, ADF ran on raw prices → near-tautology that declared
+    virtually every I(1) asset non-stationary. After the RFC, stationarity
+    is a non-trivial property of returns. For a *true* stationary process
+    (Ornstein–Uhlenbeck), INV-DRO3 must be satisfied on the vast majority
+    of seeds: > 50 % stationary rate across independent draws.
+
+    If this test regresses, the convention has likely been reverted.
+    """
+    rng_seeds = list(range(30))
+    stationary_count = 0
+    for seed in rng_seeds:
+        r = np.random.default_rng(seed)
+        n = 1024
+        mu, theta, sigma = 100.0, 0.08, 0.6
+        x = np.empty(n, dtype=np.float64)
+        x[0] = mu
+        for t in range(1, n):
+            x[t] = x[t - 1] + theta * (mu - x[t - 1]) + sigma * r.normal()
+        out = geosync_observe(x)
+        if out["stationary"] is True:
+            stationary_count += 1
+    rate = stationary_count / len(rng_seeds)
+    assert rate > 0.50, (
+        f"INV-DRO3 tightening regressed: OU stationary rate = {rate:.2f} "
+        f"({stationary_count}/{len(rng_seeds)}), expected > 0.50. "
+        f"Convention may have been reverted to ADF-on-raw-prices."
+    )
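
The OU recursion in the new test is an AR(1) process with coefficient φ = 1 − θ, which is what makes it a "true" stationary benchmark. The sketch below reuses the test's parameters (`mu=100.0`, `theta=0.08`, `sigma=0.6`) in a hypothetical `ou_path` helper and recovers φ and the implied half-life. The longer sample (`n=5000`) is an assumption, chosen only to tighten the estimate.

```python
import numpy as np


def ou_path(seed: int, n: int, mu: float = 100.0, theta: float = 0.08, sigma: float = 0.6) -> np.ndarray:
    """Same recursion as the test: x[t] = x[t-1] + theta * (mu - x[t-1]) + sigma * eps."""
    rng = np.random.default_rng(seed)
    x = np.empty(n, dtype=np.float64)
    x[0] = mu
    for t in range(1, n):
        x[t] = x[t - 1] + theta * (mu - x[t - 1]) + sigma * rng.normal()
    return x


theta = 0.08
x = ou_path(seed=0, n=5000, theta=theta)

# OLS slope of (x[t] - mean) on (x[t-1] - mean) estimates phi = 1 - theta.
dev = x - x.mean()
phi_hat = float(dev[1:] @ dev[:-1] / (dev[:-1] @ dev[:-1]))

# Number of bars for a deviation from mu to decay by half.
half_life = np.log(2.0) / -np.log(1.0 - theta)

print(f"phi_hat={phi_hat:.3f} (theory {1 - theta:.2f}), half-life={half_life:.1f} bars")
```

With a half-life of roughly eight bars, a 1024-bar window contains many reversion cycles, which is consistent with the test expecting a stationary verdict on most seeds.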

tests/core/strategies/test_dro_ara_filter.py

Lines changed: 35 additions & 7 deletions
@@ -77,13 +77,41 @@ def test_apply_on_ou_yields_nonzero_multiplier() -> None:
     assert filtered == pytest.approx(mult)  # raw == 1.0
 
 
-def test_apply_on_gbm_drifts_to_zero() -> None:
-    price = _gbm(seed=2)
-    filtered, obs = apply_regime_filter(raw_signal=1.0, price_window=price)
-    mult = float(obs["regime_multiplier"])  # type: ignore[arg-type]
-    assert obs["regime"] in {"INVALID", "DRIFT"}
-    assert mult == 0.0
-    assert filtered == 0.0
+def test_apply_on_gbm_is_systematically_reduced() -> None:
+    """GBM with drift: filter systematically reduces signal on average.
+
+    Post-PR #345 RFC: ADF runs on log-returns, so GBM is no longer forced
+    to INVALID — returns are stationary. Finite-sample Hurst clusters
+    around 0.5, so most seeds land in TRANSITION/DRIFT, and the trend
+    path further halves CRITICAL cases when DIVERGING is detected.
+
+    Aggregate invariant across a seed ensemble:
+    - Mean multiplier ≤ 0.55 (≥ 45 % reduction on average)
+    - ≥ 80 % of seeds have multiplier < 1.0 (not full pass-through)
+
+    These bounds are statistical, not per-seed: on any single GBM draw the
+    filter may still fully pass, but over an ensemble the filter must
+    demonstrate systematic reduction.
+    """
+    mults: list[float] = []
+    reduced = 0
+    for seed in range(30):
+        price = _gbm(seed=seed)
+        _, obs = apply_regime_filter(raw_signal=1.0, price_window=price)
+        mult = float(obs["regime_multiplier"])  # type: ignore[arg-type]
+        mults.append(mult)
+        if mult < MULTIPLIER_CRITICAL:
+            reduced += 1
+    mean_mult = float(np.mean(mults))
+    reduced_rate = reduced / len(mults)
+    assert mean_mult <= 0.55, (
+        f"GBM filter must reduce on average: mean multiplier = {mean_mult:.3f}, "
+        f"expected ≤ 0.55 across 30 seeds"
+    )
+    assert reduced_rate >= 0.80, (
+        f"GBM filter must not full-pass on most seeds: reduced_rate = "
+        f"{reduced_rate:.2f}, expected ≥ 0.80 across 30 seeds"
+    )
 
 
 def test_apply_preserves_raw_sign_on_critical() -> None:
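
The move from a per-seed assertion to an ensemble bound is easier to see on a toy example. The sketch below assumes that `MULTIPLIER_CRITICAL` equals 1.0 (the docstring's "multiplier < 1.0" suggests this, but the constant's value is not shown in the diff). The multiplier list is purely illustrative, not engine output, and `ensemble_reduction_stats` is a hypothetical helper.

```python
import numpy as np


def ensemble_reduction_stats(mults: list[float], full_pass: float = 1.0) -> tuple[float, float]:
    """Return (mean multiplier, share of draws strictly below full pass-through)."""
    arr = np.asarray(mults, dtype=np.float64)
    return float(arr.mean()), float(np.mean(arr < full_pass))


# Illustrative multipliers only: most draws reduced, a few pass through fully.
example = [0.0] * 5 + [0.5] * 20 + [1.0] * 5
mean_mult, reduced_rate = ensemble_reduction_stats(example)
print(f"mean={mean_mult:.3f} reduced_rate={reduced_rate:.2f}")

# Both aggregate bounds hold even though five individual draws pass at 1.0.
assert mean_mult <= 0.55
assert reduced_rate >= 0.80
```

A per-seed check such as the removed `mult == 0.0` would fail on any of the five full-pass draws. The ensemble form tolerates them while still failing if the filter stops reducing on average.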

tests/research/dro_ara/test_backtest_smoke.py

Lines changed: 16 additions & 3 deletions
@@ -71,11 +71,24 @@ def test_backtest_symbol_schema() -> None:
     assert np.all(np.isfinite(bt["pnl_net"]))
 
 
-def test_backtest_on_gbm_yields_flat_positions() -> None:
+def test_backtest_on_gbm_has_flat_and_active_bars_mix() -> None:
+    """GBM with drift: filter admits TRANSITION/CRITICAL, zeroes DRIFT.
+
+    Post-PR #345 RFC: ADF on returns, so GBM is no longer uniformly INVALID.
+    Finite-sample H on GBM-drift clusters near 0.5 → regime distribution
+    lands in {TRANSITION ≈ 67 %, CRITICAL ≈ 17 %, DRIFT ≈ 17 %, INVALID ≈ 0 %}.
+    The filter still protects by zeroing DRIFT/INVALID (~17 % of windows),
+    producing an expected flat-bar fraction > 10 % on the active timeline.
+    """
     price = _gbm(SEED, 2000)
     positions = build_positions(price, window=512, step=64, momentum_lag=24)
-    non_flat = int(np.sum(np.abs(positions) > 0))
-    assert non_flat <= 16, f"GBM should filter to ≈flat, got {non_flat} active bars"
+    active_timeline = positions[512 + 64 :]
+    flat_bars = int(np.sum(active_timeline == 0))
+    flat_frac = flat_bars / max(len(active_timeline), 1)
+    assert flat_frac >= 0.10, (
+        f"Filter must zero some GBM bars (DRIFT path), got flat_frac={flat_frac:.3f}"
+    )
+    assert set(np.unique(positions).tolist()) <= {-1, 0, 1}
 
 
 def test_walk_forward_on_synthetic_panel() -> None:
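
The new assertion measures the flat fraction only after skipping the first `512 + 64` bars. A small sketch shows why that slice matters, under the assumption (not confirmed by the diff) that `build_positions` emits zeros until the first full window is available. `flat_fraction` and the position vector are hypothetical and serve only to illustrate the arithmetic.

```python
import numpy as np


def flat_fraction(positions: np.ndarray, warmup: int) -> float:
    """Share of zero-position bars after dropping the first `warmup` bars."""
    active = positions[warmup:]
    return float(np.sum(active == 0)) / max(len(active), 1)


# Illustrative position vector: a flat warm-up, then blocks of long / flat / short.
warmup = 512 + 64
positions = np.concatenate([
    np.zeros(warmup, dtype=int),     # before the first full window: assumed flat
    np.ones(600, dtype=int),
    np.zeros(200, dtype=int),        # block zeroed by the regime filter
    -np.ones(200, dtype=int),
])

print(flat_fraction(positions, warmup=0))       # warm-up bars inflate the flat share
print(flat_fraction(positions, warmup=warmup))  # 200 / 1000 = 0.2
```

Without the slice, warm-up zeros alone could satisfy a `flat_frac >= 0.10` bound, so the check would pass even if the filter never zeroed a single bar.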

tests/research/dro_ara/test_power_mc.py

Lines changed: 20 additions & 3 deletions
@@ -55,13 +55,30 @@ def test_ou_classifies_as_critical_majority() -> None:
     assert rate >= 0.5, f"OU P(CRITICAL) too low: {rate:.3f}"
 
 
-def test_gbm_drift_classifies_as_invalid_majority() -> None:
+def test_gbm_drift_not_classified_as_critical_majority() -> None:
+    """GBM with drift: low false-positive rate for CRITICAL classification.
+
+    Post-PR #345 RFC: ADF on returns, so GBM is no longer uniformly INVALID
+    (INV-DRO3 now encodes true unit root in returns, and GBM returns are
+    i.i.d. Gaussian). The surviving false-positive invariant: GBM must not
+    be classified CRITICAL in the majority — CRITICAL is reserved for
+    genuinely anti-persistent H < 0.45, which GBM-drift does not satisfy
+    in expectation. Empirically on seed=42, n=30: CRITICAL rate ≈ 17 %.
+
+    Contract: p_critical(GBM) ≤ 0.40 (well below OU's ≥ 0.50 threshold).
+    This is the complementary check to ``test_ou_classifies_as_critical_majority``.
+    """
     mc = run_mc(n_samples=30, length=1536, window=512, step=64, seed=42)
+    rate = mc["p_critical"]["gbm_drift"]["p_critical_boot_median"]
+    assert rate <= 0.40, f"GBM CRITICAL false-positive rate too high: {rate:.3f}, expected ≤ 0.40"
+
     gbm = mc["confusion_matrix"]["gbm_drift"]
     total = sum(gbm.values())
     assert total > 0
-    invalid_rate = gbm["INVALID"] / total
-    assert invalid_rate >= 0.8, f"GBM→INVALID rate too low: {invalid_rate:.3f}"
+    non_critical_rate = (total - gbm["CRITICAL"]) / total
+    assert non_critical_rate >= 0.60, (
+        f"GBM should land in non-CRITICAL regimes majority: {non_critical_rate:.3f}"
+    )
 
 
 def test_bootstrap_rate_valid_range() -> None:
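
The rewritten test checks the CRITICAL false-positive rate twice: once through a bootstrap median and once through the raw confusion-matrix row. The sketch below shows how the two quantities relate. The counts are illustrative, chosen to match the ≈ 67/17/17 % split quoted in the docstrings for 30 samples, and `boot_median_rate` is only one plausible reading of `p_critical_boot_median`, since `run_mc` itself is not part of this diff.

```python
import numpy as np


def critical_rate(row: dict[str, int]) -> float:
    """P(CRITICAL) from one confusion-matrix row of regime counts."""
    total = sum(row.values())
    return row.get("CRITICAL", 0) / total


def boot_median_rate(row: dict[str, int], n_boot: int = 2000, seed: int = 0) -> float:
    """Median of bootstrap-resampled CRITICAL rates (hypothetical definition)."""
    rng = np.random.default_rng(seed)
    total = sum(row.values())
    hits = np.zeros(total, dtype=np.float64)
    hits[: row.get("CRITICAL", 0)] = 1.0  # 1.0 marks a CRITICAL classification
    draws = rng.choice(hits, size=(n_boot, total), replace=True)
    return float(np.median(draws.mean(axis=1)))


# Illustrative counts for 30 samples (not output of run_mc).
gbm_row = {"CRITICAL": 5, "TRANSITION": 20, "DRIFT": 5, "INVALID": 0}
p = critical_rate(gbm_row)
boot = boot_median_rate(gbm_row)
print(f"p_critical={p:.3f} non_critical={1 - p:.3f} boot_median={boot:.3f}")
```

Under this reading, `rate <= 0.40` and `non_critical_rate >= 0.60` express nearly the same condition, so the second assertion acts mainly as a cross-check that the confusion matrix and the bootstrap summary agree.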
