
Commit eb3aac8

neuron7xLab and claude committed
docs(robustness): explicit alpha threshold and PSR caveat
Task 4 of the DECISION_GRADE escalation. Pins every statistical threshold to a canonical location and documents the PSR autocorrelation limitation so that no reader confuses PSR = 1.0 with definitive significance.

## ROBUSTNESS_PROTOCOL.md § 3: Statistical thresholds

Nine thresholds tabulated verbatim with their module-level source:

    null_alpha           = 0.05  kuramoto_null_suite.NULL_PASS_P_THRESHOLD
    pbo_max              = 0.50  kuramoto_cpcv_suite.PBO_PASS_THRESHOLD
    loo_pbo_max          = 0.50  kuramoto_cpcv_suite.LOO_PBO_PASS_THRESHOLD
    psr_min              = 0.95  kuramoto_cpcv_suite.PSR_PASS_THRESHOLD
    jitter_floor_ratio   = 0.80  kuramoto_jitter_suite default
    sharpe_tolerance     = 0.20  kuramoto_jitter_suite.DEFAULT_SHARPE_TOLERANCE
    pbo_tautological_n   = 3     kuramoto_cpcv_suite.PBO_TAUTOLOGICAL_CUTOFF
    pbo_weak_n           = 5     kuramoto_cpcv_suite.PBO_WEAK_CUTOFF
    null_convergence_tol = 0.02  analysis_null_convergence.CONVERGENCE_TOLERANCE

The file is explicit that the documentation mirrors the code constants, never the other way round. Threshold drift between code and doc is a bug in the doc.

## ROBUSTNESS_LIMITATIONS.md (new)

Five honest catalogue entries:

1. PSR has no autocorrelation adjustment. Lopez de Prado Eq. 14.1 corrects for skew and kurtosis, not serial correlation. Regime-following strategies tend to have positively autocorrelated returns, so their nominal sample size overstates the effective one; PSR = 1.0 on the frozen bundle should not be read as definitive significance. HAC (Newey-West) is the forward fix.
2. Jitter evaluator is a placeholder: forced abstain, not pass.
3. LOO-grid PBO has only 5 paths, so the 0.20 point estimate carries a wide CI.
4. Null families are single-stream (no benchmark-matched test).
5. Contract covers the frozen bundle only; no re-simulation.

Each entry is explicit that it is NOT a bug and NOT required for a valid verdict; these are only things a reader must account for.

## ROBUSTNESS_RESULTS.md wiring

- CPCV row now reads 'PSR (daily, no HAC)' so the caveat is visible at a glance in the main results table.
- Notes section cross-references ROBUSTNESS_PROTOCOL.md § 3 for thresholds and ROBUSTNESS_LIMITATIONS.md § 1 for the PSR caveat.

## Integrity

- Code constants unchanged (per R6: do not change the verdict by threshold manipulation). Documentation mirrors existing code.
- 63/63 tests/research/robustness green.
- mypy --strict clean across touched files.
- 28/28 frozen artefacts intact.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9f14490 commit eb3aac8

4 files changed

Lines changed: 123 additions & 10 deletions


Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
# Cross-asset Kuramoto · Robustness v1 limitations

Honest catalogue of what the v1 framework *does not* measure cleanly.
Nothing below is a bug: every entry is a known statistical or data-access
limitation that a reader MUST account for when interpreting
`verdict.json`, `null_summary.json`, or `ROBUSTNESS_RESULTS.md`.

## 1. PSR has no autocorrelation adjustment

`research.robustness.cpcv.probabilistic_sharpe_ratio` implements the
Lopez de Prado (2018) Eq. 14.1 PSR. The formula corrects for skewness
(γ₃) and kurtosis (γ₄) of the sample distribution but **does not**
correct for serial correlation in the return stream.
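For reference, the PSR in the form usually stated in the literature is shown below. The repo's own implementation is not reproduced in this commit, so treat this as the textbook form rather than a transcription of `probabilistic_sharpe_ratio`. Here $\widehat{SR}$ is the non-annualised per-period Sharpe ratio, $SR^{\ast}$ the benchmark Sharpe, $T$ the number of returns, $\hat\gamma_3$ the sample skewness and $\hat\gamma_4$ the sample (non-excess) kurtosis.

```math
\widehat{\mathrm{PSR}}(SR^{\ast}) = \Phi\left( \frac{(\widehat{SR} - SR^{\ast})\sqrt{T - 1}}{\sqrt{1 - \hat\gamma_3\,\widehat{SR} + \frac{\hat\gamma_4 - 1}{4}\,\widehat{SR}^{2}}} \right)
```

The sample size enters only through $\sqrt{T-1}$, which implicitly treats the $T$ returns as serially independent; no term accounts for autocorrelation.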
Strategy returns that exhibit positive first-order autocorrelation,
which is typical of regime-following strategies, carry less independent
information than their length suggests: the nominal sample size T used
in the Sharpe-variance denominator overstates the effective sample size.
Consequences:

- The reported `psr_daily = 1.0000` on the frozen bundle should
  **not** be read as definitive statistical significance.
- Under HAC (heteroscedasticity- and autocorrelation-consistent)
  adjustment (Newey–West, Andrews–Monahan kernel), the effective
  sample size shrinks and the PSR would be lower, possibly materially.

Implementing HAC-adjusted PSR is a forward improvement and is
out of scope for v1. The caveat is cross-linked from
`ROBUSTNESS_RESULTS.md` under the CPCV row.
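A minimal sketch of what the forward fix could look like, under loud assumptions: the helper names `psr` and `newey_west_effective_n` are hypothetical, and the adjustment shown is one simple approach (rescale T by the ratio of the iid variance to a Bartlett-kernel, Newey–West long-run variance), not the repo's design. It leaves the skew and kurtosis terms untouched, omits the Andrews–Monahan prewhitening variant, and uses the common ⌊4(T/100)^(2/9)⌋ bandwidth rule of thumb.

```python
from __future__ import annotations

import math

import numpy as np


def _norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def newey_west_effective_n(returns: np.ndarray, max_lag: int | None = None) -> float:
    """Effective sample size T * gamma_0 / LRV using Bartlett (Newey-West) weights.

    Assumption: this rescaling is exact for the variance of the sample mean
    and only an approximation when applied to the Sharpe ratio.
    """
    x = np.asarray(returns, dtype=float)
    t = x.size
    if max_lag is None:
        # Common Newey-West rule-of-thumb bandwidth (an assumed choice).
        max_lag = int(math.floor(4.0 * (t / 100.0) ** (2.0 / 9.0)))
    d = x - x.mean()
    gamma0 = float(d @ d) / t
    lrv = gamma0
    for k in range(1, max_lag + 1):
        gamma_k = float(d[k:] @ d[:-k]) / t          # lag-k autocovariance
        lrv += 2.0 * (1.0 - k / (max_lag + 1.0)) * gamma_k
    if lrv <= 0.0:
        return float(t)
    # Conservative cap: never credit more observations than the nominal T.
    return min(float(t), t * gamma0 / lrv)


def psr(returns: np.ndarray, sr_benchmark: float = 0.0, n_obs: float | None = None) -> float:
    """Probabilistic Sharpe ratio; n_obs overrides the nominal sample size T."""
    x = np.asarray(returns, dtype=float)
    n = float(x.size) if n_obs is None else float(n_obs)
    sr = x.mean() / x.std(ddof=1)                     # per-period, non-annualised
    z = (x - x.mean()) / x.std(ddof=0)
    g3 = float(np.mean(z**3))                         # sample skewness
    g4 = float(np.mean(z**4))                         # sample (non-excess) kurtosis
    denom = math.sqrt(1.0 - g3 * sr + (g4 - 1.0) / 4.0 * sr**2)
    return _norm_cdf((sr - sr_benchmark) * math.sqrt(n - 1.0) / denom)


# Usage (hypothetical daily return array `daily`):
#   psr(daily)                                        # nominal T
#   psr(daily, n_obs=newey_west_effective_n(daily))   # HAC-style effective T
```

On a positively autocorrelated series `newey_west_effective_n` returns fewer than T observations, so the adjusted PSR is smaller than the nominal one whenever the sample Sharpe is positive. The cap at T is a deliberately conservative choice: negatively autocorrelated returns never get credit for more than their nominal length.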
## 2. Jitter evaluator is `PLACEHOLDER_APPROXIMATION`

`kuramoto_jitter_executor.make_placeholder_evaluator` returns a
smooth quadratic in fractional parameter-space distance, scaled by the
anchor Sharpe. This exercises the primitive contract but does **not**
rebuild the strategy under perturbed parameters.

- The row in `ROBUSTNESS_RESULTS.md` shows `N/A`, not ✓.
- `fraction_within_tol_pass` is forced to `False` regardless of the raw
  fraction; the decision layer treats placeholder evidence as
  abstention, not a pass.
- Replacing the executor requires access to the raw asset panel (not
  in the frozen bundle); pairing that panel with the frozen parameter
  lock yields a live evaluator.
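To make the abstain-not-pass rule concrete, here is a hypothetical sketch of the two pieces described above: a quadratic placeholder surface and the pass rule. The names `placeholder_sharpe` and `jitter_pass`, the `curvature` constant, and the Euclidean form of the fractional distance are illustrative assumptions; only the 0.20 tolerance and the 0.80 floor come from `ROBUSTNESS_PROTOCOL.md` § 3.

```python
from __future__ import annotations

import math
from collections.abc import Mapping, Sequence

SHARPE_TOLERANCE = 0.20    # sharpe_tolerance: absolute |ΔSharpe| band
JITTER_FLOOR_RATIO = 0.80  # jitter_floor_ratio: required fraction within band


def fractional_distance(params: Mapping[str, float], anchor: Mapping[str, float]) -> float:
    """Euclidean norm of per-parameter relative deviations (anchor values must be non-zero)."""
    return math.sqrt(sum(((params[k] - a) / a) ** 2 for k, a in anchor.items()))


def placeholder_sharpe(
    params: Mapping[str, float],
    anchor: Mapping[str, float],
    anchor_sharpe: float,
    curvature: float = 1.0,  # hypothetical shape constant
) -> float:
    """Quadratic bowl: equals anchor_sharpe at the anchor and falls off with distance**2."""
    d = fractional_distance(params, anchor)
    return anchor_sharpe * (1.0 - curvature * d**2)


def jitter_pass(
    candidate_sharpes: Sequence[float], anchor_sharpe: float, is_placeholder: bool
) -> tuple[float, bool]:
    """Return (raw fraction within tolerance, pass flag)."""
    within = sum(abs(s - anchor_sharpe) <= SHARPE_TOLERANCE for s in candidate_sharpes)
    fraction = within / len(candidate_sharpes)
    # Placeholder evidence is treated as abstention: it can never pass.
    passed = (not is_placeholder) and fraction >= JITTER_FLOOR_RATIO
    return fraction, passed
```

Even when every perturbed candidate lands inside the tolerance band (raw fraction 1.0), `jitter_pass(..., is_placeholder=True)` still returns `False`; only a live evaluator can turn the row into a real ✓.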
## 3. LOO-grid PBO has low path count

`results/cross_asset_kuramoto/offline_robustness/leave_one_asset_out.csv`
ships 5 folds × 13 perturbations. Bailey et al.'s CPCV PBO achieves
full statistical power at C(N, k) paths with N ≥ 8. With only 5 paths
the PBO estimate has a wide confidence interval; the reported 0.20 is a
point estimate, not a CI-backed bound.

A higher-power PBO requires either a richer strategy-parameter grid
(non-frozen; out of scope) or importance-sampled CPCV over an expanded
fold geometry.
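A rough way to see how wide that interval is: treat each of the 5 paths as a Bernoulli trial (overfit or not), assume the reported 0.20 corresponds to 1 overfit path out of 5, and compute a Wilson score interval. This is an illustrative assumption only; CPCV paths share data and are not independent, so the true uncertainty is likely larger. `wilson_interval` is a hypothetical helper, not part of the repo.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default is about 95%)."""
    p = successes / trials
    denom = 1.0 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


lo, hi = wilson_interval(1, 5)  # PBO = 0.20 read as 1 overfit path out of 5
print(f"PBO 95% Wilson interval: [{lo:.2f}, {hi:.2f}]")  # → [0.04, 0.62]
```

The upper end of the interval sits above `loo_pbo_max = 0.50`, which is the practical meaning of "wide CI" for the pass decision.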
## 4. Null families do not include benchmark-matched tests

The single-stream null suite compares the realised Sharpe against
bootstrapped resamples of itself. It does **not** test whether the
strategy outperforms a matched-cost, matched-lag benchmark such as
BF1 equal-weight. That measurement lives in the offline packet
(`benchmark_family.csv`) and is cross-referenced by
`SEPARATION_FINDING.md`.
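For contrast, a benchmark-matched test would resample the *difference* between the strategy and a matched benchmark rather than the strategy alone. The sketch below is one possible form, not something the v1 suite implements: it centres the daily excess returns to impose H₀ (no mean outperformance), draws an iid bootstrap, and reports the same Davison–Hinkley +1 upper-tail p-value convention the protocol uses. Date-aligned daily return arrays and the `paired_bootstrap_pvalue` name are assumptions.

```python
from __future__ import annotations

import numpy as np


def paired_bootstrap_pvalue(
    strategy: np.ndarray, benchmark: np.ndarray, n_boot: int = 9999, seed: int = 0
) -> float:
    """Upper-tail p-value for H0: mean daily excess over the benchmark is zero."""
    diff = np.asarray(strategy, dtype=float) - np.asarray(benchmark, dtype=float)
    observed = diff.mean()
    centred = diff - observed            # impose H0 by removing the sample mean
    rng = np.random.default_rng(seed)
    null_means = np.empty(n_boot)
    for b in range(n_boot):
        # iid resample; a block bootstrap would respect autocorrelation better
        null_means[b] = centred[rng.integers(0, diff.size, diff.size)].mean()
    # Davison-Hinkley +1 correction: (1 + #{null >= observed}) / (B + 1)
    return (1 + int(np.count_nonzero(null_means >= observed))) / (n_boot + 1)
```

Because the statistic is the excess over the benchmark, a strategy that merely tracks BF1 equal-weight would no longer look significant, which is the question the single-stream families cannot answer.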
## 5. Contract covers the frozen bundle only

Everything above operates on `SOURCE_HASHES.json` (28 artefacts) plus
`leave_one_asset_out.csv` (inline-hash-verified extension). The framework
does **not** re-run the spike or re-simulate the strategy. It is a
*read-only* audit layer.
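A read-only audit of this kind largely reduces to re-hashing each artefact and comparing against the manifest. A minimal sketch, assuming `SOURCE_HASHES.json` is a flat `{relative_path: sha256_hex}` mapping with paths relative to the manifest's directory (its actual schema is not shown in this commit); `verify_manifest` is a hypothetical helper.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large artefacts are not loaded at once."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest_path: Path) -> list[str]:
    """Return relative paths that are missing or whose digest does not match."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    root = manifest_path.parent
    failures: list[str] = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_file(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

An empty list means every listed artefact is byte-identical to the frozen snapshot; nothing is re-simulated.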
## Forward improvements

Any of the five items above can be closed without changing the
existing primitives:

1. HAC-PSR adjustment (Newey–West kernel inside
   `probabilistic_sharpe_ratio`).
2. Live jitter evaluator (raw asset panel + frozen parameter lock).
3. Higher-power PBO (expand LOO grid or import full spike parameter
   sweep).
4. Benchmark-matched null families (import `benchmark_family.csv`).
5. Protocol-level contract covering the live-shadow evidence rail
   (not just the demo bundle).

None of these is required for a valid FAIL or PASS verdict on the
current frozen evidence; each would tighten the confidence interval
around that verdict.

results/cross_asset_kuramoto/robustness_v1/ROBUSTNESS_PROTOCOL.md

Lines changed: 21 additions & 8 deletions
@@ -57,14 +57,27 @@ information content beyond short-horizon autocorrelation.
 Both families share a seeded `np.random.default_rng` and emit a
 Davison–Hinkley +1 continuity-corrected upper-tail p-value.
 
-## 3. Decision thresholds
-
-See `ROBUSTNESS_PROTOCOL.md § Statistical thresholds` (populated by
-Task 4) for the canonical `alpha`, `pbo_max`, `psr_min`, and
-jitter-tolerance values. All thresholds are encoded as module-level
-constants in `research/robustness/protocols/*_suite.py` and
-`backtest/robustness_gates.py`; the documentation mirrors the constants,
-never the other way round.
+## 3. Statistical thresholds
+
+All thresholds are encoded as module-level constants; this section
+mirrors the constants, never the other way round. Drift between code
+and this section is a bug in the documentation.
+
+| Threshold | Value | Where set | Semantics |
+|---|---:|---|---|
+| `null_alpha` | 0.05 | `kuramoto_null_suite.NULL_PASS_P_THRESHOLD` | Upper-tail α for either null family |
+| `pbo_max` | 0.50 | `kuramoto_cpcv_suite.PBO_PASS_THRESHOLD` | Fold-mirror PBO must be below this |
+| `loo_pbo_max` | 0.50 | `kuramoto_cpcv_suite.LOO_PBO_PASS_THRESHOLD` | LOO-grid PBO must be below this |
+| `psr_min` | 0.95 | `kuramoto_cpcv_suite.PSR_PASS_THRESHOLD` | Probabilistic Sharpe must exceed this |
+| `jitter_floor_ratio` | 0.80 | `kuramoto_jitter_suite.run_kuramoto_jitter_suite` default `fraction_within_tol_pass` | Fraction of jitter candidates within `sharpe_tolerance` (live evaluator only) |
+| `sharpe_tolerance` | 0.20 | `kuramoto_jitter_suite.DEFAULT_SHARPE_TOLERANCE` | Absolute \|ΔSharpe\| band for jitter evaluator |
+| `pbo_tautological_n` | 3 | `kuramoto_cpcv_suite.PBO_TAUTOLOGICAL_CUTOFF` | Below this candidate count, PBO is tautological |
+| `pbo_weak_n` | 5 | `kuramoto_cpcv_suite.PBO_WEAK_CUTOFF` | Below this candidate count, PBO is weak |
+| `null_convergence_tol` | 0.02 | `analysis_null_convergence.CONVERGENCE_TOLERANCE` | Max \|Δp\| across adjacent trial counts for CONVERGED |
+
+Threshold semantics are one-sided unless stated otherwise.
+Null-family tests are upper-tail: reject H₀ when *observed* Sharpe is
+in the upper α tail of the bootstrap distribution.
 
 ## 4. Artefacts written
 
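The drift rule in § 3 lends itself to a mechanical check. A minimal sketch, assuming the table keeps its `` | `name` | value | `` column order; `parse_threshold_table`, `find_threshold_drift` and the `code_values` mapping are hypothetical and not part of the repo.

```python
from __future__ import annotations

import re

# One table row: | `name` | value | ...   (only the first two cells are read)
_ROW = re.compile(r"^\|\s*`(?P<name>\w+)`\s*\|\s*(?P<value>[0-9.]+)\s*\|", re.MULTILINE)


def parse_threshold_table(markdown: str) -> dict[str, float]:
    """Extract {threshold_name: documented_value} from a markdown threshold table."""
    return {m["name"]: float(m["value"]) for m in _ROW.finditer(markdown)}


def find_threshold_drift(
    markdown: str, code_values: dict[str, float], tol: float = 1e-12
) -> dict[str, tuple[float | None, float]]:
    """Map each drifting threshold to (documented value or None, code value)."""
    documented = parse_threshold_table(markdown)
    drift: dict[str, tuple[float | None, float]] = {}
    for name, code_value in code_values.items():
        doc_value = documented.get(name)
        if doc_value is None or abs(doc_value - code_value) > tol:
            drift[name] = (doc_value, code_value)
    return drift
```

Building `code_values` from the suite constants (for example `{"null_alpha": kuramoto_null_suite.NULL_PASS_P_THRESHOLD, ...}`) and asserting an empty result would turn "drift is a bug in the documentation" into a failing test rather than a convention.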
results/cross_asset_kuramoto/robustness_v1/ROBUSTNESS_RESULTS.md

Lines changed: 3 additions & 1 deletion
@@ -7,7 +7,7 @@ Terminal decision: **FAIL**
 | Suite | Metric | Value | Pass |
 |---|---|---:|:-:|
 | CPCV | PBO (fold mirror, n=2, *tautological*) | 0.0000 ||
-| CPCV | PSR (daily) | 1.0000 ||
+| CPCV | PSR (daily, no HAC) | 1.0000 ||
 | CPCV | Annualised Sharpe (daily) | 0.4832 | n/a |
 | CPCV | PBO (LOO grid, n=13, *admissible*) | 0.2000 ||
 | Null | iid_bootstrap p-value | 0.5045 ||
@@ -34,3 +34,5 @@ Terminal decision: **FAIL**
 - Null suite uses mathematically exact daily log-returns (`diff(log(strategy_cumret))`) — no approximation. See `ROBUSTNESS_PROTOCOL.md` § 1 for the derivation contract.
 - PBO interpretation: fewer than 3 candidates is `tautological`, fewer than 5 is `weak`, 5+ is `admissible`. The fold-mirror PBO is always tautological by construction and is kept only as a sanity baseline; the LOO-grid PBO is the decision-grade one.
 - Jitter row shows `N/A` while the evaluator is `PLACEHOLDER_APPROXIMATION`; a live rebuild is required to replace the row with a real ✓ / ✗.
+- PSR column is *not* HAC-adjusted. Under positive serial correlation — typical of regime-following strategies — the effective sample size is smaller than the nominal T, and `psr_daily = 1.0000` is inflated. See `ROBUSTNESS_LIMITATIONS.md` § 1 for the forward-improvement path (Newey–West kernel).
+- Decision thresholds (α = 0.05, pbo_max = 0.50, psr_min = 0.95, jitter_floor = 0.80) are documented verbatim in `ROBUSTNESS_PROTOCOL.md` § 3.
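The three-tier PBO reading in these notes is a simple cut on the candidate count. A minimal sketch using the two cutoffs from `ROBUSTNESS_PROTOCOL.md` § 3 (`pbo_tautological_n = 3`, `pbo_weak_n = 5`); the function name is hypothetical, and the classifier in `kuramoto_cpcv_suite` may differ in detail.

```python
PBO_TAUTOLOGICAL_CUTOFF = 3  # pbo_tautological_n
PBO_WEAK_CUTOFF = 5          # pbo_weak_n


def pbo_interpretation(n_candidates: int) -> str:
    """Label how much weight a PBO estimate can carry, given the candidate count."""
    if n_candidates < PBO_TAUTOLOGICAL_CUTOFF:
        return "tautological"
    if n_candidates < PBO_WEAK_CUTOFF:
        return "weak"
    return "admissible"


pbo_interpretation(2)   # → "tautological" (fold mirror, n=2)
pbo_interpretation(13)  # → "admissible"   (LOO grid, n=13)
```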

scripts/run_kuramoto_robustness_v1.py

Lines changed: 10 additions & 1 deletion
@@ -110,7 +110,7 @@ def _render_markdown(
 f"*{cpcv_dict['pbo_interpretation']}*) | "
 f"{cpcv_dict['pbo']:.4f} | "
 f"{'✓' if cpcv_dict['pbo_pass'] else '✗'} |",
-f"| CPCV | PSR (daily) | {cpcv_dict['psr_daily']:.4f} | "
+f"| CPCV | PSR (daily, no HAC) | {cpcv_dict['psr_daily']:.4f} | "
 f"{'✓' if cpcv_dict['psr_pass'] else '✗'} |",
 f"| CPCV | Annualised Sharpe (daily) | {cpcv_dict['annualised_sharpe']:.4f} | n/a |",
 ]
@@ -189,6 +189,15 @@ def _render_markdown(
 "- Jitter row shows `N/A` while the evaluator is "
 "`PLACEHOLDER_APPROXIMATION`; a live rebuild is required to "
 "replace the row with a real ✓ / ✗.",
+"- PSR column is *not* HAC-adjusted. Under positive serial "
+"correlation — typical of regime-following strategies — the "
+"effective sample size is smaller than the nominal T, and "
+"`psr_daily = 1.0000` is inflated. See "
+"`ROBUSTNESS_LIMITATIONS.md` § 1 for the forward-improvement "
+"path (Newey–West kernel).",
+"- Decision thresholds (α = 0.05, pbo_max = 0.50, "
+"psr_min = 0.95, jitter_floor = 0.80) are documented "
+"verbatim in `ROBUSTNESS_PROTOCOL.md` § 3.",
 "",
 ]
 )
