Commit b292c2c
fix(robustness): demean returns before bootstrap — null distribution now centred at zero
CRITICAL correctness fix surfaced during the final review pass. The
previous null implementation sampled the raw returns with replacement,
which produces a null distribution centred at the *observed* sample
mean (because E[mean of resample] = mean of original). Every p-value
was therefore trivially ≈ 0.5 regardless of signal strength — the
framework could not distinguish a real edge from noise.
## Before (broken)
Synthetic validation exposed the bug:
    STRONG signal (μ=0.003,  SR=3.88):   iid_p=0.531  ✗ should be <0.05
    MODERATE      (μ=0.0008, SR=1.53):   iid_p=0.545  ✗ should be <0.1
    NOISE         (μ=0,      SR=0.22):   iid_p=0.465  ~ ok
    INVERTED      (μ=-0.003, SR=-4.98):  iid_p=0.471  ✗ should be ≈1
## After (fix)
Same synthetic sweep with demeaned bootstrap:
    STRONG signal (SR=3.88):   iid_p=0.002  ✓ reject H0
    MODERATE      (SR=1.53):   iid_p=0.002  ✓ reject H0
    NOISE         (SR=0.22):   iid_p=0.262  ✓ cannot reject
    INVERTED      (SR=-4.98):  iid_p=1.000  ✓ far left-tail
## Root cause
A non-demeaned bootstrap tests H₀: 'resampled mean equals observed
mean', which is trivially true by construction. The canonical
Sharpe-vs-zero null test instead centres the returns at zero before
resampling:

    centred = returns - returns.mean()
    null[b] = Sharpe(centred[bootstrap_indices])

Only then does the null represent H₀: 'true mean is zero'; the
observed Sharpe is compared against the upper tail of that null. This
is the Lopez de Prado (2018) § 14.3 / Politis & Romano (1994) § 3
convention for stationary-bootstrap SR tests.
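
The two-line recipe above can be sketched as a runnable function. This is a minimal illustration, not the code in research/robustness: the names `sharpe` and `iid_bootstrap_pvalue`, the `demean` flag, the per-period (non-annualised) Sharpe, and the `(1 + count) / (n_trials + 1)` p-value correction are all assumptions made for the example. The `demean=False` branch reproduces the broken behaviour for comparison.

```python
import numpy as np


def sharpe(x: np.ndarray) -> float:
    """Per-period Sharpe ratio (assumption: no annualisation)."""
    sd = x.std(ddof=1)
    return float(x.mean() / sd) if sd > 0 else 0.0


def iid_bootstrap_pvalue(returns, n_trials=2000, demean=True, seed=0):
    """Upper-tail p-value of the observed Sharpe under an i.i.d. bootstrap null.

    demean=True  -> null is centred at zero (H0: true mean is zero).
    demean=False -> null is centred at the observed mean (the broken variant).
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(returns, dtype=float)
    observed = sharpe(r)

    # Centre the series so that every resample is drawn from a zero-mean pool.
    base = r - r.mean() if demean else r

    # One row of resampled indices per bootstrap trial.
    idx = rng.integers(0, r.size, size=(n_trials, r.size))
    samples = base[idx]
    null = samples.mean(axis=1) / samples.std(axis=1, ddof=1)

    # Hypothetical choice: +1 correction so the p-value is never exactly 0.
    return float((1 + np.sum(null >= observed)) / (n_trials + 1))


if __name__ == "__main__":
    # Synthetic series with a clearly positive mean (illustrative values only).
    demo = np.random.default_rng(42).normal(0.003, 0.01, 1000)
    print("demeaned p:", iid_bootstrap_pvalue(demo, demean=True))
    print("raw p     :", iid_bootstrap_pvalue(demo, demean=False))
```

With a strongly positive synthetic series, the demeaned variant should return a small p-value while the raw variant should hover near 0.5, which is the failure pattern described in the "Before" section.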
## Evidence on the frozen bundle (demeaned)
iid_bootstrap p = 0.0829 (was 0.5045 broken)
stationary_bootstrap p = 0.1029 (was 0.5235 broken)
observed SR = 0.4832 (log-return Sharpe, unchanged)
The observed Sharpe sits in the upper 8-10 % tail of the null
distribution: statistically suggestive, but not significant at
α=0.05 (both p-values exceed 0.05). Honest FAIL.
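
The stationary_bootstrap figure above comes from a block-resampling null. As a rough sketch of one common construction (geometric block lengths with wrap-around, in the spirit of Politis & Romano), assuming a hypothetical mean block length of 20 and hypothetical function names, the demeaned version might look like this. It is not the repository's implementation.

```python
import numpy as np


def stationary_bootstrap_indices(n, mean_block, rng):
    """Indices for one stationary-bootstrap resample of length n.

    At each step a new block starts at a random position with probability
    1 / mean_block; otherwise the previous index advances by one (wrapping).
    """
    p_restart = 1.0 / mean_block
    restarts = rng.random(n) < p_restart
    fresh = rng.integers(0, n, size=n)
    idx = np.empty(n, dtype=np.int64)
    idx[0] = fresh[0]
    for t in range(1, n):
        idx[t] = fresh[t] if restarts[t] else (idx[t - 1] + 1) % n
    return idx


def stationary_bootstrap_pvalue(returns, n_trials=1000, mean_block=20.0, seed=0):
    """Upper-tail p-value of the observed Sharpe under a demeaned block null."""
    rng = np.random.default_rng(seed)
    r = np.asarray(returns, dtype=float)
    observed = r.mean() / r.std(ddof=1)
    centred = r - r.mean()  # demean once, before any resampling

    exceed = 0
    for _ in range(n_trials):
        s = centred[stationary_bootstrap_indices(r.size, mean_block, rng)]
        sd = s.std(ddof=1)
        null_sr = s.mean() / sd if sd > 0 else 0.0
        exceed += int(null_sr >= observed)

    # Hypothetical choice: +1 correction, as in many Monte Carlo tests.
    return (1 + exceed) / (n_trials + 1)
```

Block resampling preserves short-range autocorrelation that an i.i.d. resample destroys, which is one plausible reason the two p-values reported above differ slightly.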
## Convergence on the frozen bundle (demeaned)
BEFORE (broken null): NOT_CONVERGED (max |Δp| = 0.0285)
AFTER (demeaned): CONVERGED (max |Δp| = 0.0071)
The fix not only corrects the null semantics but also stabilises the
convergence across {500, 1000, 2000, 5000} trial counts.
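
A convergence check of this kind could be sketched as follows. The commit does not state how max |Δp| is defined or what tolerance separates CONVERGED from NOT_CONVERGED, so this example assumes the largest absolute difference between p-values at consecutive trial counts and a hypothetical tolerance of 0.01; `convergence_report` is an illustrative name.

```python
import numpy as np


def convergence_report(returns, trial_counts=(500, 1000, 2000, 5000),
                       tol=0.01, seed=0):
    """Recompute the demeaned i.i.d. bootstrap p-value at each trial count.

    Returns (p_values, max_delta, status). `tol` is a hypothetical threshold.
    """
    r = np.asarray(returns, dtype=float)
    observed = r.mean() / r.std(ddof=1)
    centred = r - r.mean()

    p_values = {}
    for k, n_trials in enumerate(trial_counts):
        # Independent stream per trial count so the runs do not share draws.
        rng = np.random.default_rng(seed + k)
        idx = rng.integers(0, r.size, size=(n_trials, r.size))
        s = centred[idx]
        null = s.mean(axis=1) / s.std(axis=1, ddof=1)
        p_values[n_trials] = float((1 + np.sum(null >= observed)) / (n_trials + 1))

    ps = list(p_values.values())
    # Assumed definition: largest jump between consecutive trial counts.
    max_delta = max(abs(a - b) for a, b in zip(ps, ps[1:]))
    status = "CONVERGED" if max_delta <= tol else "NOT_CONVERGED"
    return p_values, max_delta, status
```

Under this reading, a null centred in the wrong place puts the observed statistic near the middle of the distribution, where the Monte Carlo error of a p-value is largest, so larger swings between trial counts are to be expected.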
## Artefact updates
- null_summary.json, null_convergence.csv, verdict.json, cpcv_summary,
jitter_summary, ROBUSTNESS_RESULTS.md, ROBUSTNESS_SUMMARY.md all
regenerated with the correct null semantics.
- Module docstring rewritten to pin the demeaning convention with
literature references.
- Convergence note in ROBUSTNESS_RESULTS.md updated to reflect the
8-10 % upper-tail reading (not 'well above' as before).
## Guarantees
- 63/63 research/robustness tests green.
- mypy --strict clean across 23 source files.
- 28/28 frozen SOURCE_HASHES artefacts intact.
- Signal code untouched; framework-layer fix only.
- Verdict label unchanged (FAIL → FAIL); evidence now statistically
meaningful.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent b6ca8e2 commit b292c2c
6 files changed
Lines changed: 2069 additions & 2052 deletions
File tree
- research/robustness/protocols
- results/cross_asset_kuramoto/robustness_v1
- scripts