 | MVBS zarrs | 9 GB |
 | Campaign MVBS combined | 8.9 GB |
 | NASC zarrs | 44 MB |
+| Combined daily zarrs (new) | ~82 GB |
+| Per-day echograms (new) | 3.3 GB |
 | Converted echodata | 5 GB |
 | Campaign echograms | 593 MB |
 | Tiles + GeoJSON + Heatmaps | 4 MB |
-| **Total used** | **314 GB / 1 TB** |
+| **Total used** | **396 GB / 1 TB** |

 ---

@@ -57,6 +59,7 @@ Stage 10: Campaign echograms (4 segments × 3 colormaps)
 Stage 11: Echodata PMTiles (vector tiles for map viz)
 Stage 12: NASC Biomass GeoJSON (depth-frequency merged points)
 Stage 13: NASC Heatmap COGs (raster overlays + PNG previews)
+Stage 14: Combined daily products + per-day echograms (NEW)
 ```

 ### Key Scripts
@@ -66,6 +69,7 @@ Stage 13: NASC Heatmap COGs (raster overlays + PNG previews)
 | `build_full_survey.py` | Main 13-stage pipeline (~2900 lines) |
 | `run_nasc_parallel.py` | Fast parallel NASC via numpy (replaced stage 7 NASC) |
 | `run_stages_9_to_13.py` | Standalone post-processing (stages 9–13 without re-running 1–8) |
+| `run_combine_daily.py` | Merge pulse modes per day + generate echograms with pulse markings |
 | `local_storage.py` | Monkey-patches Azure storage calls to local disk I/O |

 ---
@@ -168,6 +172,31 @@ Skipped with `--skip-perday-echograms` to prioritise campaign-level products. Ca
 - Grid: 0.5° resolution, scipy griddata interpolation, cKDTree search radius 0.5°
 - Stored in `/mnt/data/output/heatmaps/`

+### Stage 14: Combined Daily Products + Per-day Echograms (NEW)
+
+Merges the short_pulse and long_pulse datasets into a single combined zarr per day. Channels are renamed from instrument IDs (`EKA 266972-07 ES38-18|200-18C`) to frequency labels (`38kHz`, `200kHz`). Each dataset includes a `pulse_mode` variable (0 = long, 1 = short) for provenance.
+
+**Products combined:**
+
+| Product | Count | Method |
+|---------|-------|--------|
+| Combined MVBS | 137 | Concat along `ping_time` (depth aligned at 1 m) |
+| Combined denoised Sv | 140 | Interpolated to a common 0.5 m depth grid, concat along `ping_time` |
+| Combined raw Sv | 141 | Same interpolation as denoised |
+| Combined NASC | 216 | Per-frequency files, concat along `distance` (offset to avoid overlap) |
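The `ping_time` concatenation described in the table, together with the `pulse_mode` provenance flag and the `ping_time` deduplication mentioned in the commit log, can be sketched in plain Python. This is only an illustration of the logic, not the code in `run_combine_daily.py`: the function name `merge_pings`, the tuple layout, and the tie-break (keep the long-pulse ping when both modes share a timestamp) are assumptions.

```python
from datetime import datetime

LONG, SHORT = 0, 1  # pulse_mode encoding from the Stage 14 description


def merge_pings(long_pings, short_pings):
    """Merge two ping lists into one time-ordered list tagged with pulse_mode.

    Each input is a list of (ping_time, value) tuples. Duplicate ping_time
    values are dropped; on a tie the long-pulse ping wins (an assumption).
    """
    tagged = [(t, v, LONG) for t, v in long_pings]
    tagged += [(t, v, SHORT) for t, v in short_pings]
    # list.sort is stable, so long-pulse records stay ahead of short-pulse
    # records that carry the same timestamp.
    tagged.sort(key=lambda rec: rec[0])

    merged, seen = [], set()
    for t, v, mode in tagged:
        if t in seen:
            continue  # deduplicate ping_time
        seen.add(t)
        merged.append((t, v, mode))
    return merged


long_pings = [(datetime(2023, 5, 30, 0, 0, 0), -70.0),
              (datetime(2023, 5, 30, 0, 0, 2), -68.0)]
short_pings = [(datetime(2023, 5, 30, 0, 0, 1), -65.0),
               (datetime(2023, 5, 30, 0, 0, 2), -66.0)]
merged = merge_pings(long_pings, short_pings)
```

In a real dataset the same idea would apply per variable along the `ping_time` dimension, after the depth grids have been aligned.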
+
+Stored as `{day}/{day}--combined--{product}.zarr` (NASC: `{day}/{day}--combined--nasc--{freq}.zarr`).
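As a small illustration of the naming scheme above, the helper below expands one day into the five combined store paths that appear in the directory listing. The helper name `combined_paths` is hypothetical; the product slugs (`mvbs`, `denoised`, `sv`) and frequency labels are taken from the file names shown in this document.

```python
from pathlib import PurePosixPath

PRODUCTS = ("mvbs", "denoised", "sv")  # slugs as they appear in the directory listing
FREQS = ("38kHz", "200kHz")            # one NASC store per frequency


def combined_paths(day):
    """Return the relative paths of the combined zarr stores for one day."""
    names = [f"{day}--combined--{product}.zarr" for product in PRODUCTS]
    names += [f"{day}--combined--nasc--{freq}.zarr" for freq in FREQS]
    return [PurePosixPath(day) / name for name in names]


paths = combined_paths("2023-05-30")
```

Not every day has every product (the counts in the table differ per product), so a caller would still need to check which stores exist.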
+
+**Per-day echograms:**
+
+- **1,610 PNG files** (3.3 GB total)
+- 3 products (MVBS, denoised, raw Sv) × 2 frequencies (38kHz, 200kHz) × 2 colormaps (`ocean_r`, `EK500`), i.e. up to 12 PNGs per day
+- Each echogram has a **pulse-mode colour bar** at the bottom: orange = short pulse, blue = long pulse
+- Time axis labelled with hourly ticks (UTC)
+- Stored in `/mnt/data/output/perday_echograms/`
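One way to derive the pulse-mode colour bar is to collapse the per-ping `pulse_mode` values into contiguous runs and draw one coloured span per run. The sketch below shows only that run-length step; the function name `pulse_mode_spans` is hypothetical, and the actual plotting code in the pipeline may work differently.

```python
from itertools import groupby

# Colour scheme from the list above: 0 = long pulse (blue), 1 = short pulse (orange)
PULSE_COLOURS = {0: "blue", 1: "orange"}


def pulse_mode_spans(pulse_mode):
    """Collapse a per-ping pulse_mode sequence into (start, stop, colour) runs.

    start and stop are ping indices; stop is exclusive.
    """
    spans, start = [], 0
    for mode, run in groupby(pulse_mode):
        length = sum(1 for _ in run)
        spans.append((start, start + length, PULSE_COLOURS[mode]))
        start += length
    return spans


spans = pulse_mode_spans([0, 0, 1, 1, 1, 0])
```

Each resulting span could then be drawn under the echogram, for example with matplotlib's `Axes.axvspan`, after mapping ping indices to times.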
+
+**Processing**: 141 days with 4 parallel workers, **~62 minutes** total (`run_combine_daily.py`)
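The commit log mentions error handling for resilient parallel processing. A minimal sketch of that pattern, assuming one task per day and four workers, is shown below. `combine_day` is a hypothetical stand-in that fails on purpose for one day, and a thread pool is used only to keep the example self-contained; for CPU-bound work a process pool would be the more usual choice.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def combine_day(day):
    """Hypothetical stand-in for the per-day merge and echogram step."""
    if day.endswith("-31"):
        # Simulated failure so the error path is exercised.
        raise ValueError(f"no usable pings for {day}")
    return f"{day}: ok"


def run_all(days, max_workers=4):
    """Run combine_day for every day; one failing day does not stop the rest."""
    done, failed = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(combine_day, day): day for day in days}
        for future in as_completed(futures):
            day = futures[future]
            try:
                done[day] = future.result()
            except Exception as exc:  # record the failure and keep going
                failed[day] = repr(exc)
    return done, failed


done, failed = run_all(["2023-05-30", "2023-05-31", "2023-06-01"])
```

Collecting failures per day instead of raising makes it possible to re-run only the days that failed.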
+
 ---

 ## 4. Issues Found and Fixed
@@ -260,7 +289,12 @@ Skipped with `--skip-perday-echograms` to prioritise campaign-level products. Ca
 │ │ ├── 2023-05-30--short_pulse--nasc.nc # NASC (NetCDF)
 │ │ ├── 2023-05-30--long_pulse.zarr
 │ │ ├── 2023-05-30--long_pulse--denoised.zarr
-│ │ ├── ... (same pattern)
+│ │ ├── ... (same pattern for long_pulse)
+│ │ ├── 2023-05-30--combined--mvbs.zarr # ← NEW: combined daily
+│ │ ├── 2023-05-30--combined--denoised.zarr
+│ │ ├── 2023-05-30--combined--sv.zarr
+│ │ ├── 2023-05-30--combined--nasc--38kHz.zarr
+│ │ ├── 2023-05-30--combined--nasc--200kHz.zarr
 │ ├── 2023-05-31/
 │ ├── ... (141 day directories)
 │ └── 2023-11-05/
@@ -269,6 +303,7 @@ Skipped with `--skip-perday-echograms` to prioritise campaign-level products. Ca
 ├── campaign_echograms/ # 593 MB — 12 PNG echograms
 ├── tiles/ # 1.9 MB — PMTiles + source GeoJSON
 ├── nasc_biomass/ # 1.5 MB — NASC points GeoJSON
+├── perday_echograms/ # 3.3 GB — 1,610 daily echogram PNGs (NEW)
 ├── heatmaps/ # 656 KB — COGs + PNGs + manifest
 ├── raw_downloads/ # empty (cleaned up)
 └── *.log # pipeline logs
@@ -325,6 +360,11 @@ azcopy sync "/mnt/data/output/sd-tpos2023-full-v01" \
 | Echodata track tiles | 1 | 1.3 MB | PMTiles | `tiles/` |
 | NASC biomass points | 6,135 | 1.5 MB | GeoJSON | `nasc_biomass/` |
 | NASC heatmaps | 3+3 | 656 KB | COG + PNG | `heatmaps/` |
+| Combined MVBS (per-day) | 137 | ~9 GB | zarr | `sd-tpos2023-full-v01/{day}/` |
+| Combined denoised (per-day) | 140 | ~30 GB | zarr | `sd-tpos2023-full-v01/{day}/` |
+| Combined raw Sv (per-day) | 141 | ~40 GB | zarr | `sd-tpos2023-full-v01/{day}/` |
+| Combined NASC (per-day) | 216 | ~3 MB | zarr | `sd-tpos2023-full-v01/{day}/` |
+| Per-day echograms | 1,610 | 3.3 GB | PNG | `perday_echograms/` |

 ---

@@ -354,7 +394,8 @@ azcopy sync "/mnt/data/output/sd-tpos2023-full-v01" \
 | Stage 11 PMTiles | ~10 sec | 141 tracks |
 | Stage 12 NASC GeoJSON | ~5 sec | 6,135 points |
 | Stage 13 NASC heatmaps | ~2 sec | 3 COGs + 3 PNGs |
-| **Total wall clock** | **~14 hours** | Including disk resize downtime |
+| **Stage 14 combined daily** | **~62 min** | 141 days, 4 workers, 661 zarrs + 1,610 PNGs |
+| **Total wall clock** | **~15 hours** | Including disk resize downtime |

 ---

@@ -370,4 +411,7 @@ dbb588f fix(batch): stages 11-12 — look for lat/lon in data_vars, prefer denoi
 519c302 fix: normalize_string_dtypes — handle numpy 2.x StringDType
 0c18a89 feat(batch): add stages 11-13 — echodata PMTiles, NASC biomass GeoJSON, NASC heatmap COGs
 a1e577e fix: list_denoised_zarrs scans local disk when local_storage is patched
+213047b fix(batch): deduplicate ping_time in MVBS/NASC combine too
+550c68d fix(batch): deduplicate ping_time + error handling for resilient parallel processing
+c8613b5 feat(batch): per-day pulse-mode merge + daily echograms with pulse markings
 ```