@@ -90,39 +90,39 @@ Stage 14: Combined daily products + per-day echograms (NEW)
9090
9191### Stage 4: Compute Sv
9292
93- - **277 raw Sv zarrs** (137 short_pulse + 140 long_pulse)
93+ - **277 per-pulse-mode Sv zarrs** (137 short_pulse + 140 long_pulse), kept as intermediates
9494- Typical shape: `(channels=2, ping_time=~15000-32000, range_sample=~3600-7200)`
95- - Stored as e.g. `2023-07-15/2023-07-15--short_pulse.zarr`
95+ - **Final per-day products**: 141 combined Sv zarrs (both pulse modes merged, channels: `38kHz`, `200kHz`)
9696
9797### Stage 5–6: Calibrate + Enrich + Denoise
9898
99- - **268 denoised zarrs** (132 short_pulse + 136 long_pulse)
99+ - **268 per-pulse-mode denoised zarrs** (132 short_pulse + 136 long_pulse), kept as intermediates
100100- 9 raw zarrs had no matching GPS or failed calibration → skipped
101101- 4-stage denoising: background noise removal → impulse noise → attenuation correction → transient removal
102102- GPS (latitude/longitude) merged from `gpsdata` container into denoised datasets
103- - Stored as e.g. `2023-07-15/2023-07-15--short_pulse--denoised.zarr`
103+ - **Final per-day products**: 140 combined denoised zarrs (both pulse modes merged)
104104
105105**GPS coverage issue**: 34 denoised zarrs have all-NaN GPS coordinates. On each affected day only one pulse mode is affected: the GPS merge succeeded for one mode but not the other (likely a timing mismatch between GPS timestamps and sonar ping times for the alternate pulse mode).
106106
107107### Stage 7: Per-day MVBS
108108
109- - **261 MVBS zarrs** (+ 261 NetCDF copies)
109+ - **261 per-pulse-mode MVBS zarrs** (+ 261 NetCDF copies), kept as intermediates
110110- Bins: `range_bin=1m`, `ping_time_bin=10s`
111111- Computed with `echopype.commongrid.compute_MVBS()`
112- - Stored as e.g. `2023-07-15/2023-07-15--short_pulse--mvbs.zarr` and `.nc`
112+ - **Final per-day products**: 137 combined MVBS zarrs (both pulse modes merged)
113113
114114### Stage 7 (NASC): Per-day NASC — Fast Vectorized
115115
116116**Original approach** (echopype `compute_NASC`): ~90 GB RAM, 15–60 min per zarr. Only 5 zarrs completed before the pipeline was killed due to stalled computation.
117117
118118**Replacement** (`run_nasc_parallel.py`): Pure numpy + haversine + `np.bincount`. ~7 GB per worker, 1–17 seconds per zarr. **~600× faster.**
119119
120- - **229 NASC zarrs** (+ 229 NetCDF copies)
120+ - **229 per-pulse-mode NASC zarrs** (+ 229 NetCDF copies), kept as intermediates
121121 - 109 short_pulse + 120 long_pulse
122122- Bins: `range_bin=10m`, `dist_bin=0.5nmi`
123123- **222 computed in 2 minutes** (10 parallel workers)
124124- 34 skipped (all-NaN GPS), 5 failed (see §4)
125- - Stored as e.g. `2023-07-15/2023-07-15--short_pulse--nasc.zarr` and `.nc`
125+ - **Final per-day products**: 216 combined NASC zarrs (per-frequency: 38kHz + 200kHz)
126126
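The replacement approach described above (numpy + haversine + `np.bincount`) can be illustrated with a minimal sketch. This is not the contents of `run_nasc_parallel.py`; the function names, the 1-D `depth` array, the uniform sample spacing, and the assumption that every range bin is fully covered are all illustrative choices. It uses the standard NASC scaling of `4π · 1852²` times the depth-integrated linear volume backscatter.

```python
import numpy as np

EARTH_RADIUS_NMI = 3440.065  # mean Earth radius in nautical miles


def haversine_nmi(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_NMI * np.arcsin(np.sqrt(a))


def fast_nasc(sv_db, depth, lat, lon, range_bin=10.0, dist_bin=0.5):
    """Hypothetical helper: NASC on a (distance bin, range bin) grid.

    sv_db : (n_ping, n_sample) Sv in dB
    depth : (n_sample,) sample depth in metres (assumed shared by all pings)
    lat, lon : (n_ping,) finite GPS positions in degrees
    """
    # Cumulative along-track distance gives each ping its distance bin.
    step = haversine_nmi(lat[:-1], lon[:-1], lat[1:], lon[1:])
    dist = np.concatenate(([0.0], np.cumsum(step)))
    d_idx = np.floor(dist / dist_bin).astype(np.int64)
    r_idx = np.floor(depth / range_bin).astype(np.int64)
    n_d, n_r = d_idx.max() + 1, r_idx.max() + 1

    # Average in the linear domain, ignoring NaN samples.
    sv_lin = 10.0 ** (sv_db / 10.0)
    valid = np.isfinite(sv_lin)
    flat = d_idx[:, None] * n_r + r_idx[None, :]  # one cell id per sample
    sums = np.bincount(flat[valid], weights=sv_lin[valid], minlength=n_d * n_r)
    counts = np.bincount(flat[valid], minlength=n_d * n_r)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_sv = sums / counts  # empty cells become NaN

    # Assumes each range bin is fully sampled, so thickness = range_bin.
    nasc = 4 * np.pi * 1852.0 ** 2 * mean_sv * range_bin
    return nasc.reshape(n_d, n_r)
```

Because every sample is mapped to a flat cell index and reduced with two `np.bincount` calls, no Python-level loop over pings or bins is needed. Inputs with all-NaN GPS would have to be filtered out beforehand, which matches the skipped zarrs listed above.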
127127### Stage 8: Per-day Echograms — SKIPPED
128128
@@ -172,24 +172,27 @@ Skipped with `--skip-perday-echograms` to prioritise campaign-level products. Ca
172172- Grid: 0.5° resolution, scipy griddata interpolation, cKDTree search radius 0.5°
173173- Stored in ` /mnt/data/output/heatmaps/ `
174174
175- ### Stage 14: Combined Daily Products + Per-day Echograms (NEW)
175+ ### Stage 14: Pulse-Mode Merge + Per-day Echograms
176176
177- Merges short_pulse + long_pulse into single per-day combined zarrs. Channels renamed from instrument IDs (`EKA 266972-07 ES38-18|200-18C`) to frequency labels (`38kHz`, `200kHz`). Each dataset includes a `pulse_mode` variable (0=long, 1=short) for provenance.
177+ The raw pipeline (stages 4–7) processes each pulse mode separately, producing per-pulse-mode intermediate zarrs. Stage 14 merges these into the **final per-day products**: one zarr per day per product level with both pulse modes combined (NASC is instead written as one zarr per frequency).
178178
179- **Products combined:**
179+ Channels are renamed from instrument IDs (`EKA 266972-07 ES38-18|200-18C`) to frequency labels (`38kHz`, `200kHz`). Each dataset includes a `pulse_mode` variable (0=long, 1=short) for provenance.
180180
181- | Product | Count | Method |
182- |---------|-------|--------|
183- | Combined MVBS | 137 | Concat along `ping_time` (depth aligned at 1m) |
184- | Combined denoised Sv | 140 | Interpolated to 0.5m common depth grid, concat along `ping_time` |
185- | Combined raw Sv | 141 | Same interpolation as denoised |
186- | Combined NASC | 216 | Per-frequency files, concat along `distance` (offset to avoid overlap) |
181+ **Final per-day products:**
187182
188- Stored as e.g. `2023-07-15/2023-07-15--combined--mvbs.zarr` (NASC: `2023-07-15--combined--nasc--38kHz.zarr`).
183+ | Product | Count | Merge method | Example filename |
184+ |---------|-------|--------------|------------------|
185+ | Sv (raw) | 141 | Interpolated to 0.5m common depth grid, concat along `ping_time` | `2023-07-15--combined--sv.zarr` |
186+ | Denoised Sv | 140 | Same interpolation as raw Sv | `2023-07-15--combined--denoised.zarr` |
187+ | MVBS | 137 | Concat along `ping_time` (depth already aligned at 1m) | `2023-07-15--combined--mvbs.zarr` |
188+ | NASC (per-freq) | 216 | Concat along `distance` (offset to avoid overlap) | `2023-07-15--combined--nasc--38kHz.zarr` |
189+
190+ The per-pulse-mode zarrs (`*--short_pulse*.zarr`, `*--long_pulse*.zarr`) remain on disk as intermediates but are **not the deliverable products**.
189191
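The Sv merge method in the table above (interpolate both pulse modes to a common 0.5 m depth grid, then concatenate along `ping_time`) can be sketched in plain numpy. This is a hedged illustration, not the Stage 14 code: the dict layout, the function names, linear interpolation of dB values, and the final sort by time are assumptions made here. Only the 0.5 m grid and the `pulse_mode` coding (0=long, 1=short) come from the text.

```python
import numpy as np


def to_common_grid(sv, depth, grid):
    """Interpolate each ping of sv (n_ping, n_sample) from depth onto grid."""
    out = np.empty((sv.shape[0], grid.size))
    for i, ping in enumerate(sv):
        # Depths outside the recorded range are filled with NaN.
        out[i] = np.interp(grid, depth, ping, left=np.nan, right=np.nan)
    return out


def merge_pulse_modes(long_mode, short_mode, step=0.5):
    """Hypothetical merge of two pulse modes into one per-day array.

    Each mode is a dict with 'time' (n_ping,), 'depth' (n_sample,)
    and 'sv' (n_ping, n_sample).
    """
    max_depth = max(long_mode["depth"].max(), short_mode["depth"].max())
    grid = np.arange(0.0, max_depth + step, step)

    sv = np.concatenate(
        [to_common_grid(m["sv"], m["depth"], grid) for m in (long_mode, short_mode)]
    )
    time = np.concatenate([long_mode["time"], short_mode["time"]])
    # Provenance flag, following the document's coding: 0=long, 1=short.
    pulse_mode = np.concatenate([
        np.zeros(long_mode["time"].size, dtype=np.int8),
        np.ones(short_mode["time"].size, dtype=np.int8),
    ])

    # Assumption: pings are re-ordered chronologically after concatenation.
    order = np.argsort(time, kind="stable")
    return time[order], grid, sv[order], pulse_mode[order]
```

MVBS needs no interpolation step because both modes are already binned at 1 m, so only the concatenation and the `pulse_mode` flag would apply there.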
190192**Per-day echograms:**
191193
192194- **1,610 PNG files** (3.3 GB total)
195+ - Generated from the combined zarrs (not per-pulse-mode)
193196- 3 products (MVBS, denoised, raw Sv) × 2 frequencies (38kHz, 200kHz) × 2 colormaps (`ocean_r`, `EK500`)
194197- Each echogram has a **pulse-mode colour bar** at the bottom: orange = Short pulse, blue = Long pulse
195198- Time axis labelled with hourly ticks (UTC)
@@ -279,22 +282,19 @@ Stored as e.g. `2023-07-15/2023-07-15--combined--mvbs.zarr` (NASC: `2023-07-15--
279282
280283```
281284/mnt/data/output/
282- ├── sd-tpos2023-full-v01/ # 300 GB — per-day products
285+ ├── sd-tpos2023-full-v01/ # ~380 GB — per-day products
283286│ ├── 2023-05-30/
284- │ │ ├── 2023-05-30--short_pulse.zarr # raw Sv
285- │ │ ├── 2023-05-30--short_pulse--denoised.zarr # denoised Sv
286- │ │ ├── 2023-05-30--short_pulse--mvbs.zarr # MVBS
287- │ │ ├── 2023-05-30--short_pulse--mvbs.nc # MVBS (NetCDF)
288- │ │ ├── 2023-05-30--short_pulse--nasc.zarr # NASC
289- │ │ ├── 2023-05-30--short_pulse--nasc.nc # NASC (NetCDF)
290- │ │ ├── 2023-05-30--long_pulse.zarr
291- │ │ ├── 2023-05-30--long_pulse--denoised.zarr
292- │ │ ├── ... (same pattern for long_pulse)
293- │ │ ├── 2023-05-30--combined--mvbs.zarr # ← NEW: combined daily
294- │ │ ├── 2023-05-30--combined--denoised.zarr
295- │ │ ├── 2023-05-30--combined--sv.zarr
296- │ │ ├── 2023-05-30--combined--nasc--38kHz.zarr
297- │ │ ├── 2023-05-30--combined--nasc--200kHz.zarr
287+ │ │ ├── 2023-05-30--combined--sv.zarr # ← FINAL: raw Sv (both pulse modes)
288+ │ │ ├── 2023-05-30--combined--denoised.zarr # ← FINAL: denoised Sv
289+ │ │ ├── 2023-05-30--combined--mvbs.zarr # ← FINAL: MVBS
290+ │ │ ├── 2023-05-30--combined--nasc--38kHz.zarr # ← FINAL: NASC 38 kHz
291+ │ │ ├── 2023-05-30--combined--nasc--200kHz.zarr # ← FINAL: NASC 200 kHz
292+ │ │ ├── 2023-05-30--short_pulse.zarr # intermediate
293+ │ │ ├── 2023-05-30--short_pulse--denoised.zarr # intermediate
294+ │ │ ├── 2023-05-30--short_pulse--mvbs.zarr # intermediate
295+ │ │ ├── 2023-05-30--long_pulse.zarr # intermediate
296+ │ │ ├── 2023-05-30--long_pulse--denoised.zarr # intermediate
297+ │ │ └── ... (+ .nc copies, long_pulse mvbs/nasc)
298298│ ├── 2023-05-31/
299299│ ├── ... (141 day directories)
300300│ └── 2023-11-05/
@@ -321,7 +321,7 @@ ls /mnt/data/output/sd-tpos2023-full-v01/
321321source ~/workspace/venv/bin/activate
322322python3 -c "
323323import xarray as xr
324- ds = xr.open_zarr('/mnt/data/output/sd-tpos2023-full-v01/2023-07-15/2023-07-15--short_pulse--nasc.zarr')
324+ ds = xr.open_zarr('/mnt/data/output/sd-tpos2023-full-v01/2023-07-15/2023-07-15--combined--mvbs.zarr')
325325print(ds)
326326"
327327```
@@ -349,22 +349,34 @@ azcopy sync "/mnt/data/output/sd-tpos2023-full-v01" \
349349
350350## 6. Data Products Summary
351351
352+ **Final per-day products** (combined pulse modes; these are the deliverables):
353+
354+ | Product | Count | Size | Format | Filename pattern |
355+ |---------|-------|------|--------|------------------|
356+ | Sv (raw) | 141 | ~40 GB | zarr | `*--combined--sv.zarr` |
357+ | Denoised Sv | 140 | ~30 GB | zarr | `*--combined--denoised.zarr` |
358+ | MVBS | 137 | ~9 GB | zarr | `*--combined--mvbs.zarr` |
359+ | NASC (per-freq) | 216 | ~3 MB | zarr | `*--combined--nasc--{38kHz,200kHz}.zarr` |
360+ | Per-day echograms | 1,610 | 3.3 GB | PNG | `perday_echograms/` |
361+
362+ **Campaign-level products:**
363+
352364| Product | Count | Size | Format | Location |
353365|---------|-------|------|--------|----------|
354- | Raw Sv (per-day) | 277 | 193 GB | zarr | `sd-tpos2023-full-v01/{day}/` |
355- | Denoised Sv | 268 | 91 GB | zarr | `sd-tpos2023-full-v01/{day}/` |
356- | MVBS (per-day) | 261 | 9 GB | zarr + nc | `sd-tpos2023-full-v01/{day}/` |
357- | NASC (per-day) | 229 | 44 MB | zarr + nc | `sd-tpos2023-full-v01/{day}/` |
358366| Campaign MVBS (38 kHz) | 1 | 8.9 GB | zarr | `campaign_mvbs_combined_38kHz.zarr` |
359367| Campaign echograms | 12 | 593 MB | PNG | `campaign_echograms/` |
360368| Echodata track tiles | 1 | 1.3 MB | PMTiles | `tiles/` |
361369| NASC biomass points | 6,135 | 1.5 MB | GeoJSON | `nasc_biomass/` |
362370| NASC heatmaps | 3+3 | 656 KB | COG + PNG | `heatmaps/` |
363- | Combined MVBS (per-day) | 137 | ~9 GB | zarr | `sd-tpos2023-full-v01/{day}/` |
364- | Combined denoised (per-day) | 140 | ~30 GB | zarr | `sd-tpos2023-full-v01/{day}/` |
365- | Combined raw Sv (per-day) | 141 | ~40 GB | zarr | `sd-tpos2023-full-v01/{day}/` |
366- | Combined NASC (per-day) | 216 | ~3 MB | zarr | `sd-tpos2023-full-v01/{day}/` |
367- | Per-day echograms | 1,610 | 3.3 GB | PNG | `perday_echograms/` |
371+
372+ **Intermediate per-pulse-mode products** (on disk but not deliverables):
373+
374+ | Product | Count | Size | Format |
375+ |---------|-------|------|--------|
376+ | Raw Sv | 277 | 193 GB | zarr |
377+ | Denoised Sv | 268 | 91 GB | zarr |
378+ | MVBS | 261 | 9 GB | zarr + nc |
379+ | NASC | 229 | 44 MB | zarr + nc |
368380
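Since final and intermediate products share the same day directories, a consumer may want to tell them apart by filename. The sketch below is one possible way to do that, based only on the naming patterns shown in this document; the `classify` helper and the regular expressions are hypothetical and not part of the pipeline.

```python
import re

# Final products: {day}--combined--{product}[--{freq}].zarr
FINAL = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2})--combined--"
    r"(?P<product>sv|denoised|mvbs|nasc)(?:--(?P<freq>\d+kHz))?\.zarr$"
)
# Intermediates: {day}--{short_pulse|long_pulse}[--{product}].{zarr|nc}
INTERMEDIATE = re.compile(
    r"^(?P<day>\d{4}-\d{2}-\d{2})--(?P<mode>short_pulse|long_pulse)"
    r"(?:--(?P<product>denoised|mvbs|nasc))?\.(?:zarr|nc)$"
)


def classify(name):
    """Return a dict describing a product filename, or None if unrecognised."""
    m = FINAL.match(name)
    if m:
        return {"role": "final", **m.groupdict()}
    m = INTERMEDIATE.match(name)
    if m:
        info = m.groupdict()
        # A bare '{day}--{mode}.zarr' name is the raw Sv intermediate.
        info["product"] = info["product"] or "sv"
        return {"role": "intermediate", **info}
    return None
```

For example, `classify("2023-07-15--combined--nasc--38kHz.zarr")` reports a final NASC product at 38kHz, while `classify("2023-07-15--short_pulse.zarr")` reports a raw Sv intermediate.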
369381---
370382