Commit 590b880

feat(batch): store per-day echograms in day directories instead of separate folder
1 parent 4c77468 commit 590b880

2 files changed

Lines changed: 8 additions & 9 deletions

scripts/batch_processing/PROCESSING_REPORT.md

Lines changed: 6 additions & 4 deletions
@@ -196,7 +196,7 @@ The per-pulse-mode zarrs (`*--short_pulse--*.zarr`, `*--long_pulse--*.zarr`) rem
 - 3 products (MVBS, denoised, raw Sv) × 2 frequencies (38kHz, 200kHz) × 2 colormaps (`ocean_r`, `EK500`)
 - Each echogram has a **pulse-mode colour bar** at the bottom: orange = Short pulse, blue = Long pulse
 - Time axis labelled with hourly ticks (UTC)
-- Stored in `/mnt/data/output/perday_echograms/`
+- Stored alongside combined zarrs in each day directory
 
 **Processing**: 141 days × 4 workers = **~62 minutes** (`run_combine_daily.py`)

@@ -289,6 +289,9 @@ The per-pulse-mode zarrs (`*--short_pulse--*.zarr`, `*--long_pulse--*.zarr`) rem
 │ │ ├── 2023-05-30--combined--mvbs.zarr # ← FINAL: MVBS
 │ │ ├── 2023-05-30--combined--nasc--38kHz.zarr # ← FINAL: NASC 38 kHz
 │ │ ├── 2023-05-30--combined--nasc--200kHz.zarr # ← FINAL: NASC 200 kHz
+│ │ ├── 2023-05-30--mvbs--38kHz--ek500.png # ← echogram
+│ │ ├── 2023-05-30--denoised--200kHz--ocean_r.png # ← echogram
+│ │ ├── ... (more echogram PNGs)
 │ │ ├── 2023-05-30--short_pulse.zarr # intermediate
 │ │ ├── 2023-05-30--short_pulse--denoised.zarr # intermediate
 │ │ ├── 2023-05-30--short_pulse--mvbs.zarr # intermediate
@@ -303,7 +306,6 @@ The per-pulse-mode zarrs (`*--short_pulse--*.zarr`, `*--long_pulse--*.zarr`) rem
 ├── campaign_echograms/ # 593 MB — 12 PNG echograms
 ├── tiles/ # 1.9 MB — PMTiles + source GeoJSON
 ├── nasc_biomass/ # 1.5 MB — NASC points GeoJSON
-├── perday_echograms/ # 3.3 GB — 1,610 daily echogram PNGs (NEW)
 ├── heatmaps/ # 656 KB — COGs + PNGs + manifest
 ├── raw_downloads/ # empty (cleaned up)
 └── *.log # pipeline logs
@@ -345,7 +347,7 @@ All final products uploaded to container `sd-tpos2023-full-v01` on storage accou
 | Combined per-day zarrs | `2023-XX-XX/*--combined--*.zarr/` | 221,102 | ~82 GB |
 | Campaign MVBS | `campaign_mvbs_combined_38kHz.zarr/` | 2,681 | 9.5 GB |
 | Campaign echograms | `campaign_echograms/` | 12 | 593 MB |
-| Per-day echograms | `perday_echograms/` | 1,610 | 3.3 GB |
+| Per-day echograms | `2023-XX-XX/*.png` | 1,610 | 3.3 GB |
 | PMTiles + GeoJSON | `tiles/` | 2 | 2 MB |
 | NASC biomass | `nasc_biomass/` | 1 | 1.5 MB |
 | NASC heatmaps | `heatmaps/` | 7 | 656 KB |
@@ -374,7 +376,7 @@ print(ds)
 | Denoised Sv | 140 | ~30 GB | zarr | `*--combined--denoised.zarr` |
 | MVBS | 137 | ~9 GB | zarr | `*--combined--mvbs.zarr` |
 | NASC (per-freq) | 216 | ~3 MB | zarr | `*--combined--nasc--{38kHz,200kHz}.zarr` |
-| Per-day echograms | 1,610 | 3.3 GB | PNG | `perday_echograms/` |
+| Per-day echograms | 1,610 | 3.3 GB | PNG | `{day}/{day}--{product}--{freq}--{cmap}.png` |
 
 **Campaign-level products:**
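
With the echograms now stored next to the combined zarrs, they can no longer be found by listing a single folder. Below is a minimal sketch of how the new `{day}/{day}--{product}--{freq}--{cmap}.png` pattern might be globbed and parsed. `EchogramName`, `parse_echogram_name`, and `find_echograms` are hypothetical helpers, not functions from this repository, and the sketch assumes day directories are named `YYYY-MM-DD`.

```python
from pathlib import Path
from typing import NamedTuple


class EchogramName(NamedTuple):
    """Fields encoded in an echogram file name (hypothetical helper type)."""
    day: str
    product: str
    freq: str
    cmap: str


def parse_echogram_name(path: Path) -> EchogramName:
    """Split '{day}--{product}--{freq}--{cmap}.png' into its four fields."""
    parts = path.stem.split("--")
    if len(parts) != 4:
        raise ValueError(f"unexpected echogram name: {path.name}")
    return EchogramName(*parts)


def find_echograms(base_dir: Path) -> list[EchogramName]:
    """Collect echograms stored inside day directories under base_dir."""
    found = []
    # Assumption: day directories look like 2023-05-30.
    for png in sorted(base_dir.glob("????-??-??/*.png")):
        try:
            name = parse_echogram_name(png)
        except ValueError:
            continue  # a PNG that does not follow the echogram pattern
        if name.day == png.parent.name:
            found.append(name)
    return found
```

Checking that the day in the file name matches the parent directory guards against PNGs copied into the wrong day folder.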

scripts/batch_processing/run_combine_daily.py

Lines changed: 2 additions & 5 deletions
@@ -4,7 +4,7 @@
 
 Input: {day}/{day}--{pulse_mode}--{product}.zarr (separate per pulse mode)
 Output: {day}/{day}--combined--{product}.zarr (merged per day)
-        perday_echograms/{day}--{product}--{freq}--{cmap}.png
+        {day}/{day}--{product}--{freq}--{cmap}.png (echograms)
 
 Products handled:
 - MVBS: concat along ping_time (depth already aligned at 1m bins)
@@ -51,7 +51,6 @@
 
 BASE_DIR = Path("/mnt/data/output/sd-tpos2023-full-v01")
 OUTPUT_DIR = Path("/mnt/data/output")
-ECHOGRAM_DIR = OUTPUT_DIR / "perday_echograms"
 
 FREQ_38KHZ = 38000.0
 FREQ_200KHZ = 200000.0
@@ -876,7 +875,7 @@ def process_one_day(args: tuple) -> tuple[str, int, int]:
 
         n_echograms = 0
         if not skip_echograms and combined_zarrs:
-            echogram_files = generate_echograms_for_day(day, combined_zarrs, ECHOGRAM_DIR)
+            echogram_files = generate_echograms_for_day(day, combined_zarrs, BASE_DIR / day)
             n_echograms = len(echogram_files)
     except Exception as e:
         log.error(" %s FAILED: %s", day, e)
@@ -915,8 +914,6 @@ def main() -> None:
     if args.force:
         args.skip_existing = False
 
-    ECHOGRAM_DIR.mkdir(parents=True, exist_ok=True)
-
     # Discover days
     if args.day:
         days = [args.day]