@@ -288,6 +288,80 @@ head(result)
288288writeResults(h5_path , df.output = result , analysis_name = " site_analysis" )
289289```
290290
291+ ### Case study: harmonize with ` covfam ` and stream to a new scalar
292+
293+ You can also use
294+ [ ` ModelArray.wrap() ` ] ( https://pennlinc.github.io/ModelArray/reference/ModelArray.wrap.md )
295+ as a transformation engine rather than a test runner. The example below
296+ applies a harmonization step per element, streams the harmonized values
297+ directly into ` /scalars ` , and then loads that harmonized scalar in a new
298+ ` ModelArray ` .
299+
300+ ``` r
301+ library(covfam )
302+
303+ # Assume phenotypes has harmonization variables, e.g.:
304+ # source_file, site, age, sex
305+ # and that "thickness" is already present in the h5 as an input scalar.
306+
307+ h5_in <- " thickness_raw.h5"
308+ h5_harmonized <- " thickness_harmonized.h5"
309+
310+ ma_raw <- ModelArray(h5_in , scalar_types = " thickness" )
311+ phenotypes <- read.csv(" phenotypes.csv" )
312+
313+ # Copy metadata/layout into a new file so we can append new scalars there.
314+ file.copy(h5_in , h5_harmonized , overwrite = TRUE )
315+
316+ covfam_harmonize_element <- function (data ) {
317+ # Harmonize one element across subjects.
318+ # Adjust arguments here to match your covfam configuration.
319+ out <- covfam :: covfam(
320+ y = data $ thickness ,
321+ batch = data $ site ,
322+ mod = data.frame (age = data $ age , sex = data $ sex )
323+ )
324+
325+ # Return named subject-level vector so wrap columns map to source_file order.
326+ vals <- out $ y_harmonized
327+ names(vals ) <- data $ source_file
328+ vals
329+ }
330+
331+ # Stream harmonized subject-level values into scalars/thickness_covfam/values.
332+ # Set return_output = FALSE to avoid keeping the full output table in memory.
333+ ModelArray.wrap(
334+ FUN = covfam_harmonize_element ,
335+ data = ma_raw ,
336+ phenotypes = phenotypes ,
337+ scalar = " thickness" ,
338+ n_cores = 8 ,
339+ write_scalar_name = " thickness_covfam" ,
340+ write_scalar_file = h5_harmonized ,
341+ write_scalar_flush_every = 2000L ,
342+ return_output = FALSE
343+ )
344+
345+ # Load the harmonized scalar as a new modelling input
346+ ma_harmonized <- ModelArray(h5_harmonized , scalar_types = c(" thickness_covfam" ))
347+
348+ # Use harmonized data in downstream models
349+ fit_harmonized <- ModelArray.lm(
350+ thickness_covfam ~ age + sex ,
351+ data = ma_harmonized ,
352+ phenotypes = phenotypes ,
353+ scalar = " thickness_covfam" ,
354+ n_cores = 8
355+ )
356+ ```
357+
358+ If you want to stream model statistics too (not only transformed
359+ scalars), use ` write_results_name ` and ` write_results_file ` in
360+ [ ` ModelArray.lm() ` ] ( https://pennlinc.github.io/ModelArray/reference/ModelArray.lm.md ) ,
361+ [ ` ModelArray.gam() ` ] ( https://pennlinc.github.io/ModelArray/reference/ModelArray.gam.md ) ,
362+ or
363+ [ ` ModelArray.wrap() ` ] ( https://pennlinc.github.io/ModelArray/reference/ModelArray.wrap.md ) .
364+
291365## Modelling across multiple h5 files with ` mergeModelArrays() `
292366
293367When scalars live in separate h5 files — for example, cortical thickness
0 commit comments