|
1 | | -# Fit linear model for element-wise data |
| 1 | +# Fit element-wise linear models |
2 | 2 |
|
3 | | -\`ModelArray.lm\` fits linear model (\`stats::lm()\`) for each of |
4 | | -elements requested, and returns a tibble dataframe of requested model |
5 | | -statistics. |
| 3 | +`ModelArray.lm` fits a linear model at each requested element in a |
| 4 | +[ModelArray](https://pennlinc.github.io/ModelArray/reference/ModelArray-class.md) |
| 5 | +and returns a tibble of requested model statistics. |
6 | 6 |
|
7 | 7 | ## Usage |
8 | 8 |
|
@@ -38,151 +38,206 @@ ModelArray.lm( |
38 | 38 |
|
39 | 39 | - formula: |
40 | 40 |
|
41 | | - Formula (passed to \`stats::lm()\`) |
| 41 | + Formula (passed to [`lm`](https://rdrr.io/r/stats/lm.html)). |
42 | 42 |
|
43 | 43 | - data: |
44 | 44 |
|
45 | | - ModelArray class |
| 45 | + A |
| 46 | + [ModelArray](https://pennlinc.github.io/ModelArray/reference/ModelArray-class.md) |
| 47 | + object. |
46 | 48 |
|
47 | 49 | - phenotypes: |
48 | 50 |
|
49 | 51 | A data.frame of the cohort with columns of independent variables and |
50 | | - covariates to be added to the model. It should contains a column |
51 | | - called "source_file", and this column should match to that in `data`. |
| 52 | + covariates to be added to the model. It must contain a column called |
| 53 | + `"source_file"` whose entries match those in |
| 54 | + `sources(data)[[scalar]]`. |
52 | 55 |
|
53 | 56 | - scalar: |
54 | 57 |
|
55 | | - A character. The name of the element-wise scalar to be analysed |
| 58 | + Character. The name of the element-wise scalar to analyse. Must be one |
| 59 | + of `names(scalars(data))`. |
56 | 60 |
|
57 | 61 | - element.subset: |
58 | 62 |
|
59 | | - A list of positive integers (min = 1, max = number of elements). The |
60 | | - subset of elements you want to run. Default is \`NULL\`, i.e. |
61 | | - requesting all elements in \`data\`. |
| 63 | + Integer vector of element indices (1-based) to run. Default is `NULL`, |
| 64 | + i.e. all elements in `data`. |
62 | 65 |
|
63 | 66 | - full.outputs: |
64 | 67 |
|
65 | | - TRUE or FALSE, Whether to return full set of outputs. If FALSE, it |
66 | | - will only return those requested in arguments `var.*` and |
67 | | - `correct.p.value.*`; if TRUE, arguments `var.*` will be ignored, and |
68 | | - will return all possible statistics for `var.*` and any options |
69 | | - requested in arguments `correct.p.value.*`. |
| 68 | + Logical. If `TRUE`, return the full set of statistics (ignoring |
| 69 | + `var.*` arguments). If `FALSE` (default), only return those requested |
| 70 | + in `var.*` and `correct.p.value.*`. |
70 | 71 |
|
71 | 72 | - var.terms: |
72 | 73 |
|
73 | | - A list of characters. The list of variables to save for terms (got |
74 | | - from \`broom::tidy()\`). See "Details" section for more. |
| 74 | + Character vector. Statistics to save per term, from |
| 75 | + [`broom::tidy()`](https://generics.r-lib.org/reference/tidy.html). See |
| 76 | + Details. |
75 | 77 |
|
76 | 78 | - var.model: |
77 | 79 |
|
78 | | - A list of characters. The list of variables to save for the model (got |
79 | | - from \`broom::glance()\`). See "Details" section for more. |
| 80 | + Character vector. Statistics to save for the overall model, from |
| 81 | + [`broom::glance()`](https://generics.r-lib.org/reference/glance.html). |
| 82 | + See Details. |
80 | 83 |
|
81 | 84 | - correct.p.value.terms: |
82 | 85 |
|
83 | | - A list of characters. To perform and add a column for p.value |
84 | | - correction for each term. Default: "fdr". See "Details" section for |
85 | | - more. |
| 86 | + Character vector. P-value correction method(s) for each term. Default: |
| 87 | + `"fdr"`. See Details. |
86 | 88 |
|
87 | 89 | - correct.p.value.model: |
88 | 90 |
|
89 | | - A list of characters. To perform and add a column for p.value |
90 | | - correction for the model. Default: "fdr". See "Details" section for |
91 | | - more. |
| 91 | + Character vector. P-value correction method(s) for the model-level |
| 92 | + p-value. Default: `"fdr"`. See Details. |
92 | 93 |
|
93 | 94 | - num.subj.lthr.abs: |
94 | 95 |
|
95 | | - An integer, lower threshold of absolute number of subjects. For an |
96 | | - element, if number of subjects who have finite values (defined by |
97 | | - \`is.finite()\`, i.e. not NaN or NA or Inf) in h5 file \> |
98 | | - `num.subj.lthr.abs`, then this element will be run normally; |
99 | | - otherwise, this element will be skipped and statistical outputs will |
100 | | - be set as NaN. Default is 10. |
| 96 | + Integer. Lower threshold for the absolute number of subjects with |
| 97 | + finite scalar values (not `NaN`, `NA`, or `Inf`) required per element. |
| 98 | + Elements below this threshold are skipped (outputs set to `NaN`). |
| 99 | + Default is 10. |
101 | 100 |
|
102 | 101 | - num.subj.lthr.rel: |
103 | 102 |
|
104 | | - A value between 0-1, lower threshold of relative number of subjects. |
105 | | - Similar to `num.subj.lthr.abs`, if proportion of subjects who have |
106 | | - valid value \> `num.subj.lthr.rel`, then this element will be run |
107 | | - normally; otherwise, this element will be skipped and statistical |
108 | | - outputs will be set as NaN. Default is 0.2. |
| 103 | + Numeric between 0 and 1. Lower threshold for the proportion of |
| 104 | + subjects with finite values. Used together with `num.subj.lthr.abs` |
| 105 | + (the effective threshold is the maximum of the two). Default is 0.2. |
109 | 106 |
|
110 | 107 | - verbose: |
111 | 108 |
|
112 | | - TRUE or FALSE, to print verbose message or not |
| 109 | + Logical. Print progress messages. Default `TRUE`. |
113 | 110 |
|
114 | 111 | - pbar: |
115 | 112 |
|
116 | | - TRUE or FALSE, to print progress bar or not |
| 113 | + Logical. Show progress bar. Default `TRUE`. |
117 | 114 |
|
118 | 115 | - n_cores: |
119 | 116 |
|
120 | | - Positive integer, The number of CPU cores to run with |
| 117 | + Positive integer. Number of CPU cores for parallel processing via |
| 118 | + [`mclapply`](https://rdrr.io/r/parallel/mclapply.html). Default is 1 |
| 119 | + (serial). |
121 | 120 |
|
122 | 121 | - on_error: |
123 | 122 |
|
124 | | - Character: one of "stop", "skip", or "debug". When an error occurs |
125 | | - while fitting an element, choose whether to stop, skip returning |
126 | | - all-NaN values for that element, or drop into \`browser()\` (if |
127 | | - interactive) then skip. Default: "stop". |
| 123 | + Character: one of `"stop"`, `"skip"`, or `"debug"`. When an error |
| 124 | + occurs fitting one element: `"stop"` halts execution; `"skip"` returns |
| 125 | + all-`NaN` for that element; `"debug"` drops into |
| 126 | + [`browser`](https://rdrr.io/r/base/browser.html) (if interactive) then |
| 127 | + skips. Default: `"stop"`. |
128 | 128 |
|
129 | 129 | - write_results_name: |
130 | 130 |
|
131 | | - Optional analysis name for incremental writes to |
132 | | - \`results/\<write_results_name\>/results_matrix\`. |
| 131 | + Optional character. If provided, results are incrementally written to |
| 132 | + `results/<write_results_name>/results_matrix` in the HDF5 file |
| 133 | + specified by `write_results_file`. |
133 | 134 |
|
134 | 135 | - write_results_file: |
135 | 136 |
|
136 | | - Optional HDF5 file path used when \`write_results_name\` is provided. |
| 137 | + Optional character. HDF5 file path for incremental result writes. |
| 138 | + Required when `write_results_name` is provided. |
137 | 139 |
|
138 | 140 | - write_results_flush_every: |
139 | 141 |
|
140 | | - Positive integer number of elements per write block. |
| 142 | + Positive integer. Number of elements per write block. Default 1000. |
141 | 143 |
|
142 | 144 | - write_results_storage_mode: |
143 | 145 |
|
144 | | - Storage mode for results writes (e.g., \`"double"\`). |
| 146 | + Character. Storage mode for HDF5 writes (e.g. `"double"`). Default |
| 147 | + `"double"`. |
145 | 148 |
|
146 | 149 | - write_results_compression_level: |
147 | 150 |
|
148 | | - Gzip compression level (0-9) for results writes. |
| 151 | + Integer 0–9. Gzip compression level for HDF5 writes. Default 4. |
149 | 152 |
|
150 | 153 | - return_output: |
151 | 154 |
|
152 | | - If TRUE (default), return the combined data.frame. If FALSE, returns |
153 | | - \`invisible(NULL)\`; useful for streaming large runs to HDF5. |
| 155 | + Logical. If `TRUE` (default), return the combined data.frame. If |
| 156 | + `FALSE`, return `invisible(NULL)`; useful when writing large outputs |
| 157 | + directly to HDF5. |
154 | 158 |
|
155 | 159 | - ...: |
156 | 160 |
|
157 | | - Additional arguments for \`stats::lm()\` |
| 161 | + Additional arguments passed to |
| 162 | + [`lm`](https://rdrr.io/r/stats/lm.html). |
158 | 163 |
|
159 | 164 | ## Value |
160 | 165 |
|
161 | | -Tibble with the summarized model statistics for all elements requested |
162 | | -when \`return_output = TRUE\`; otherwise \`invisible(NULL)\`. |
| 166 | +A tibble with one row per element. The first column is `element_id` |
| 167 | +(0-based). Remaining columns contain the requested statistics, named as |
| 168 | +`<term>.<statistic>` for per-term statistics and `model.<statistic>` for |
| 169 | +model-level statistics. If p-value corrections were requested, |
| 170 | +additional columns are appended with the correction method as suffix |
| 171 | +(e.g. `<term>.p.value.fdr`). |
163 | 172 |
|
164 | 173 | ## Details |
165 | 174 |
|
166 | 175 | You may request returning specific statistical variables by setting |
167 | | -`var.*`, or you can get all by setting `full.outputs=TRUE`. Note that |
| 176 | +`var.*`, or you can get all by setting `full.outputs = TRUE`. Note that |
168 | 177 | statistics covered by `full.outputs` or `var.*` are the ones from |
169 | | -broom::tidy() and broom::glance() only, and do not include corrected |
170 | | -p-values. However FDR-corrected p-values ("fdr") are generated by |
171 | | -default. List of acceptable statistic names for each of `var.*`: |
| 178 | +[`broom::tidy()`](https://generics.r-lib.org/reference/tidy.html) and
| 179 | +[`broom::glance()`](https://generics.r-lib.org/reference/glance.html)
| 180 | +only, and do not include corrected p-values. However, FDR-corrected
| 181 | +p-values (`"fdr"`) are generated by default.
172 | 182 |
|
173 | | -- `var.terms`: c("estimate","std.error","statistic","p.value"); For |
| 183 | +List of acceptable statistic names for each of `var.*`: |
| 184 | + |
| 185 | +- `var.terms`: `c("estimate", "std.error", "statistic", "p.value")`; for
174 | 186 | interpretation please see |
175 | 187 | [tidy.lm](https://broom.tidymodels.org/reference/tidy.lm.html). |
176 | 188 |
|
177 | | -- `var.model`: c("r.squared", "adj.r.squared", "sigma", "statistic", |
178 | | - "p.value", "df", "logLik", "AIC", "BIC", "deviance", "df.residual", |
179 | | - "nobs"); For interpretation please see |
| 189 | +- `var.model`: |
| 190 | + `c("r.squared", "adj.r.squared", "sigma", "statistic", "p.value", "df", "logLik", "AIC", "BIC", "deviance", "df.residual", "nobs")`; |
| 191 | +  for interpretation please see
180 | 192 | [glance.lm](https://broom.tidymodels.org/reference/glance.lm.html). |
181 | 193 |
|
182 | 194 | For p-value corrections (arguments `correct.p.value.*`), supported |
183 | | -methods include all methods in \`p.adjust.methods\` except "none". Can |
184 | | -be more than one method. FDR-corrected p-values ("fdr") are calculated |
185 | | -by default. Turn it off by setting to "none". |
| 195 | +methods include all methods in `p.adjust.methods` except `"none"`. You
| 196 | +can request more than one method. FDR-corrected p-values (`"fdr"`) are
| 197 | +calculated by default; set the argument to `"none"` to turn this off.
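
To make the correction options concrete, here is a minimal sketch of what the named methods do to a vector of p-values, using base R's `p.adjust()` only. The p-values are made up for illustration, and the sketch does not reproduce how `ModelArray.lm` applies the correction internally.

``` r
# Hypothetical raw p-values for one term across five elements (illustrative only)
p_raw <- c(0.001, 0.012, 0.030, 0.200, 0.800)

# All method names known to stats::p.adjust(); "none" leaves p-values unchanged
print(p.adjust.methods)

# Benjamini-Hochberg FDR correction ("fdr" is an alias for "BH")
p_fdr <- p.adjust(p_raw, method = "fdr")

# A second method can be computed on the same p-values
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Side-by-side view; these column names are illustrative, not ModelArray output
print(data.frame(p.value = p_raw, p.value.fdr = p_fdr, p.value.bonferroni = p_bonf))
```

Requesting several methods at once, e.g. `correct.p.value.terms = c("fdr", "bonferroni")`, should add one corrected column per method, per the Value section.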
| 198 | + |
186 | 199 | Arguments `num.subj.lthr.abs` and `num.subj.lthr.rel` are mainly for |
187 | 200 | input data with subject-specific masks, i.e. currently only for volume |
188 | 201 | data. For fixel-wise data, you may ignore these arguments. |
| 202 | + |
| 203 | +## See also |
| 204 | + |
| 205 | +[`ModelArray.gam`](https://pennlinc.github.io/ModelArray/reference/ModelArray.gam.md) |
| 206 | +for generalized additive models, |
| 207 | +[`ModelArray.wrap`](https://pennlinc.github.io/ModelArray/reference/ModelArray.wrap.md) |
| 208 | +for user-supplied functions, |
| 209 | +[ModelArray](https://pennlinc.github.io/ModelArray/reference/ModelArray-class.md) |
| 210 | +for the input class, |
| 211 | +[`ModelArray`](https://pennlinc.github.io/ModelArray/reference/ModelArray-class.html) |
| 212 | +for the constructor, |
| 213 | +[`exampleElementData`](https://pennlinc.github.io/ModelArray/reference/exampleElementData.md) |
| 214 | +for testing formulas on a single element. |
| 215 | + |
| 216 | +## Examples |
| 217 | + |
| 218 | +``` r |
| 219 | +if (FALSE) { # interactive() |
| 220 | +ma <- ModelArray("path/to/data.h5", scalar_types = c("FD")) |
| 221 | +phenotypes <- read.csv("cohort.csv") |
| 222 | + |
| 223 | +# Fit linear model with default outputs |
| 224 | +results <- ModelArray.lm( |
| 225 | + FD ~ age + sex, |
| 226 | + data = ma, |
| 227 | + phenotypes = phenotypes, |
| 228 | + scalar = "FD" |
| 229 | +) |
| 230 | +head(results) |
| 231 | + |
| 232 | +# Full outputs, no p-value correction |
| 233 | +results_full <- ModelArray.lm( |
| 234 | + FD ~ age + sex, |
| 235 | + data = ma, |
| 236 | + phenotypes = phenotypes, |
| 237 | + scalar = "FD", |
| 238 | + full.outputs = TRUE, |
| 239 | + correct.p.value.terms = "none", |
| 240 | + correct.p.value.model = "none" |
| 241 | +) |
| 242 | +} |
| 243 | +``` |