





















Abstract: Molecular signatures derived from omics data are increasingly used in epidemiological studies to characterize lifestyle exposures, either as proxies of exposure or to provide insight into disease mechanisms. These signatures are typically constructed by regressing the exposure on high-dimensional omics features. In the literature, an initial univariate screening step has sometimes been applied prior to multivariate modelling, but the causal implications of this choice have not yet been considered. Focusing on settings where the exposure causally influences molecular features (and not the reverse), we use directed acyclic graphs (DAGs) and $d$-separation arguments to show that collider bias may arise when the screening step is omitted, leading to the inclusion of non-causal features in the signature. We further demonstrate that the screening step can mitigate this bias. Our simulation studies illustrate that screening reduces the inclusion of non-causal features, albeit at the cost of lower sensitivity and reduced correlation between the exposure and the resulting signature. Overall, we recommend applying univariate screening prior to signature construction, particularly when the inclusion of non-causal features is undesirable, such as in mechanistic studies.
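The collider argument in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation design; it assumes a hypothetical three-variable DAG in which the exposure `x` causes feature `m1`, while a second feature `m2` is independent of `x` but also causes `m1` (so `m1` is a collider on the path between `x` and `m2`). The variable names, effect sizes, sample size, and the screening threshold of 0.1 are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical DAG (an assumption, not taken from the paper):
#   x -> m1 <- m2, with x and m2 marginally independent.
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                # exposure
m2 = rng.normal(size=n)               # feature NOT caused by the exposure
m1 = x + m2 + rng.normal(size=n)      # feature caused by the exposure (collider)

features = np.column_stack([m1, m2])

# Multivariate step: ordinary least squares of the exposure on both features.
design = np.column_stack([np.ones(n), features])
beta, *_ = np.linalg.lstsq(design, x, rcond=None)
multi_coef = beta[1:]                 # coefficients for m1 and m2

# Univariate screening step: marginal correlation of each feature with x.
marginal_corr = np.array(
    [np.corrcoef(x, features[:, j])[0, 1] for j in range(features.shape[1])]
)
threshold = 0.1                       # illustrative cut-off (assumption)
kept = np.flatnonzero(np.abs(marginal_corr) > threshold)

print("multivariate coefficients (m1, m2):", multi_coef)
print("marginal correlations     (m1, m2):", marginal_corr)
print("features kept by screening:", kept)
```

In this toy setting, conditioning on `m1` opens the path between `x` and `m2`, so the multivariate fit assigns `m2` a clearly non-zero coefficient even though the exposure does not cause it, whereas its marginal correlation with `x` is close to zero and the screening step discards it.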
| Comments: | 28 pages, 10 figures |
| Subjects: | Methodology (stat.ME) |
| MSC classes: | 62P10 |
| Cite as: | arXiv:2605.26023 [stat.ME] |
| (or arXiv:2605.26023v1 [stat.ME] for this version) | |
| https://doi.org/10.48550/arXiv.2605.26023 arXiv-issued DOI via DataCite (pending registration) |
From: Vivian Viallon
[v1] Mon, 25 May 2026 16:44:52 UTC (2,611 KB)