





















Abstract: This paper studies variable selection and post-selection inference for high-dimensional clustered data using marginal-model-based procedures. We show that, when covariates are heterogeneously distributed across clusters, marginal-model LASSO may use them as sparse proxies for latent cluster effects, shifting the estimation target away from the structural fixed effects and inducing false selections. To address this problem, we propose Synthetic Heterogeneous-Effects LASSO (SHEL), a fixed-effects penalized framework that incorporates cluster-level synthetic approximations to the latent heterogeneity. We establish theoretical properties of SHEL in high-dimensional settings and develop procedures for valid post-selection inference. We investigate the finite-sample performance of the proposed method through extensive simulation studies, and we demonstrate it in a real application by analyzing a longitudinal bulk RNA-seq dataset of enriched blood neutrophils from hospitalized COVID-19 patients.
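The proxy mechanism described in the abstract can be illustrated with a toy simulation. The sketch below is not the SHEL procedure and not a LASSO fit; it only compares a pooled (marginal) least-squares slope with a within-cluster slope for a single covariate whose true coefficient is zero. The cluster effects, the sample sizes, and the choice to set the covariate's cluster means equal to the latent cluster effects are all assumptions made for illustration, mimicking an alignment that could arise by chance among many covariates. The function names are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares fit of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def center_within(values, cluster_ids):
    """Subtract each cluster's mean from its own observations."""
    totals, counts = {}, {}
    for v, c in zip(values, cluster_ids):
        totals[c] = totals.get(c, 0.0) + v
        counts[c] = counts.get(c, 0) + 1
    return [v - totals[c] / counts[c] for v, c in zip(values, cluster_ids)]


def simulate(n_per_cluster=200, seed=0):
    """Generate clustered data in which the covariate has no true effect."""
    rng = random.Random(seed)
    # Assumed latent cluster effects (illustrative values only). The
    # covariate's cluster means are set equal to them on purpose.
    cluster_effects = [-2.0, -1.0, 0.0, 1.0, 2.0]
    ids, x, y = [], [], []
    for c, effect in enumerate(cluster_effects):
        for _ in range(n_per_cluster):
            ids.append(c)
            # Covariate distribution differs across clusters (shifted mean).
            x.append(effect + rng.gauss(0.0, 1.0))
            # Response depends on the cluster effect only, not on x.
            y.append(effect + rng.gauss(0.0, 1.0))
    return ids, x, y


def compare_slopes(seed=0):
    """Return (pooled slope, within-cluster slope) for the toy data."""
    ids, x, y = simulate(seed=seed)
    pooled = ols_slope(x, y)
    within = ols_slope(center_within(x, ids), center_within(y, ids))
    return pooled, within


if __name__ == "__main__":
    pooled, within = compare_slopes()
    print(f"pooled (marginal) slope: {pooled:.3f}")
    print(f"within-cluster slope:    {within:.3f}")
```

In this construction the pooled slope is clearly nonzero because the covariate stands in for the omitted cluster effect, while the within-cluster slope stays near zero. Cluster-mean centering is used here only as a simple contrast; the paper's own remedy is the synthetic approximation to the latent heterogeneity, whose details are not given in the abstract.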
| Subjects: | Methodology (stat.ME) |
| Cite as: | arXiv:2605.24587 [stat.ME] (or arXiv:2605.24587v1 [stat.ME] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.24587 (arXiv-issued DOI via DataCite, pending registration) |
From: Shangyuan Ye
[v1] Sat, 23 May 2026 13:57:27 UTC (193 KB)