Considering causality in the construction of molecular signatures of lifestyle exposures
Diana Wu, Vi · 2026-05-26 · via stat updates on arXiv.org

Abstract: Molecular signatures derived from omics data are increasingly used in epidemiological studies to characterize lifestyle exposures, either as proxies of exposure or to provide insight into disease mechanisms. These signatures are typically constructed by regressing the exposure on high-dimensional omics features. In the literature, an initial univariate screening step has sometimes been applied prior to multivariate modelling, but the causal implications of this choice have not yet been considered. Focusing on settings where the exposure causally influences molecular features (and not the reverse), we use directed acyclic graphs (DAGs) and $d$-separation arguments to show that collider bias may arise when the screening step is ignored, leading to the inclusion of non-causal features in the signature. We further demonstrate that the screening step can mitigate this bias. Our simulation studies illustrate that screening reduces the inclusion of non-causal features, albeit at the cost of lower sensitivity and reduced correlation between the exposure and the resulting signature. Overall, we recommend applying univariate screening prior to signature construction, particularly when the inclusion of non-causal features is undesirable, such as in mechanistic studies.
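The collider mechanism described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: the two-feature model, the function names (`simulate`, `ols_coefs`, `screen`), and the correlation threshold of 0.1 are all assumptions made for illustration. The exposure causes feature `m1`, while `m2` is a non-causal feature that also affects `m1`, so `m1` is a collider on the path exposure -> m1 <- m2. In this linear Gaussian model the population coefficients from regressing the exposure on both features are 0.5 for `m1` and -0.5 for `m2`, even though `m2` is marginally independent of the exposure.

```python
import numpy as np


def simulate(n=20000, seed=0):
    """Toy model (an assumption, not the paper's design).

    exposure -> m1 <- m2, with m2 independent of the exposure.
    """
    rng = np.random.default_rng(seed)
    exposure = rng.normal(size=n)
    m2 = rng.normal(size=n)                   # non-causal feature
    m1 = exposure + m2 + rng.normal(size=n)   # causal feature and collider
    return exposure, np.column_stack([m1, m2])


def ols_coefs(y, X):
    """Least-squares slopes of y on the columns of X (intercept dropped)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]


def screen(y, X, min_abs_corr=0.1):
    """Univariate screening: keep features whose marginal correlation
    with y reaches a (hypothetical) absolute threshold."""
    corrs = np.array([np.corrcoef(y, X[:, j])[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(corrs) >= min_abs_corr), corrs


if __name__ == "__main__":
    exposure, features = simulate()

    # Without screening, the non-causal feature m2 gets a clearly
    # nonzero weight because conditioning on m1 opens the collider path.
    print("multivariate coefficients (m1, m2):", ols_coefs(exposure, features))

    # With screening, m2 is dropped before the multivariate fit.
    kept, corrs = screen(exposure, features)
    print("marginal correlations (m1, m2):", corrs)
    print("features kept after screening:", kept)
    print("coefficients after screening:", ols_coefs(exposure, features[:, kept]))
```

A fixed correlation cut-off is used here only to keep the sketch short; a screening step in practice would more likely rely on per-feature tests with multiple-testing control.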
Comments: 28 pages, 10 figures
Subjects: Methodology (stat.ME)
MSC classes: 62P10
Cite as: arXiv:2605.26023 [stat.ME]
  (or arXiv:2605.26023v1 [stat.ME] for this version)
  https://doi.org/10.48550/arXiv.2605.26023

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Vivian Viallon
[v1] Mon, 25 May 2026 16:44:52 UTC (2,611 KB)