


















Abstract:Two-phase sampling is a simple and cost-effective estimation strategy in survey sampling and is widely used in practice. Because the phase-2 sampling probability typically depends on low-cost variables collected at phase 1, naive estimation based solely on the phase-2 sample generally results in biased inference. This issue arises even when estimating causal parameters such as the average treatment effect (ATE), and there has been growing interest in recent years in the proper estimation of such parameters under complex sampling designs (e.g., Nattino et al., 2025). In this paper, we derive the semiparametric efficiency bound for a broad class of weighted average treatment effects (WATE), which includes the ATE, the average treatment effect on the treated/untreated (ATT/ATU), and the average treatment effect on the overlapped population (ATO), under two-phase sampling. In addition to straightforward weighting estimators based on the sampling probabilities, we also propose estimators that can attain strictly higher efficiency under suitable conditions. In particular, under outcome-dependent sampling, we show that substantial efficiency gains can be achieved by appropriately incorporating phase-1 information. We further conduct extensive simulation studies, varying the choice of phase-1 variables and sampling schemes, to characterize when and to what extent leveraging phase-1 information leads to efficiency gains.
From: Kazuharu Harada [view email]
[v1]
Mon, 10 Nov 2025 07:50:00 UTC (224 KB)
[v2]
Sun, 21 Jun 2026 02:39:15 UTC (234 KB)
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。