Date: Monday, January 20, 2025
Time: 10:30 a.m.
Room: 6214, Pavillon André-Aisenstadt
Speaker: Mila Shuo Sun, Postdoctoral Research Fellow, Harvard
Title: Double-Sampling for Informatively Missing Data in Treatment Effect Estimation
Abstract: Missing or incomplete data is a widespread challenge in observational studies, especially when the data at hand were not originally collected for research purposes, as is often the case with electronic health records (EHRs). Outcome data from such sources may be particularly susceptible to being missing-not-at-random (MNAR). To mitigate bias due to MNAR data, I propose to use a double-sampling strategy, through which the otherwise missing data are ascertained on a sub-sample of study units. The statistical objectives are the estimation and inference of two causal effects: the weighted quantile treatment effects (WQTEs) and the average treatment effects (ATEs). The WQTEs, in particular, provide a complement to standard mean-focused causal contrasts, especially when interest lies in the tails of the counterfactual distribution.
With the additional data, I present identifying conditions that do not require missingness assumptions in the original data. For the WQTEs, I propose a novel inverse-probability weighted estimator and derive its asymptotic properties, both pointwise at specific quantiles and uniformly across quantiles over some compact subset of (0, 1), allowing the propensity score and double-sampling probabilities to be estimated. For practical inference, I develop a bootstrap method that can be used for both pointwise and uniform inference. A simulation study is conducted to examine the finite sample performance of the proposed estimators. I illustrate the proposed method using EHR data examining the relative effects of two bariatric surgery procedures on BMI loss three years post-surgery.
Since the double-sampling strategy can be planned from the beginning, it provides an opportunity to allocate resources effectively within a fixed budget. Motivated by this, I derive the optimal sampling rule that minimizes the semiparametric efficiency bound, subject to a budget constraint. The optimal double-sampling rules generally depend on the unknown full data distribution. To address this, I conduct a pilot study to estimate unknown quantities and investigate asymptotic properties, using ATEs as an example, considering both fixed pilot sample sizes and cases where the sample size approaches zero at a specific rate. Two simulation studies, assuming Hölder smooth functions and sparsity functions, respectively, verify the efficiency of the proposed optimal sampling rules in finite samples.
The proposed double-sampling strategy will provide researchers with an alternative to the dominant sensitivity analysis-based paradigm for informatively missing data.