Feature Selection for Linear Mixed Effects Models

Jim Burke
University of Washington

Linear mixed-effects (LME) models are used to analyze nested or combined data across a range of groups or clusters. These models use covariates to separate the total population variability (the fixed effects) from the group-specific variability (the random effects). LMEs borrow strength across groups to estimate key statistics in cases where the data within groups may be sparse or highly variable, and they play a fundamental role in the population health sciences, meta-analysis, the life sciences, and many other domains. In this talk we give a formal mathematical description of the LME model and its feature selection variant. A naive proximal gradient descent (PGD) algorithm for its solution is described and its deficiencies are explained. A novel solution strategy is then proposed, based on a relaxation that decouples the smooth and nonsmooth components of the maximum likelihood objective. An optimal value function is obtained by partially optimizing the smooth component of the decoupled problem. We show that this optimal value function has a locally Lipschitz gradient, so a PGD algorithm can be applied to a feature-selecting regularization of it. At first this approach seems counterintuitive, since the optimal value function adds yet another layer of complexity to the problem. However, this complexity is mitigated by the use of modern variational and numerical techniques. The resulting PGD algorithm applied to the reformulation is more stable and rapidly identifies the important features to high accuracy. Algorithmic details and numerical results are presented.
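
For concreteness, the following is a minimal sketch of the regularized marginal maximum likelihood problem underlying the feature selection variant described above. The notation (groups i = 1, ..., m, fixed effects beta, random-effects covariance Gamma, noise variance sigma^2, and an l1 penalty as the sparsity-inducing regularizer) is an assumption made for illustration and is not taken verbatim from the talk.

\begin{align*}
  y_i &= X_i\beta + Z_i u_i + \varepsilon_i,
  \qquad u_i \sim \mathcal{N}(0,\Gamma),
  \quad \varepsilon_i \sim \mathcal{N}(0,\sigma^2 I),
  \qquad i = 1,\dots,m,\\
  y_i &\sim \mathcal{N}\bigl(X_i\beta,\ \Omega_i\bigr),
  \qquad \Omega_i := Z_i\Gamma Z_i^{\top} + \sigma^2 I,\\
  \min_{\beta,\ \Gamma \succeq 0}\
  &\tfrac{1}{2}\sum_{i=1}^{m}
    \Bigl[\log\det\Omega_i
      + (y_i - X_i\beta)^{\top}\Omega_i^{-1}(y_i - X_i\beta)\Bigr]
  + \lambda\,\|\beta\|_1 .
\end{align*}

The sum is the smooth negative log-likelihood; the l1 term is the nonsmooth, feature-selecting component that the relaxation decouples from it.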
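
The generic PGD iteration mentioned above alternates a gradient step on the smooth part of the objective with the proximal operator of the nonsmooth penalty. The Python sketch below illustrates that iteration on a plain l1-regularized least-squares placeholder, not on the LME objective or the relaxed value function from the talk; the step size and stopping rule are assumptions chosen only for illustration.

# Minimal proximal gradient descent (PGD) sketch for an l1-regularized smooth loss.
# NOT the speaker's algorithm: the loss is a least-squares placeholder, and the
# step size / stopping rule are illustrative assumptions.
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def pgd_l1(grad, lam, x0, step, n_iter=500, tol=1e-8):
    """PGD step: x <- prox_{step*lam*||.||_1}(x - step * grad(x))."""
    x = x0.copy()
    for _ in range(n_iter):
        x_new = soft_threshold(x - step * grad(x), step * lam)
        if np.linalg.norm(x_new - x) <= tol * max(1.0, np.linalg.norm(x)):
            return x_new
        x = x_new
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 20))
    beta_true = np.zeros(20)
    beta_true[:3] = [2.0, -1.5, 1.0]                      # sparse ground truth
    y = X @ beta_true + 0.1 * rng.standard_normal(100)
    grad = lambda b: X.T @ (X @ b - y)                    # gradient of 0.5*||Xb - y||^2
    step = 1.0 / np.linalg.norm(X, 2) ** 2                # 1/L, L = spectral norm squared
    beta_hat = pgd_l1(grad, lam=5.0, x0=np.zeros(20), step=step)
    print(np.round(beta_hat, 2))                          # nonzeros concentrate on true support

In the reformulation discussed in the talk, the role played here by the gradient of the least-squares loss is instead played by the gradient of the optimal value function obtained by partially minimizing the smooth block, which is shown to be locally Lipschitz.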