Our graduate students

Adjogou, Adjobo Folly Dzigbodi

Faculty of Arts and Science - Department of Mathematics and Statistics

André-Aisenstadt

Courses

  • STT1682 H - Progiciels statistiq/actuariat

Research area

Student supervision

Analyse statistique de données fonctionnelles à structures complexes
Theses and supervised dissertations / 2017-05
Adjogou, Adjobo Folly Dzigbodi
Abstract
Longitudinal studies play a salient role in many research areas and their relevance is still increasing. The related methods have become a privileged tool for analyzing the evolution of a given phenomenon across time. Longitudinal data arise when measurements for one or more variables are taken at different points of a temporal axis on the individuals involved in the study. A key feature of such data is that observations within the same subject may be correlated. That fundamental characteristic makes longitudinal data different from other types of data in statistics and motivates specific methodologies. There have been remarkable developments in that field in the past forty years. Typical analysis of longitudinal data relies on parametric, non-parametric or semi-parametric models. However, an important question widely addressed in the analysis of longitudinal data is related to cluster analysis and concerns the existence of groups or clusters (or homogeneous trajectories), suggested by the data and not defined a priori, such that individuals in a given cluster tend to be similar to each other in some sense, and individuals in different clusters tend to be dissimilar. This thesis aims at contributing to that rapidly expanding field of clustering longitudinal data. An emerging non-parametric methodology for modeling longitudinal data is based on the functional data analysis approach, in which longitudinal trajectories are viewed as a sample of partially observed functions or curves on some interval, where these functions are often assumed to be smooth. In the present thesis we therefore present a succinct review of the most commonly used methods to analyze and cluster longitudinal data, and we propose two new model-based functional clustering methods.
We review most of the typical longitudinal data analysis models, ranging from the parametric models to the semi-parametric and non-parametric ones, as well as the recent developments in longitudinal cluster analysis according to the two main approaches: non-parametric and model-based. The purpose of that review is to provide a concise, broad and readily accessible overview of longitudinal data analysis and clustering methods. In the first method developed in this thesis, we use the functional data analysis approach to propose a very flexible model which combines functional principal components analysis and clustering to deal with any type of longitudinal data, even if the observations are sparse, irregularly spaced or occur at different time points for each individual. The functional modeling is based on splines and the main data groups are modeled as arising from clusters in the space of spline coefficients. The model, based on a mixture of Student’s t-distributions, is embedded into a Bayesian framework in which maximum a posteriori estimators are found with the EM algorithm. We develop an approximation of the marginal log-likelihood (MLL) that allows us to perform an MLL-based model selection and that compares favourably with other popular criteria such as AIC and BIC. In the second method, we propose a new time-course or longitudinal data analysis framework that aims at combining functional model-based clustering and the Lasso penalization to identify groups of individuals with similar patterns. An EM algorithm-based approach is used on a functional modeling where the individual curves are approximated in a space spanned by a finite basis of B-splines and the number of clusters is determined by penalizing a mixture of Student’s t-distributions with unknown degrees of freedom. Latin Hypercube Sampling is used to efficiently explore the space of penalization parameters.
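As a rough illustration of how Latin Hypercube Sampling can cover a space of penalization parameters, here is a minimal sketch in plain Python. It is not the thesis implementation: the function name `latin_hypercube`, the two penalty parameters and their ranges are all assumptions made for the example. Each axis is cut into as many equal-width strata as there are samples, one point is drawn in each stratum, and the strata are shuffled independently per axis.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so that, on every axis, each of the
    n_samples equal-width strata contains exactly one point.

    bounds is a list of (low, high) pairs, one per parameter.
    (Hypothetical helper written for this illustration.)
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each stratum [k/n, (k+1)/n).
        unit = [(k + rng.random()) / n_samples for k in range(n_samples)]
        # Shuffle so strata are paired at random across axes.
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose the per-axis columns into a list of points.
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical example: two penalty parameters with made-up ranges.
candidates = latin_hypercube(5, [(0.0, 1.0), (0.0, 10.0)], seed=0)
for lam1, lam2 in candidates:
    print(f"lam1={lam1:.3f}  lam2={lam2:.3f}")
```

Compared with a full grid, this design needs only as many model fits as there are samples while still visiting every stratum of every parameter once.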
For both methodologies, the estimation of the parameters is based on the iterative expectation-maximization (EM) algorithm.
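To make the EM idea concrete, the following is a deliberately simplified sketch, not the method of the thesis: it fits a mixture of univariate Student's t-distributions with a fixed, assumed number of degrees of freedom `nu`, whereas the thesis works in the space of spline coefficients, treats the degrees of freedom as unknown in the second method, and adds a Bayesian prior or a Lasso penalty. The function names, starting values and toy data are all hypothetical. The E-step computes cluster responsibilities together with the weights that downweight outlying points; the M-step updates proportions, locations and scales.

```python
import math


def t_logpdf(x, mu, sigma2, nu):
    """Log-density of a location-scale Student's t-distribution."""
    z2 = (x - mu) ** 2 / sigma2
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * sigma2)
            - (nu + 1) / 2 * math.log1p(z2 / nu))


def em_t_mixture(xs, init_mus, nu=4.0, n_iter=100):
    """EM for a 1-D mixture of t-distributions with fixed nu (assumed)."""
    k, n = len(init_mus), len(xs)
    mus, sigma2s, pis = list(init_mus), [1.0] * k, [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities tau and robustness weights u.
        taus, us = [], []
        for x in xs:
            logs = [math.log(max(pis[j], 1e-300))
                    + t_logpdf(x, mus[j], sigma2s[j], nu) for j in range(k)]
            top = max(logs)
            w = [math.exp(v - top) for v in logs]
            total = sum(w)
            taus.append([wj / total for wj in w])
            us.append([(nu + 1) / (nu + (x - mus[j]) ** 2 / sigma2s[j])
                       for j in range(k)])
        # M-step: update proportions, locations and scales.
        for j in range(k):
            tj = sum(t[j] for t in taus)
            tuj = sum(t[j] * u[j] for t, u in zip(taus, us))
            mu_new = sum(t[j] * u[j] * x
                         for t, u, x in zip(taus, us, xs)) / tuj
            s2_new = sum(t[j] * u[j] * (x - mu_new) ** 2
                         for t, u, x in zip(taus, us, xs)) / tj
            pis[j], mus[j], sigma2s[j] = tj / n, mu_new, max(s2_new, 1e-12)
    return pis, mus, sigma2s


# Toy data: two tight groups plus one gross outlier at 40.
data = [-5.2, -4.9, -5.1, -4.8, -5.0, 4.9, 5.1, 5.0, 5.2, 4.8, 40.0]
pis, mus, sigma2s = em_t_mixture(data, init_mus=[-1.0, 1.0])
print(pis, mus, sigma2s)
```

The weights `u` shrink towards zero for points far from a cluster centre, which is why the heavy-tailed t-mixture is less sensitive to outliers than a Gaussian mixture.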