Vicente, Sergio

Faculty of Arts and Science - Department of Mathematics and Statistics

André-Aisenstadt


Courses

STT3220 E - Méthodes de prévision (Forecasting Methods)

Research area

Student supervision

Apprentissage statistique avec le processus ponctuel déterminantal (Statistical learning with the determinantal point process)
Theses and supervised dissertations / 2021-02

Vicente, Sergio
Abstract

This thesis presents the determinantal point process, a probabilistic model that captures repulsion between the points of a given space. This repulsion is encoded by a similarity matrix, the kernel matrix, which determines which points are more similar and therefore less likely to appear in the same subset. The point process thus gives more weight to subsets whose elements are more diverse, which is not the case with traditional uniform random sampling. Diversity has become a key concept in domains such as medicine, sociology, forensic sciences and behavioral sciences. The determinantal point process is considered a promising alternative to traditional sampling methods because it takes the diversity of the selected elements into account. It is already actively used in machine learning as a subset selection method. Its application in statistics is illustrated with three papers. The first paper addresses consensus clustering, which consists of running a clustering algorithm on the same data a large number of times. To sample the initial points of the algorithm, we propose the determinantal point process instead of uniform random sampling and show that it produces better clustering results. The second paper extends the methodology of the first paper to large datasets. Such datasets impose a computational burden, since sampling with the determinantal point process relies on the spectral decomposition of the large kernel matrix. We introduce two methods to deal with this issue. These methods also produce better clustering results than consensus clustering based on uniform sampling of the initial points. The third paper addresses variable selection for the linear model and logistic regression when the number of predictors is large. A Bayesian approach is adopted, using Markov chain Monte Carlo methods with the Metropolis-Hastings algorithm. We show that using the determinantal point process as the prior distribution on the model space selects a better final model than a uniform prior on the model space.
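The repulsion described in the abstract can be illustrated with a small sketch. It is not taken from the thesis. It assumes the common L-ensemble form of a determinantal point process, in which the unnormalized weight of a subset is the determinant of the corresponding kernel submatrix. The Gaussian kernel, the `bandwidth` parameter, the function names and the three sample points are all hypothetical choices made for illustration.

```python
import math


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian similarity between two points on the real line (illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def pair_weight(x, y, bandwidth=1.0):
    """Unnormalized L-ensemble weight of the subset {x, y}.

    This is the determinant of the 2x2 kernel submatrix
    [[k(x, x), k(x, y)], [k(y, x), k(y, y)]].
    """
    kxx = rbf_kernel(x, x, bandwidth)
    kyy = rbf_kernel(y, y, bandwidth)
    kxy = rbf_kernel(x, y, bandwidth)
    return kxx * kyy - kxy * kxy


# Hypothetical data: two nearly identical points and one distant point.
points = [0.0, 0.1, 3.0]

close_pair = pair_weight(points[0], points[1])  # similar points
far_pair = pair_weight(points[0], points[2])    # dissimilar points

print("weight of similar pair:   ", close_pair)
print("weight of dissimilar pair:", far_pair)
```

Under these assumptions the similar pair receives a much smaller weight than the dissimilar pair, which is the sense in which the process favors diverse subsets. A uniform sampler would treat both pairs identically.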


Université de Montréal / Faculté des arts et des sciences Département de mathématiques et de statistique
