
Murua, Alejandro

Full Professor

Faculty of Arts and Science - Department of Mathematics and Statistics

André-Aisenstadt Office 4221

514 343-6987

Affiliations

  • Member, Centre de recherches mathématiques (CRM)
  • Member, Institut de valorisation des données (IVADO)

Courses

  • STT3260 A - Modèles de survie (Survival models)
  • STT6516 A - Données catégorielles (Categorical data)

Student supervision

Apprentissage basé sur le Qini pour la prédiction de l'effet causal conditionnel Theses and supervised dissertations / 2021-08
Belbahri, Mouloud-Beallah
Abstract
Uplift models deal with cause-and-effect inference for a specific factor, such as a marketing intervention. In practice, these models are built on individual data from randomized experiments. A targeted group contains individuals who are subject to an action; a control group serves for comparison. Uplift modeling is used to order the individuals with respect to the value of a causal effect, e.g., positive, neutral, or negative. First, we propose a new way to perform model selection in uplift regression models. Our methodology is based on the maximization of the Qini coefficient. Because model selection corresponds to variable selection, the task is daunting and intractable if done in a straightforward manner when the number of variables to consider is large. To realistically search for a good model, we conceived a search method based on an efficient exploration of the regression coefficient space combined with a lasso penalization of the log-likelihood. There is no explicit analytical expression for the Qini surface, so unveiling it is not easy. Our idea is to gradually uncover the Qini surface in a manner inspired by response surface designs. The goal is to find a reasonable local maximum of the Qini by exploring the surface near optimal values of the penalized coefficients. We openly share our code through the R package tools4uplift. Though there are some computational methods available for uplift modeling, most of them exclude statistical regression models. Our package intends to fill this gap. This package comprises tools for: i) quantization, ii) visualization, iii) variable selection, iv) parameter estimation and v) model validation. This library allows practitioners to use our methods with ease and to refer to the methodological papers for the details. Uplift is a particular case of causal inference. Causal inference tries to answer questions such as "What would be the result if we gave this patient treatment A instead of treatment B?"
The answer to this question is then used as a prediction for a new patient. In the second part of the thesis, we place more emphasis on prediction. Most existing approaches are adaptations of random forests for the uplift case. Several split criteria have been proposed in the literature, all relying on maximizing heterogeneity. However, in practice, these approaches are prone to overfitting. In this work, we bring a new vision to uplift modeling. We propose a new loss function defined by leveraging a connection with the Bayesian interpretation of the relative risk. Our solution is developed for a specific twin neural network architecture that allows joint optimization of the marginal probabilities of success for treated and control individuals. We show that this model is a generalization of the uplift logistic interaction model. We modify the stochastic gradient descent algorithm to allow for structured sparse solutions. This greatly helps in fitting our uplift models. We openly share our Python code for practitioners wishing to use our algorithms. We had the rare opportunity to collaborate with industry to get access to data from large-scale marketing campaigns favorable to the application of our methods. We show empirically that our methods are competitive with the state of the art on real data and through several simulation scenarios.
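As a rough illustration of the quantity being maximized in the first part of the abstract, the sketch below computes one common form of the Qini curve and Qini coefficient from predicted uplift scores. The normalization, the function names and the toy data are assumptions made for illustration; this is not the tools4uplift implementation, and other definitions of the Qini coefficient exist.

```python
def qini_curve(uplift_scores, treated, outcome):
    """Cumulative incremental gains after targeting the top-k individuals, k = 0..n."""
    # Rank individuals from highest to lowest predicted uplift.
    order = sorted(range(len(uplift_scores)), key=lambda i: -uplift_scores[i])
    n_t = n_c = r_t = r_c = 0
    gains = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            r_t += outcome[i]
        else:
            n_c += 1
            r_c += outcome[i]
        # Rescale control responders to the treated group size seen so far.
        scaled_control = r_c * n_t / n_c if n_c > 0 else 0.0
        gains.append(r_t - scaled_control)
    return gains


def qini_coefficient(uplift_scores, treated, outcome):
    """Area between the Qini curve and the straight line of random targeting."""
    gains = qini_curve(uplift_scores, treated, outcome)
    n = len(gains) - 1
    # Trapezoidal rule with the targeted fraction on [0, 1].
    area_model = sum((gains[k] + gains[k + 1]) / 2 for k in range(n)) / n
    area_random = gains[-1] / 2
    return area_model - area_random


# Toy data (hypothetical): the only responder is a treated individual.
treated = [1, 0, 1, 0]
outcome = [1, 0, 0, 0]
q_good = qini_coefficient([0.9, 0.8, 0.2, 0.1], treated, outcome)  # responder ranked first
q_bad = qini_coefficient([0.1, 0.2, 0.8, 0.9], treated, outcome)   # responder ranked last
```

A scoring that ranks the responsive treated individual first yields a positive coefficient, while the reversed ranking yields a negative one, which is why the coefficient can serve as a model selection criterion.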

Apprentissage statistique avec le processus ponctuel déterminantal Theses and supervised dissertations / 2021-02
Vicente, Sergio
Abstract
This thesis presents the determinantal point process, a probabilistic model that captures repulsion between points of a certain space. This repulsion is encoded by a similarity matrix, the kernel matrix, which determines which points are more similar and hence less likely to appear in the same subset. This point process gives more weight to subsets characterized by a larger diversity of their elements, which is not the case with traditional uniform random sampling. Diversity has become a key concept in domains such as medicine, sociology, forensic sciences and behavioral sciences. The determinantal point process is considered a promising alternative to traditional sampling methods, since it takes into account the diversity of selected elements. It is already actively used in machine learning as a subset selection method. Its application in statistics is illustrated with three papers. The first paper presents consensus clustering, which consists in running a clustering algorithm on the same data a large number of times. To sample the initial points of the algorithm, we propose the determinantal point process as a sampling method instead of uniform random sampling and show that the former option produces better clustering results. The second paper extends the methodology developed in the first paper to large datasets. Such datasets impose a computational burden since sampling with the determinantal point process is based on the spectral decomposition of the large kernel matrix. We introduce two methods to deal with this issue. These methods also produce better clustering results than consensus clustering based on a uniform sampling of initial points. The third paper addresses the problem of variable selection for the linear model and logistic regression when the number of predictors is large. A Bayesian approach is adopted, using Markov chain Monte Carlo methods with the Metropolis-Hastings algorithm.
We show that setting the determinantal point process as the prior distribution for the model space selects a better final model than the model selected by a uniform prior on the model space.
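The repulsion property described in this abstract can be illustrated with a small sketch. It assumes an L-ensemble parameterization, in which a subset S has probability det(L_S) / det(L + I), and a Gaussian similarity kernel on one-dimensional points. Both choices, as well as the function names and the toy points, are illustrative assumptions and are not taken from the thesis.

```python
import math
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting (empty matrix -> 1)."""
    a = [row[:] for row in matrix]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def gaussian_kernel(points, bandwidth=1.0):
    """Similarity matrix: close points get entries near 1, distant points near 0."""
    return [[math.exp(-((x - y) ** 2) / (2 * bandwidth ** 2)) for y in points]
            for x in points]


def dpp_probability(L, subset):
    """P(Y = subset) = det(L_subset) / det(L + I) for an L-ensemble DPP."""
    sub = [[L[i][j] for j in subset] for i in subset]
    n = len(L)
    l_plus_i = [[L[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return det(sub) / det(l_plus_i)


points = [0.0, 0.1, 3.0]                 # two near-duplicates and one distant point
L = gaussian_kernel(points)
p_close = dpp_probability(L, [0, 1])     # similar pair
p_far = dpp_probability(L, [0, 2])       # diverse pair
# Sanity check: the probabilities of all subsets sum to one.
total = sum(dpp_probability(L, list(s))
            for k in range(len(points) + 1)
            for s in combinations(range(len(points)), k))
```

In this toy example the diverse pair receives a much larger probability than the pair of near-duplicates, whereas uniform sampling would treat both pairs identically.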

Application des méthodes de partitionnement de données fonctionnelles aux trajectoires de voiture Theses and supervised dissertations / 2020-08
Paul, Alexandre
Abstract
The study of the clustering of functional data has made a lot of progress in the last couple of years. Multiple methods have been proposed and the respective analyses have shown their efficiency in some benchmark studies. The objective of this Master's thesis is to compare those clustering algorithms on traffic datasets from an intersection in Montreal. The motivation is that the manual classification of these datasets is time-consuming. We show that it is possible to obtain adequate clustering and prediction results with several algorithms. One of the methods that we discuss is distclust: a distance-based algorithm that uses a K-means approach. We also use a Gaussian mixture density clustering method known as mclust. Although those two techniques are quite effective, they are multi-purpose clustering methods and therefore not tailored to the functional case. With that in mind, we apply four functional clustering methods: fitfclust, funmbclust, funclust, and funHDDC. Our results show that there is no loss in the quality of the clustering between the aforementioned functional methods and the multi-purpose ones. We prefer to use the functional ones because they provide a detailed estimation of the functional structure of the trajectory curves. One notable detail is the impact of a dimension reduction done with multivariate functional principal components analysis. Furthermore, we can use objective selection criteria such as the AIC and the BIC, and avoid using cluster quality indices that rely on a pre-existing classification of the data.

Modèle de mélange gaussien à effets superposés pour l'identification de sous-types de schizophrénie Theses and supervised dissertations / 2020-03
Nefkha-Bahri, Samy
Abstract
This work is part of the research effort to identify subtypes of schizophrenia through brain connectivity data from functional magnetic resonance imaging. Clustering techniques, including the Expectation-Maximization (EM) algorithm for estimating parameters of Gaussian mixture models, have been used on such data in previous research. This approach captures the effects of normal brain processes that are irrelevant to the identification of disease subtypes. In this work, the population data of control (non-disease) individuals are modeled by a finite mixture of Gaussian densities. Each density represents an assumed subtype of normal brain function. A new model is proposed for the population data of affected individuals: a mixture of Gaussian densities where each density has a mean corresponding to the sum of a normal state and a disease state. Therefore, it is a mixture in which subtypes of normal brain function and subtypes of disease are superimposed. It is assumed that normal and unhealthy processes are additive and the goal is to isolate and estimate the unhealthy effects. An EM algorithm specifically designed for this model is developed. Data were obtained from functional magnetic resonance imaging of 242 control individuals and 242 patients diagnosed with schizophrenia. Results obtained using this algorithm on this data set are reported.

Régression de Cox avec partitions latentes issues du modèle de Potts Theses and supervised dissertations / 2019-07
Martínez Vargas, Danae Mirel
Abstract
The goal of this research project is to develop a nonparametric Bayesian regression model based on random partitions in the context of survival analysis. Our ultimate objective is to build a prediction system that first groups observations with similar characteristics. Once the subgroups are formed, survival within each subgroup is evaluated with a nonparametric Bayesian model. The number of subgroups in the population is random. We propose using the Potts clustering model (Murua, Stanberry and Stuetzle [29]) applied to the covariate space to generate the random partitions of individuals. For any given partition, the model proposed in this project assumes a Cox regression by intervals with a Weibull baseline hazard rate within each cluster. This methodology was inspired by the work of Ibrahim [18]. Estimation and inference are carried out with MCMC methods. We also use the Laplace approximation method (Shun and McCullagh [36]) to estimate certain constants and to propose parameter updates within the MCMC algorithm. Finally, we compare the performance of our model with that of a classical Cox regression and with the covariate-indexed nonparametric Bayesian product partition model, PPMx, through numerous simulations. In general, our model produced results comparable to those of its competitors and in some cases proved to be the best choice.

Analyse statistique de données fonctionnelles à structures complexes Theses and supervised dissertations / 2017-05
Adjogou, Adjobo Folly Dzigbodi
Abstract
Longitudinal studies play a salient role in many different research areas and their relevance is still increasing. The related methods have become a privileged tool for analyzing the evolution of a given phenomenon across time. Longitudinal data arise when measurements for one or more variables are taken at different points of a temporal axis on individuals involved in the study. A key feature of this type of data is that observations within the same subject may be correlated. That fundamental characteristic makes longitudinal data different from other types of data in statistics and motivates specific methodologies. There have been remarkable developments in that field in the past forty years. Typical analysis of longitudinal data relies on parametric, non-parametric or semi-parametric models. However, an important question widely addressed in the analysis of longitudinal data is related to cluster analysis and concerns the existence of groups or clusters (or homogeneous trajectories), suggested by the data, not defined a priori, such that individuals in a given cluster tend to be similar to each other in some sense, and individuals in different clusters tend to be dissimilar. This thesis aims at contributing to that rapidly expanding field of clustering longitudinal data. Indeed, an emerging non-parametric methodology for modeling longitudinal data is based on the functional data analysis approach, in which longitudinal trajectories are viewed as a sample of partially observed functions or curves on some interval where these functions are often assumed to be smooth. In this thesis we then present a succinct review of the most commonly used methods to analyze and cluster longitudinal data, and two new model-based functional clustering methods.
Indeed, we review most of the typical longitudinal data analysis models, ranging from the parametric models to the semi-parametric and non-parametric ones, as well as the recent developments in longitudinal cluster analysis according to the two main approaches: non-parametric and model-based. The purpose of that review is to provide a concise, broad and readily accessible overview of longitudinal data analysis and clustering methods. In the first method developed in this thesis, we use the functional data analysis approach to propose a very flexible model which combines functional principal components analysis and clustering to deal with any type of longitudinal data, even if the observations are sparse, irregularly spaced or occur at different time points for each individual. The functional modeling is based on splines and the main data groups are modeled as arising from clusters in the space of spline coefficients. The model, based on a mixture of Student's t-distributions, is embedded into a Bayesian framework in which maximum a posteriori estimators are found with the EM algorithm. We develop an approximation of the marginal log-likelihood (MLL) that allows us to perform MLL-based model selection and that compares favourably with other popular criteria such as AIC and BIC. In the second method, we propose a new time-course or longitudinal data analysis framework that aims at combining functional model-based clustering and the Lasso penalization to identify groups of individuals with similar patterns. An EM algorithm-based approach is used on a functional model where the individual curves are approximated in a space spanned by a finite basis of B-splines and the number of clusters is determined by penalizing a mixture of Student's t-distributions with unknown degrees of freedom. Latin Hypercube Sampling is used to efficiently explore the space of penalization parameters.
For both methodologies, the estimation of the parameters is based on the iterative expectation-maximization (EM) algorithm.

Modélisation des bi-grappes et sélection des variables pour des données de grande dimension : application aux données d'expression génétique Theses and supervised dissertations / 2012-08
Chekouo Tekougang, Thierry
Abstract
Clustering is a classical method to analyse gene expression data. When applied to the rows (e.g. genes), each column belongs to all clusters. However, it is often observed that a subset of genes is co-regulated and co-expressed in a subset of conditions, but behaves almost independently under other conditions. For these reasons, biclustering techniques have been proposed to look for sub-matrices of a data matrix. Biclustering is a simultaneous clustering of rows and columns of a data matrix. Most of the biclustering algorithms proposed in the literature have no statistical foundation. It is interesting to pay attention to the underlying models of these algorithms and develop statistical models to obtain significant biclusters. In this thesis, we review some biclustering algorithms that seem to be most popular. We group these algorithms according to the type of homogeneity in the bicluster and the type of overlapping that may be encountered. We shed light on statistical models that can justify these algorithms. It turns out that some techniques can be justified in a Bayesian framework. We develop an extension of the biclustering plaid model in a Bayesian framework and we propose a measure of complexity for biclustering. The deviance information criterion (DIC) is used to select the number of biclusters. Studies on gene expression data and simulated data give satisfactory results. To our knowledge, the biclustering algorithms assume that genes and experimental conditions are independent entities. These algorithms do not incorporate prior biological information that could be available on genes and conditions. We introduce a new Bayesian plaid model for gene expression data which integrates biological knowledge and takes into account the pairwise interactions between genes and between conditions via a Gibbs field. Dependence between these entities is derived from relational graphs, one for genes and another for conditions.
The graph of the genes and conditions is constructed by the k-nearest neighbors and makes it possible to define a prior distribution of labels as auto-logistic models. The similarities of genes are calculated using gene ontology (GO). To estimate the parameters, we adopt a hybrid procedure that mixes MCMC with a variant of the Wang-Landau algorithm. Experiments on simulated and real data show the performance of our approach. It should be noted that there may be several noise variables in microarray data. These variables may mask the true structure of the clustering. Inspired by the plaid model, we propose a new model that simultaneously finds the true clustering structure and identifies discriminating variables. It assumes that an observation can be explained by more than one cluster. This problem is addressed by using a binary latent vector, so the estimation is obtained via the Monte Carlo EM algorithm. Importance sampling is used to reduce the computational cost of the Monte Carlo sampling at each step of the EM algorithm. Numerical examples demonstrate the usefulness of these methods in terms of variable selection and clustering.

Approximation de la distribution a posteriori d'un modèle Gamma-Poisson hiérarchique à effets mixtes Theses and supervised dissertations / 2011-01
Nembot Simo, Annick Joëlle
Abstract
We propose a method for analysing count or Poisson data based on the procedure called Poisson Regression Interactive Multilevel Modeling (PRIMM) introduced by Christiansen and Morris (1997). The Poisson regression in the PRIMM method has fixed effects only, whereas our model incorporates random effects. As in Christiansen and Morris (1997), the model studied aims at doing inference based on adequate analytical approximations of the posterior distributions of the parameters. This avoids the use of computationally expensive methods such as Markov chain Monte Carlo (MCMC) methods. The approximations are based on Laplace's method and asymptotic theory. Estimates of the Poisson mixed effects regression parameters are obtained through the maximization of their joint posterior density via the Newton-Raphson algorithm. This study also provides the first two posterior moments of the Poisson parameters involved. The posterior distribution of these parameters is approximated by a gamma distribution. Applications to two datasets show that our model can in some sense be considered a generalization of the PRIMM method since it also allows clustered count data. Finally, the model is applied to data involving many types of adverse events recorded by the participants of a drug clinical trial which involved a quadrivalent vaccine containing measles, mumps, rubella and varicella. The Poisson regression incorporates the fixed effect corresponding to the covariate treatment/control as well as a random effect associated with the biological system of the body affected by the adverse events.

Sélection de modèle d'imputation à partir de modèles bayésiens hiérarchiques linéaires multivariés Theses and supervised dissertations / 2009-06
Chagra, Djamila
Abstract
The technique known as multiple imputation seems to be the most suitable technique for solving the problem of non-response. The literature mentions methods that model the nature and structure of missing values. One of the most popular methods is the PAN algorithm of Schafer and Yucel (2002). The imputations yielded by this method are based on a multivariate linear mixed-effects model for the response variable. A Bayesian hierarchical clustered and more flexible extension of PAN is given by the BHLC model of Murua et al. (2005). The main goal of this work is to study the problem of model selection for multiple imputation in terms of efficiency and accuracy of missing-value predictions. We propose a measure of performance linked to the prediction of missing values. The measure is a mean squared error, and hence in addition to the variance associated with the multiple imputations, it includes a measure of bias in the prediction. We show that this measure is more objective than the most common variance measure of Rubin. Our measure is computed by incrementing by a small proportion the number of missing values in the data and supposing that those values are also missing. The performance of the imputation model is then assessed through the prediction error associated with these pseudo missing values. In order to study the problem objectively, we have devised several simulations. Data were generated according to different explicit models that assumed particular error structures. Several missing-value prior distributions as well as error-term distributions are then hypothesized. Our study investigates whether the true error structure of the data has an effect on the performance of the different hypothesized choices for the imputation model. We concluded that the answer is yes. Moreover, the choice of missing-value prior distribution seems to be the most important factor for accuracy of predictions.
In general, the most effective choices for good imputations are a Student's t-distribution with different cluster variances for the error term, and a missing-value Normal prior with data-driven mean and variance, or a missing-value regularizing Normal prior with large variance (a ridge-regression-like prior). Finally, we have applied our ideas to a real problem dealing with health outcome observations associated with a large number of countries around the world. Keywords: Missing values, multiple imputation, Bayesian hierarchical linear model, mixed effects model.
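The pseudo-missing-value idea described in this abstract can be sketched in a few lines. The sketch hides a small proportion of the observed values, imputes them, and reports the mean squared prediction error on the hidden entries. The function names, the single-pass mean imputer and the toy data are hypothetical stand-ins; the thesis evaluates multiple imputations from Bayesian hierarchical models, which this sketch does not reproduce.

```python
import random


def pseudo_missing_mse(values, impute, proportion=0.1, seed=0):
    """Mean squared error of `impute` on observed values that are artificially hidden.

    `values` is a list where None marks a truly missing entry; `impute` takes such
    a list and returns a completed list. Assumes at least two observed values.
    """
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(values) if v is not None]
    k = max(1, round(proportion * len(observed)))
    hidden = rng.sample(observed, k)
    hidden_set = set(hidden)
    # Treat the selected observed values as if they were also missing.
    masked = [None if i in hidden_set else v for i, v in enumerate(values)]
    completed = impute(masked)
    return sum((completed[i] - values[i]) ** 2 for i in hidden) / k


def mean_impute(values):
    """Stand-in imputation model: replace every missing entry by the observed mean."""
    obs = [v for v in values if v is not None]
    mean = sum(obs) / len(obs)
    return [mean if v is None else v for v in values]


constant = [2.0, 2.0, None, 2.0, 2.0, 2.0]
varying = [1.0, 2.0, None, 3.0, 4.0, 5.0, None, 6.0]
mse_constant = pseudo_missing_mse(constant, mean_impute, proportion=0.2)
mse_varying = pseudo_missing_mse(varying, mean_impute, proportion=0.25)
```

Because the hidden values are known, the error combines both the bias and the variability of the predictions, and competing imputation models can be ranked by passing different `impute` functions.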

Research projects

Centre de recherches mathématiques (CRM) FRQNT/Fonds de recherche du Québec - Nature et technologies (FQRNT) / 2022 - 2029

Brain connectivity-based optimization of non-invasive brain stimulation to improve cognitive symptoms in schizophrenia IRSC/Instituts de recherche en santé du Canada / 2019 - 2026

Bayesian deep learning prediction with sparse graphs CRSNG/Conseil de recherches en sciences naturelles et génie du Canada (CRSNG) / 2019 - 2025

Investment portfolio design and optimal execution of automated trading strategies: An exploratory research program MITACS Inc. / 2019 - 2019

Gibbs-repulsion and determinantal processes for statistical learning SPIIE/Secrétariat des programmes interorganismes à l’intention des établissements / 2018 - 2020

Uplift Models Extension for Smart Marketing. MITACS Inc. / 2017 - 2018

Variable Selection for Uplift Modeling. MITACS Inc. / 2017 - 2017

CENTRE DE RECHERCHES MATHEMATIQUES (CRM) FRQNT/Fonds de recherche du Québec - Nature et technologies (FQRNT) / 2015 - 2023

MODELE DE MELANGE AVEC NOYAUX POUR LA CLASSIFICATION DES DONNEES DE GRANDE DIMENSION Innovation, Sciences et Développement économique Canada / 2014 - 2015

KERNEL-BASED NON-PARAMETRIC BAYESIAN CLUSTERING MODELS CRSNG/Conseil de recherches en sciences naturelles et génie du Canada (CRSNG) / 2013 - 2019

INNOVATIVE CHEMOGENOMIC TOOLS TO IMPROVE OUTCOME IN ACUTE MYELOID LEUKEMIA Génome Canada / 2013 - 2017

INNOVATIVE CHEMOGENOMIC TOOLS TO IMPROVE OUTCOME IN ACUTE MYELOID LEUKEMIA Génome Québec / 2013 - 2017

COMPUTATIONAL RESOURCES FOR RESEARCH IN MATHEMATICS AND STATISTICS CRSNG/Conseil de recherches en sciences naturelles et génie du Canada (CRSNG) / 2013 - 2015

Selected publications

Chekouo, Thierry; Murua, Alejandro. The penalized biclustering model and related algorithms. Journal of Applied Statistics 42, 1255-1277 (2015).

Murua, Alejandro; Wicker, Nicolas. The conditional-Potts clustering model. J. Comput. Graph. Statist. 23, 717-739 (2014).

Chekouo, Thierry; Murua, Alejandro; Raffelsberger, Wolfgang. The Gibbs-plaid biclustering model. The Annals of Applied Statistics (2014).

Murua, Alejandro; Wicker, Nicolas. Kernel-based mixture models for classification. Computational Statistics (2014).

Murua, Alejandro; Wicker, Nicolas. The conditional-Potts clustering model. Rapport de recherche CRM 3317, Université de Montréal (2011).

Murua, Alejandro; Stanberry, Larissa; Stuetzle, Werner. On Potts model clustering, kernel K-means, and density estimation. J. Comput. Graph. Statist. 17, 629-658 (2008).

Murua, A.; Stuetzle, W.; Tantrum, J.; Sieberts, S. Model based document classification and clustering. Int. J. Tomogr. Stat. 8, 1-24 (2008).

Stanberry, Larissa; Murua, Alejandro; Cordes, Dietmar. Functional connectivity mapping using the ferromagnetic Potts spin model. Human Brain Mapping 29, 422-440 (2008).

Hegyvary, Sue Thomas; Berry, Devon M; Murua, Alejandro. Country Clustering to Evaluate Global Health Outcomes. Journal of Public Health Policy 29, 319-339 (2008).

Stanberry, Larissa; Murua, Alejandro; Cordes, Dietmar. Mapping Functional Connectivity Using Potts Spin Model. Proceedings of the 14th Scientific Meeting of the International Society for Magnetic Resonance in Medicine, 1101 (2006). ISMRM 2006, Seattle, USA.

Gottardo, Raphael; Besag, Julian; Stephens, Matthew; Murua, Alejandro. Probabilistic segmentation and intensity estimation for microarray images. Biostatistics 7, 85-99 (2006).

Stanberry, Larissa; Murua, Alejandro; Cordes, Dietmar. Resting State Connectivity of Anterior and Posterior Cingulate Cortices Using Potts Spin Model. Proceedings of the 14th Scientific Meeting of the International Society for Magnetic Resonance in Medicine, 1090 (2006). ISMRM 2006.

Murua, A.; Stanberry, L.; Stuetzle, W. On Potts model clustering, kernel K-means and density estimation. Rapport de recherche CRM 3225, Université de Montréal (2006).

Stanberry, Larissa; Murua, Alejandro; Nandy, Rajesh; Cordes, Dietmar. Clustering fMRI time series in the wavelet domain. Proceedings of the 13th Scientific Meeting of the International Society for Magnetic Resonance in Medicine, 1604 (2005). ISMRM 2005, Miami, USA.

Gidas, Basilis; Murua, Alejandro. Optimal transformations for prediction in continuous-time stochastic processes: finite past and future. Probab. Theory Related Fields 131, 479-492 (2005).

Gidas, Basilis; Murua, Alejandro. Estimation and consistency for linear functionals of continuous-time processes from finite data set, II: Optimal Transformations for Prediction. Department of Statistics, University of Washington (2004).

Tantrum, Jeremy; Murua, Alejandro; Stuetzle, Werner. Hierarchical model-based clustering of large datasets through fractionation and refractionation. Information Systems 29, 315-326 (2004).

Tantrum, Jeremy; Murua, Alejandro; Stuetzle, Werner. Assessment and Pruning of Hierarchical Model Based Clustering. The Ninth International Conference on Knowledge Discovery and Data Mining (2003). KDD 2003, Washington DC, USA.

Murua, Alejandro. Upper bounds for error rates associated to linear combination of classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence 24, 591-602 (2002).

Tantrum, Jeremy; Murua, Alejandro; Stuetzle, Werner. Hierarchical model-based clustering of large datasets through fractionation and refractionation. The Eighth International Conference on Knowledge Discovery and Data Mining, 183-190 (2002). KDD 2002, Edmonton, Canada.

Ali, Ayesha; Richardson, Thomas; Murua, Alejandro; Roy, Sumit. Evaluation of sequential importance sampling for blind deconvolution via a simulation study. Proceedings of the XI European Signal Processing Conference, 315-318 (2002). EUSIPCO 2002, Toulouse, France.

Yeung, Ka Yee; Fraley, Chris; Murua, Alejandro; Raftery, Adrian; Ruzzo, Larry. Model-Based Clustering and Data Transformations for Gene Expression Data. Bioinformatics 17, 977-987 (2001).

Amit, Yali; Murua, Alejandro. Speech recognition using randomized relational decision tree. IEEE Transactions on Speech and Audio Processing 9, 333-341 (2001).

Li, J.; Murua, Alejandro. A 2D extended HMM for speech recognition. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 349-352 (1999). ICASSP 1999, Phoenix, USA.

Murua, Alejandro. On the regularity of spectral densities of continuous-time completely linearly regular processes. Stochastic Process. Appl. 79, 213-227 (1999).

Gidas, Basilis; Murua, Alejandro. Optimal transformations for prediction in continuous time stochastic processes. I. Karatzas, B. Rajput and M. Taqqu, editors, 167-183 (1998).

Gidas, Basilis; Murua, Alejandro. Estimation and consistency for linear functionals of continuous-time processes from a finite data set, I: Linear Predictors. Department of Statistics, University of Chicago (1998).

Gidas, Basilis; Murua, Alejandro. Stop consonants discrimination and clustering using nonlinear transformations and wavelets. Steve E. Levinson and Larry Shepp, editors, Springer-Verlag, 13-62 (1996).

Gidas, Basilis; Murua, Alejandro. Classification and clustering of stop consonants via nonparametric transformations and wavelets. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 872-875 (1995). ICASSP 1995, Detroit, USA.

del Pino, Manuel A.; Manásevich, Raúl F.; Murúa, Alejandro E. Existence and multiplicity of solutions with prescribed period for a second order quasilinear ODE. Nonlinear Anal. 18, 79-92 (1992).

del Pino, Manuel A.; Manásevich, Raúl F.; Murúa, Alejandro. On the number of $2\pi$ periodic solutions for $u''+g(u)=s(1+h(t))$ using the Poincaré-Birkhoff theorem. J. Differential Equations 95, 240-258 (1992).