Department of Mathematics and Statistics


Visualizing emergent patterns for exploratory data analysis



Guy Wolf

Yale University


High-throughput data collection is becoming increasingly common, and it often creates a need for exploratory analysis to reveal and understand hidden structure in the collected (high-dimensional) Big Data. One crucial aspect of enabling such analysis, especially in fields with few domain experts, is producing reliable, robust, and human-interpretable visualizations that emphasize desired trends in the data. In this talk, I will approach this goal by combining kernel methods and deep learning to capture clusters and dynamics in data. In particular, I will focus on latent progression patterns that often exist in modern data (e.g., due to natural development or guided by external stimuli) and on an interpretable characterization of the transition pathways within them, which is crucial in exploratory settings. For example, in genomic and proteomic data analysis, cells are actively differentiating or progressing in response to signals, and characterizing these progressions can unlock a deep understanding of normal development, as well as enable detection of abnormal transitions (e.g., cancer metastasis).

To provide such analysis, I will present PHATE (Potential of Heat-diffusion for Affinity-based Transition Embedding), a novel unsupervised low-dimensional embedding for data visualization that reveals and emphasizes transitions and emergent progression patterns. This method uses heat diffusion processes to construct an intrinsic data geometry and computes distances using the free energy potential of these processes. The constructed diffusion-potential geometry captures high-dimensional transition structures (when they exist) while enabling their visualization via a low-dimensional embedding that approximates both local and global nonlinear relations in the data. The effectiveness of the produced visualization for exploratory data analysis will be demonstrated on both synthetic and real data, including facial expressions and new scRNA-seq data of embryoid body development that was collected specifically to support the development and validation of this method.
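
As a rough illustration of the diffusion-potential idea described above, the following is a minimal sketch and not the PHATE implementation. It makes several assumptions that the abstract does not state: a plain Gaussian kernel with a fixed bandwidth `sigma`, a hand-picked number of diffusion steps `t`, a negative-log transform as the "potential", and classical MDS for the final embedding. The function name `diffusion_potential_embedding` and all of its parameters are hypothetical.

```python
import numpy as np


def diffusion_potential_embedding(X, sigma=1.0, t=8, n_components=2, eps=1e-7):
    """Illustrative diffusion-potential embedding (assumed details, not PHATE itself)."""
    # Pairwise squared Euclidean distances between the rows of X.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian affinities (assumed kernel), row-normalized into a Markov matrix.
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    P = K / K.sum(axis=1, keepdims=True)

    # t-step diffusion: row i holds the transition probabilities from point i.
    P_t = np.linalg.matrix_power(P, t)

    # Potential representation: negative log of the diffused probabilities
    # (eps avoids log(0); the exact transform is an assumption here).
    U = -np.log(P_t + eps)

    # Squared potential distances: Euclidean distances between rows of U.
    u_norms = np.sum(U ** 2, axis=1)
    D2 = np.maximum(u_norms[:, None] + u_norms[None, :] - 2.0 * U @ U.T, 0.0)

    # Classical MDS: double-center the squared distances, keep the top eigenvectors.
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ D2 @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_components]
    coords = eigvecs[:, order] * np.sqrt(np.maximum(eigvals[order], 0.0))
    return P_t, coords


# Usage on synthetic data: 50 points in 10 dimensions, embedded in 2-D.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
P_t, Y = diffusion_potential_embedding(X, sigma=2.0)
```

In this sketch, comparing points through the log-transformed diffusion probabilities rather than through the probabilities themselves is what makes differences among small transition probabilities (far-apart points) contribute to the distances; the kernel, bandwidth, and step count would all need tuning on real data.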

Finally, I will discuss future directions for advancing deep learning tools in exploratory settings based on the principles enabled by these developments.

Date:        Tuesday, April 24, 2018

Time:        10:30 a.m. to 11:30 a.m.

Location:    Pavillon André-Aisenstadt

Room:        5340