Multi-view Unsupervised and Semi-Supervised Clustering based on Content and Connection Information

Haesun Park
Georgia Institute of Technology

Constrained Low Rank Approximation (CLRA) is a powerful framework for a variety of important tasks in large-scale data analytics, such as topic discovery in text data and community detection in social network data. In this talk, a hybrid method called Joint Nonnegative Matrix Factorization (JointNMF) is introduced for latent information discovery from multi-view data sets that contain both text content and connection structure information. The method optimizes a single integrated objective function that combines the Nonnegative Matrix Factorization (NMF) objective function, which handles text content/attribute information, with the Symmetric NMF (SymNMF) objective function, which handles relation/connection information. An effective algorithm for minimizing the JointNMF objective function is proposed, based on the block coordinate descent (BCD) method.
The proposed hybrid method simultaneously discovers content associations and related latent connections without any need for post-processing or additional clustering. In addition, known partial label information can be incorporated into the JointNMF framework for semi-supervised clustering. Experimental results on several real-life application problems illustrate the advantages of the proposed approaches.
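
As a rough illustration of the combined objective described above, the following minimal sketch evaluates an NMF term plus a weighted SymNMF term and reads cluster labels directly from the shared factor. The matrix shapes, the weight `alpha`, and the function names `joint_nmf_objective` and `cluster_labels` are assumptions made for illustration and are not taken from the talk; the sketch only evaluates the objective and does not implement the BCD algorithm.

```python
import numpy as np


def joint_nmf_objective(X, S, W, H, alpha=1.0):
    """Evaluate an assumed joint objective:
    ||X - W H||_F^2 + alpha * ||S - H^T H||_F^2.

    X: (m, n) nonnegative content matrix (features x items)   [assumed layout]
    S: (n, n) symmetric nonnegative connection matrix          [assumed layout]
    W: (m, k) nonnegative basis, H: (k, n) nonnegative shared factor
    alpha: hypothetical weight balancing the two views
    """
    content_term = np.linalg.norm(X - W @ H, "fro") ** 2         # NMF part
    connection_term = np.linalg.norm(S - H.T @ H, "fro") ** 2    # SymNMF part
    return content_term + alpha * connection_term


def cluster_labels(H):
    """Assign each item (column of H) to the row with the largest entry."""
    return np.argmax(H, axis=0)


# Toy data: 4 items in 2 clusters, built so that an exact factorization exists.
H_true = np.array([[1.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 1.0]])
W_true = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 1.0]])
X = W_true @ H_true        # content view
S = H_true.T @ H_true      # connection view (block-diagonal similarity)

exact_value = joint_nmf_objective(X, S, W_true, H_true)
perturbed_value = joint_nmf_objective(X, S, W_true, H_true + 0.1)
labels = cluster_labels(H_true)
```

Because the same factor `H` appears in both terms, the content and connection views are tied to one set of cluster indicators, which is why labels can be read from `H` without a separate clustering step in this sketch.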