Invertible Neural Networks - Understanding and Controlling Learned Representations

Jorn Jacobsen
Vector Institute and University of Toronto

One way to understand deep networks is to analyze the information they discard about the input from layer to layer. However, estimating the mutual information between the input and hidden representations is intractable in high-dimensional problems. Invertible deep networks circumvent this problem by guaranteeing information preservation. In this talk, I will discuss surprising similarities between non-invertible and invertible deep networks. Further, I will discuss how invertible models give rise to an alternative viewpoint on adversarial examples. Under this viewpoint, adversarial examples are a consequence of excessive invariances learned by the classifier, which manifest as striking failures when the model is evaluated on out-of-distribution inputs. I will discuss how the commonly used cross-entropy objective encourages such overly invariant representations. Finally, I will present an extension to cross-entropy that, by exploiting properties of invertible deep networks, enables control of erroneous invariances in theory and practice.
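
As a rough illustration of what "guaranteeing information preservation" can mean, the sketch below implements a single additive coupling block in NumPy and checks that its input can be recovered from its output. This is an assumption for illustration only: the abstract does not name a specific architecture, and the function names (shift, forward, inverse), the tanh sub-network, and the array sizes are all hypothetical.

```python
import numpy as np

def shift(x1, w):
    # Hypothetical sub-network; it does not need to be invertible itself.
    return np.tanh(x1 @ w)

def forward(x, w):
    # Additive coupling: y1 = x1, y2 = x2 + shift(x1).
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1, x2 + shift(x1, w)], axis=-1)

def inverse(y, w):
    # Inverse of the coupling: x1 = y1, x2 = y2 - shift(y1).
    y1, y2 = np.split(y, 2, axis=-1)
    return np.concatenate([y1, y2 - shift(y1, w)], axis=-1)

rng = np.random.default_rng(0)
w = rng.normal(size=(2, 2))   # weights of the hypothetical sub-network
x = rng.normal(size=(5, 4))   # batch of 5 inputs with 4 features each

y = forward(x, w)             # hidden representation, same shape as x
x_rec = inverse(y, w)         # reconstruction of the input from y
max_error = float(np.max(np.abs(x - x_rec)))
```

Because the block is a bijection, the reconstruction error is zero up to floating-point rounding, so the representation y discards nothing about x. Stacking such blocks (with the two halves swapped between blocks) is one common way to build a deep invertible network.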