On the Convergence and Regularity of Adversarially Robust Classifiers

Rachel Morris
Concordia

Machine learning classifiers achieve high accuracy in most contexts, yet research has shown that they can be fooled by carefully targeted perturbations of the input, known as adversarial examples. One strategy to combat adversarial interference is to include an adversary in the training process so that the resulting classifier is robust to small perturbations. In the binary classification setting, adversarial training can be formulated as a robust optimization problem with a regularizing nonlocal perimeter term. In this presentation, I will show how techniques from the calculus of variations and geometric measure theory can be used to study properties of adversarially robust classifiers. In particular, I will show that adversarially robust classifiers converge uniformly to Bayes classifiers as the adversarial strength tends to 0. I will also discuss my recent work on the regularity of adversarially robust classifiers, which uses explicit geometric perturbations and an energy analysis at singular points.
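For readers who want the formulation behind the phrase "nonlocal perimeter term", here is a sketch of one standard way to write it. The notation is an assumption and is not taken from the abstract: \mu is the data distribution on X \times \{0,1\}, \rho_i = \mu(\cdot \times \{i\}) are the class-weighted marginals, A is the set classified as label 1, B_\varepsilon(x) is the ball of radius \varepsilon around x, and \varepsilon > 0 is the adversarial strength. The talk may use a different but related form (for instance, essential suprema in place of suprema).

```latex
% Adversarial training over sets A (the classifier is the indicator 1_A):
\inf_{A}\; \mathbb{E}_{(x,y)\sim\mu}
  \Big[ \sup_{\tilde{x}\in B_\varepsilon(x)} \big|\mathbf{1}_A(\tilde{x}) - y\big| \Big]

% Splitting by label (y = 0 on \rho_0, y = 1 on \rho_1):
\mathbb{E}_{(x,y)\sim\mu}
  \Big[ \sup_{\tilde{x}\in B_\varepsilon(x)} \big|\mathbf{1}_A(\tilde{x}) - y\big| \Big]
  = \int \sup_{B_\varepsilon(x)} \mathbf{1}_A \, d\rho_0(x)
  + \int \Big( 1 - \inf_{B_\varepsilon(x)} \mathbf{1}_A \Big) d\rho_1(x)

% Adding and subtracting the unperturbed risk gives risk + penalty:
  = \underbrace{\mathbb{E}_{(x,y)\sim\mu}\big[\,|\mathbf{1}_A(x) - y|\,\big]}_{\text{standard risk}}
  \;+\; \varepsilon\, \mathrm{Per}_\varepsilon(A;\mu)

% with the nonlocal perimeter
\mathrm{Per}_\varepsilon(A;\mu)
  = \frac{1}{\varepsilon} \int \Big( \sup_{B_\varepsilon(x)} \mathbf{1}_A - \mathbf{1}_A(x) \Big) d\rho_0(x)
  + \frac{1}{\varepsilon} \int \Big( \mathbf{1}_A(x) - \inf_{B_\varepsilon(x)} \mathbf{1}_A \Big) d\rho_1(x)
```

Both integrands in the perimeter vanish unless x lies within distance \varepsilon of the boundary of A, which is why the term acts as a perimeter-type regularizer. Formally setting \varepsilon = 0 leaves only the standard risk, whose minimizers are Bayes classifiers. This is consistent with the convergence result described above, although the display is only a heuristic and not the proof.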