Inspired by the practical success of deep learning, the broader mathematics community has recently been energized to seek theoretical justification for these methods. There is a large body of theory from the computer science community, dating back to the 1980s and earlier, but the quantitative guarantees are usually too loose to be helpful in practice, and it is rare that theory predicts something actionable (such as at which iteration to stop early in order to prevent over-fitting).
Many of these theories are less well known within applied mathematics, so we briefly review the essential results before focusing on the notion of algorithmic stability, popularized in the early 2000s, which is an alternative to the more mainstream VC-dimension approach and is one avenue that may yield sharper theoretical guarantees. Algorithmic stability is appealing to applied mathematicians, and to analysts in particular, since much of the technical work resembles the analysis used in convergence proofs.
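To fix ideas (the notation below is ours, chosen for illustration; the definition is the standard one for uniform stability), a randomized algorithm $A$ is $\beta$-uniformly stable if for any two datasets $S$ and $S'$ of size $n$ differing in a single example,
\[
\sup_{z}\;\mathbb{E}_A\!\left[\ell\bigl(A(S),z\bigr)-\ell\bigl(A(S'),z\bigr)\right]\;\le\;\beta,
\]
and the classical consequence is a bound on the expected generalization gap,
\[
\mathbb{E}_{S,A}\!\left[R\bigl(A(S)\bigr)-\widehat{R}_S\bigl(A(S)\bigr)\right]\;\le\;\beta,
\]
where $\ell$ is the loss, $R$ the population risk, and $\widehat{R}_S$ the empirical risk on $S$.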
We give an overview of the fundamental results on algorithmic stability, focusing on the stochastic gradient descent (SGD) method applied to a nonconvex loss function, and present the latest state-of-the-art bounds, including some of our own work (an arXiv preprint, joint with L. Madden and E. Dall'Anese), which is among the first results to suggest when to perform early stopping.
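As a generic illustration of the setting only (the data, model, and step size below are hypothetical, and this validation-based heuristic is not the stopping rule from the work above), the following sketch runs SGD on a nonconvex squared-error loss for a one-hidden-layer network and returns the iterate with the best held-out loss:

```python
import numpy as np

# Illustrative sketch: SGD on a nonconvex loss (one-hidden-layer network,
# squared error), returning the iterate chosen by a standard
# early-stopping heuristic that monitors held-out loss.

rng = np.random.default_rng(0)
n, d, m = 300, 10, 16                        # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)   # synthetic targets

X_tr, y_tr = X[:200], y[:200]                # training set
X_val, y_val = X[200:], y[200:]              # held-out set

W1 = 0.5 * rng.standard_normal((m, d))       # hidden-layer weights
w2 = 0.5 * rng.standard_normal(m)            # output weights

def val_loss(W1, w2):
    pred = np.tanh(X_val @ W1.T) @ w2
    return 0.5 * np.mean((pred - y_val) ** 2)

step = 0.05
best = (val_loss(W1, w2), W1.copy(), w2.copy())

for t in range(5000):
    i = rng.integers(len(y_tr))              # draw one training example
    x, target = X_tr[i], y_tr[i]
    h = np.tanh(W1 @ x)                      # forward pass
    err = h @ w2 - target                    # residual on this example
    grad_w2 = err * h                        # backprop: output layer
    grad_W1 = np.outer(err * w2 * (1 - h ** 2), x)   # backprop: hidden layer
    w2 -= step * grad_w2                     # SGD updates
    W1 -= step * grad_W1
    loss = val_loss(W1, w2)
    if loss < best[0]:                       # keep the best held-out iterate
        best = (loss, W1.copy(), w2.copy())

print(f"early-stopped held-out loss: {best[0]:.4f}")
```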