This presentation, intended for a broad spectrum of researchers, highlights the latest advancements in accelerating first-order optimization algorithms, a domain that currently captures the attention of numerous research teams worldwide. First-order methods, such as gradient descent or stochastic gradient descent, have gained significant popularity. The flagship development in this domain is due to the mathematician Yurii Nesterov, who proposed in 1983 a class of accelerated gradient methods that achieve faster global convergence rates than gradient descent. Another notable contribution is the FISTA algorithm, introduced by Beck and Teboulle in 2009, which has enjoyed widespread use within the machine learning and signal/image processing communities. From a different perspective, gradient-based optimization algorithms can be analyzed through the lens of Ordinary Differential Equations (ODEs). This perspective allows us to propose new algorithms by discretizing these ODEs and to enhance their performance through acceleration techniques, all while maintaining the low computational complexity necessary for analyzing massive datasets. We will also delve into the Ravine method, introduced by Gelfand and Tsetlin in 1961. Interestingly, Nesterov's accelerated gradient method and the Ravine method share a close relationship: either method can be derived from the other by simply reversing the order of the extrapolation and gradient steps in their definitions. Even more surprisingly, both methods are based on the same equations. Consequently, practitioners often use the Ravine method, occasionally confusing it with Nesterov's Accelerated Gradient. Throughout the presentation, we will also include some historical facts and pose open questions to encourage deeper exploration in the field.
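
To make the relationship concrete, here is a minimal sketch of the two update rules in a common, purely illustrative notation, assuming a differentiable objective f, a step size s > 0, and an extrapolation coefficient \alpha_k (for instance \alpha_k = (k-1)/(k+2)); the iterate names x_k, y_k, v_k are introduced here for exposition and are not taken from the presentation itself.

Nesterov's accelerated gradient: extrapolate first, then take a gradient step,
\[
y_k = x_k + \alpha_k (x_k - x_{k-1}), \qquad x_{k+1} = y_k - s\,\nabla f(y_k).
\]
Ravine method: take a gradient step first, then extrapolate,
\[
v_k = y_k - s\,\nabla f(y_k), \qquad y_{k+1} = v_k + \alpha_k (v_k - v_{k-1}).
\]

Up to indexing conventions, and with matched coefficients, the two schemes generate the same pair of intertwined sequences and differ only in which variable is taken as the output; this is the sense in which they are built from the same equations with the order of the two operations reversed.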