October 20 Colloquium: "Making SGD Parameter-Free"

Talk Abstract

We develop an algorithm for parameter-free stochastic convex optimization (SCO) whose rate of convergence is only a double-logarithmic factor larger than the optimal rate for the corresponding known-parameter setting. In contrast, the best previously known rates for parameter-free SCO are based on online parameter-free regret bounds, which contain unavoidable excess logarithmic terms compared to their known-parameter counterparts. Our algorithm is conceptually simple, has high-probability guarantees, and is also partially adaptive to unknown gradient norms, smoothness, and strong convexity. At the heart of our results is a novel parameter-free certificate for the step size of stochastic gradient descent (SGD), and a time-uniform concentration result that assumes no a priori bounds on SGD iterates.

Additionally, we present theoretical and numerical results for a dynamic step size schedule for SGD based on a variant of this idea. On a broad range of vision and language transfer learning tasks, our method's performance is close to that of SGD with a tuned learning rate. A per-layer variant of our algorithm also approaches the performance of tuned Adam.
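To make the idea of a dynamic, parameter-free step size schedule more concrete, below is a minimal sketch of one "distance over gradients" style rule: the step size is the largest distance the iterates have traveled from the starting point, divided by the square root of the accumulated squared gradient norms. This is an illustrative reading of the abstract rather than the speaker's exact algorithm; the function name sgd_distance_over_gradients, the small initial-movement constant r_eps, and the iterate averaging are assumptions made for the sketch.

    import numpy as np

    def sgd_distance_over_gradients(grad, x0, steps=1000, r_eps=1e-6):
        # Illustrative parameter-free SGD sketch (assumed form, not the
        # speaker's exact method): the step size at iteration t is the
        # maximum distance traveled from x0 so far (floored at r_eps),
        # divided by the root of the summed squared gradient norms.
        x = np.array(x0, dtype=float)
        x_start = x.copy()
        max_dist = r_eps            # assumed small "initial movement" constant
        grad_sq_sum = 0.0
        avg = np.zeros_like(x)      # running average of the iterates

        for t in range(1, steps + 1):
            g = np.asarray(grad(x), dtype=float)           # stochastic gradient oracle
            grad_sq_sum += float(np.dot(g, g))
            eta = max_dist / np.sqrt(grad_sq_sum + 1e-12)  # no tuned learning rate
            x = x - eta * g
            max_dist = max(max_dist, float(np.linalg.norm(x - x_start)))
            avg += (x - avg) / t
        return avg

    # Example: noisy gradients of f(x) = ||x||^2, starting from all-ones.
    rng = np.random.default_rng(0)
    noisy_grad = lambda x: 2.0 * x + 0.1 * rng.standard_normal(x.shape)
    print(sgd_distance_over_gradients(noisy_grad, x0=np.ones(5)))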

This talk is based on papers with Yair Carmon and Maor Ivgi.

Biography

Oliver Hinder is an Assistant Professor in the Industrial Engineering Department at the University of Pittsburgh. Before joining the University of Pittsburgh, he was a visiting postdoctoral researcher in the Optimization and Algorithms group at Google in New York. In 2019, he received his Ph.D. from the Department of Management Science and Engineering at Stanford University, where he was advised by Professor Yinyu Ye.

Website: www.oliverhinder.com

Location

Sennott Square Building, Room 5317

Date

Friday, October 20, from 2:00 p.m. to 3:15 p.m.
