Study notes on “Stochastic Polyak Step-size, a simple step-size tuner with optimal rates” by F. Pedregosa
Shuvomoy Das Gupta
March 29, 2024
Here are my study notes for Fabian Pedregosa’s amazing blog post on
Stochastic Polyak Step-size; the full citation of Pedregosa’s post is:
Stochastic Polyak Step-size, a simple step-size tuner with optimal rates, Fabian
Pedregosa, 2023, available at https://fa.bianp.net/blog/2023/sps/.
Contents
Problem setup
Notation
Stochastic Gradient Descent with Polyak Stepsize
Assumptions
Convergence analysis
Problem setup
We are interested in solving the problem
\[
p^\star = \operatorname*{minimize}_{x \in \mathbb{R}^d}\; \left\{ f(x) = \frac{1}{n}\sum_{i=1}^{n} f_{[i]}(x) \right\} \tag{P}
\]
where the optimal solution is achieved at $x^\star$; a concrete instance of (P) is sketched below. We have the following
assumptions regarding the nature of the problem.
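For concreteness, one standard instance of (P), given here purely as an illustration (it is not part of the original notes), is least-squares regression on $n$ datapoints $(a_i, b_i) \in \mathbb{R}^d \times \mathbb{R}$:
\[
f_{[i]}(x) = \tfrac{1}{2}\big(a_i^\top x - b_i\big)^2, \qquad f(x) = \frac{1}{2n}\sum_{i=1}^{n}\big(a_i^\top x - b_i\big)^2 .
\]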
Notation
Inner product between vectors $x, y$ is denoted by $\langle x \mid y \rangle$, and the Euclidean norm of $x$ is denoted by $\|x\| = \sqrt{\langle x \mid x \rangle}$. We let $[1:n] = \{1, 2, \ldots, n\}$ and $z_{+} = \max\{z, 0\}$. Also, for notational convenience we denote $\mathrm{sqd}(x) = \|x\|^{2}$ and $\mathrm{ReLU}(z) = \max\{z, 0\}$. Comments are enclosed in /* this is a comment */.
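In Python, these two helpers are one-liners; a purely illustrative sketch:

import numpy as np

def sqd(x):
    # sqd(x) = ||x||^2, the squared Euclidean norm of x
    return float(np.dot(x, x))

def relu(z):
    # ReLU(z) = max{z, 0}, also written z_+
    return max(z, 0.0)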
Stochastic Gradient Descent with Polyak Stepsize
The algorithm called Stochastic Gradient Descent with Stochastic Polyak Stepsize (SGD-SPS) to solve (P) is described by Algorithm 1. The uniform distribution with support $\{1, \ldots, n\}$ is denoted by $\mathrm{unif}[1:n]$. A subgradient of the function $f_{[i]}$ evaluated at $x$ is denoted by $\widetilde{\nabla} f(x)$.
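Algorithm 1 itself is not reproduced in this excerpt. As a minimal Python sketch, assuming the standard stochastic Polyak update from the blog, $x^{+} = x - \frac{\mathrm{ReLU}\big(f_{[i]}(x) - f_{[i]}^\star\big)}{\mathrm{sqd}\big(\widetilde{\nabla} f(x)\big)}\, \widetilde{\nabla} f(x)$, where $f_{[i]}^\star = \min_x f_{[i]}(x)$ is a symbol introduced only for this sketch (it equals zero for interpolating models), and with all function names illustrative:

import numpy as np

def sgd_sps(f, grad, f_star, x0, n, n_iters, seed=0):
    # f(x, i) evaluates f_[i](x); grad(x, i) returns a subgradient of
    # f_[i] at x; f_star(i) returns f_[i]^*. All names are illustrative.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        i = int(rng.integers(n))              # draw i ~ unif[1:n]
        g = grad(x, i)
        gap = max(f(x, i) - f_star(i), 0.0)   # ReLU(f_[i](x) - f_[i]^*)
        nrm2 = float(g @ g)                   # sqd of the subgradient
        if nrm2 > 0.0:                        # guard against division by zero
            x = x - (gap / nrm2) * g          # stochastic Polyak step
    return x

# Usage on the least-squares instance sketched earlier; each f_[i] is a
# single-equation loss, so f_[i]^* = 0 exactly.
rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
x_hat = sgd_sps(
    f=lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2,
    grad=lambda x, i: (A[i] @ x - b[i]) * A[i],
    f_star=lambda i: 0.0,
    x0=np.zeros(5), n=50, n_iters=2000,
)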