Reproducing Optimized Gradient Method (OGM) in Julia+JuMP
Shuvomoy Das Gupta
May 16, 2022
In this blog, we discuss how to reproduce the Optimized Gradient Method (OGM) in Julia+JuMP step by step. OGM is the optimal method for minimizing smooth convex functions with respect to the performance measure of the function value gap $f(w_N) - f_\star$. It was first discovered numerically by Drori and Teboulle in the paper https://arxiv.org/pdf/1206.3209.pdf, and its analytical form was found by Kim and Fessler in the paper https://arxiv.org/pdf/1406.5468.pdf. For more details, please see the aforementioned papers.
A Jupyter notebook for the code in this blog can be viewed and downloaded from this link.
Table of contents
- Background
- Reproducing OGM step-by-step in Julia+JuMP
- Putting everything in a compact function

Background
Optimization problem. The optimization problem is
$$ \begin{array}{ll} \underset{w \in \mathbb{R}^d}{\mbox{minimize}} & f(w), \end{array} $$
where $f \in \mathcal{F}_{0,L}(\mathbb{R}^d)$, and $\mathcal{F}_{0,L}(\mathbb{R}^d)$ is the class of $L$-smooth convex functions. We denote an optimal solution by $w_\star$ and the optimal value by $f_\star = f(w_\star)$; note that $g_\star = \nabla f(w_\star) = 0$.
Fixed-step first-order algorithm. The optimization algorithm under consideration is:
$$ w_i = w_{i-1} - \frac{1}{L} \sum_{j=0}^{i-1} h_{i,j}\, g_j, \quad i \in [1:N], $$
where we use the notation that $g_j = \nabla f(w_j)$ and $f_j = f(w_j)$, and where $\{h_{i,j}\}$ and $\{\alpha_{i,j}\}$ are related via the following relationship:
$$ \alpha_{i,j} = \begin{cases} h_{i,j}, & j = i-1, \\ \alpha_{i-1,j} + h_{i,j}, & j \in [0:i-2], \end{cases} $$
so that, equivalently, $w_i = w_0 - \frac{1}{L} \sum_{j=0}^{i-1} \alpha_{i,j}\, g_j$ for $i \in [1:N]$.
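To make the $h \leftrightarrow \alpha$ bookkeeping concrete, here is the $N = 2$ case written out (this follows directly from the relationship above): with $w_1 = w_0 - \frac{1}{L} h_{1,0}\, g_0$,
$$ w_2 = w_1 - \frac{1}{L} \left( h_{2,0}\, g_0 + h_{2,1}\, g_1 \right) = w_0 - \frac{1}{L} \big( \underbrace{(h_{1,0} + h_{2,0})}_{\alpha_{2,0}}\, g_0 + \underbrace{h_{2,1}}_{\alpha_{2,1}}\, g_1 \big). $$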
Notation. Denote
$$ \mathbf{w}_0 = e_1 \in \mathbb{R}^{N+2}, \quad \mathbf{w}_\star = 0 \in \mathbb{R}^{N+2}, \quad \mathbf{g}_\star = 0 \in \mathbb{R}^{N+2}, \quad \mathbf{g}_i = e_{i+2} \in \mathbb{R}^{N+2} \;\; (i \in [0:N]), $$
and
$$ \mathbf{f}_\star = 0 \in \mathbb{R}^{N+1}, \quad \mathbf{f}_i = e_{i+1} \in \mathbb{R}^{N+1} \;\; (i \in [0:N]), $$
where $e_i$ is the unit vector with $i$th component equal to $1$ and the rest being zero. Next, define
$$ S(\tau, \lambda, \alpha') = \mathrm{term}_1 + \mathrm{term}_2 + \mathrm{term}_3 + \mathrm{term}_4 + \mathrm{term}_5 + \mathrm{term}_6, $$
where
$$ \begin{aligned} \mathrm{term}_1 &= \tau\, c_w\, \mathbf{w}_0 \odot \mathbf{w}_0, \\ \mathrm{term}_2 &= \frac{1}{2L} \left( \sum_{i=0}^{N-1} \lambda_{i,i+1}\, (\mathbf{g}_i - \mathbf{g}_{i+1}) \odot (\mathbf{g}_i - \mathbf{g}_{i+1}) + \sum_{i=0}^{N} \lambda_{\star,i}\, (\mathbf{g}_\star - \mathbf{g}_i) \odot (\mathbf{g}_\star - \mathbf{g}_i) \right), \\ \mathrm{term}_3 &= -\lambda_{\star,0}\, \mathbf{g}_0 \odot \mathbf{w}_0, \\ \mathrm{term}_4 &= -\sum_{i=1}^{N-1} \lambda_{i,i+1}\, \mathbf{g}_i \odot \mathbf{w}_0 + \frac{1}{L} \sum_{i=1}^{N-1} \sum_{j=0}^{i-1} \alpha'_{i,j}\, \mathbf{g}_i \odot \mathbf{g}_j, \\ \mathrm{term}_5 &= -\left( \mathbf{g}_N \odot \mathbf{w}_0 - \frac{1}{L} \sum_{j=0}^{N-1} \alpha'_{N,j}\, \mathbf{g}_N \odot \mathbf{g}_j \right), \\ \mathrm{term}_6 &= \sum_{i=0}^{N-1} \lambda_{i,i+1}\, \mathbf{g}_{i+1} \odot \mathbf{w}_0 - \frac{1}{L} \sum_{i=1}^{N-1} \sum_{j=0}^{i-1} \alpha'_{i,j}\, \mathbf{g}_{i+1} \odot \mathbf{g}_j, \end{aligned} $$
with $a \odot b = \tfrac{1}{2}(ab^\top + ba^\top)$, $\alpha'_{i,j} = \lambda_{i,i+1}\, \alpha_{i,j}$ for $i \in [1:N-1]$, and $\alpha'_{N,j} = \alpha_{N,j}$; the substitution $\alpha \mapsto \alpha'$ removes the bilinear products between the multipliers $\lambda$ and the step sizes $\alpha$.
Then, the performance estimation optimization problem is:
$$ \begin{array}{ll} \mbox{minimize} & \tau \\ \mbox{subject to} & \tau\, c_f\, \mathbf{f}_0 + \sum_{i=0}^{N-1} \lambda_{i,i+1}\, (\mathbf{f}_{i+1} - \mathbf{f}_i) + \sum_{i=0}^{N} \lambda_{\star,i}\, (\mathbf{f}_i - \mathbf{f}_\star) = \mathbf{f}_N - \mathbf{f}_\star, \\ & S(\tau, \lambda, \alpha') \succeq 0, \\ & \lambda_{i,i+1} \geq 0 \;\; (i \in [0:N-1]), \quad \lambda_{\star,i} \geq 0 \;\; (i \in [0:N]), \quad \tau \geq 0, \end{array} $$
where the decision variables are $\{\alpha'_{i,j}\}$, $\{\lambda_{i,i+1}\}_{i \in [0:N-1]}$, $\{\lambda_{\star,i}\}_{i \in [0:N]}$, and $\tau$. Let us see how we can solve this problem in Julia.
Reproducing OGM step-by-step in Julia+JuMP
# Load the packages
using JuMP, MosekTools, Mosek, LinearAlgebra, OffsetArrays

# %% Some helper functions
# %% construct e_i in R^n
function e_i(n, i)
e_i_vec = zeros(n, 1)
e_i_vec[i] = 1
return e_i_vec
end
# %% construct symmetric outer product
function ⊙(a,b)
return ((a*b')+(b*a')) ./ 2
end
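The helper ⊙ satisfies $x^\top (a \odot b)\, x = (a^\top x)(b^\top x)$, which is why it shows up when writing the quadratic terms of the interpolation inequalities in Gram form. A quick sanity check of the helpers (mine, not in the original notebook):
# quick sanity check of the helper functions (not in the original notebook)
a, b = e_i(3, 1), e_i(3, 2)
x = randn(3)
M = ⊙(a, b)
@assert issymmetric(M)                               # (ab' + ba')/2 is symmetric
@assert isapprox(x' * M * x, dot(a, x) * dot(b, x))  # xᵀ(a⊙b)x = (aᵀx)(bᵀx)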
# %% Parameters to be tuned
N = 5    # number of iterations
L = 1    # smoothness parameter of the function class
μ = 0    # strong convexity parameter (0 here; kept only for the function interface)
c_w = 1  # coefficient of ‖w_0 - w_⋆‖² in the initial condition
c_f = 0  # coefficient of f(w_0) - f_⋆ in the initial condition

Next, define the bold vectors:
$$ \mathbf{w}_0 = e_1 \in \mathbb{R}^{N+2}, \quad \mathbf{w}_\star = 0, \quad \mathbf{g}_\star = 0, \quad \mathbf{g}_i = e_{i+2} \in \mathbb{R}^{N+2}, \quad \mathbf{f}_\star = 0, \quad \mathbf{f}_i = e_{i+1} \in \mathbb{R}^{N+1} \quad (i \in [0:N]), $$
where $e_i$ is the unit vector with $i$th component equal to $1$ and the rest being zero.
# define all the bold vectors
# --------------------------
# define 𝐰_0
𝐰_0 = e_i(N+2, 1)
𝐰_star = zeros(N+2, 1)
𝐠_star = zeros(N+2,1)
# define 𝐠_0, 𝐠_1, …, 𝐠_N
# store 𝐠_0, 𝐠_1, …, 𝐠_N as the columns of a 0-indexed OffsetArray
𝐠 = OffsetArray(zeros(N+2, N+1), 1:N+2, 0:N)
# 𝐠= [𝐠_0 𝐠_1 𝐠_2 ... 𝐠_N]
for i in 0:N
𝐠[:,i] = e_i(N+2, i+2)
end
𝐟_star = zeros(N+1,1)
# time to define 𝐟_0, 𝐟_1, …, 𝐟_N
𝐟 = OffsetArray(zeros(N+1, N+1), 1:N+1, 0:N)
# 𝐟 = [𝐟_0, 𝐟_1, …, 𝐟_N]
for i in 0:N
𝐟[:,i] = e_i(N+1, i+1)
end

Next, we define our decision variables using JuMP.
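But first, a one-line indexing check (mine, not in the original notebook): the bold vectors are distinct unit vectors, so they must be mutually orthonormal.
# optional indexing sanity check (not in the original notebook)
@assert dot(𝐰_0, 𝐠[:, 0]) == 0
@assert all(dot(𝐠[:, i], 𝐠[:, j]) == (i == j ? 1.0 : 0.0) for i in 0:N, j in 0:N)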
pep_model = Model(optimizer_with_attributes(Mosek.Optimizer, "MSK_DPAR_INTPNT_CO_TOL_PFEAS" => 1e-10))
# time to define all the variables
# -------------------------------
# define α′ (can be typed as \alpha[TAB]\prime[TAB])
@variable(pep_model, α′[1:N, 0:N-1])
# define λ variables
@variable(pep_model, λ_i_ip1[0:N-1] >= 0)
# this defines (λ_{i,i+1})_{i∈[0:N-1]} in Julia indexing
@variable(pep_model, λ_star_i[0:N] >= 0)
# this defines (λ_{⋆,i})_{i∈[0:N]} in Julia indexing
# define τ
@variable(pep_model, τ >= 0)

Define the objective next, which is to minimize $\tau$.
# define objective
@objective(
pep_model,
Min,
τ
)

Time to add the linear equality constraint
$$ \tau\, c_f\, \mathbf{f}_0 + \sum_{i=0}^{N-1} \lambda_{i,i+1}\, (\mathbf{f}_{i+1} - \mathbf{f}_i) + \sum_{i=0}^{N} \lambda_{\star,i}\, (\mathbf{f}_i - \mathbf{f}_\star) = \mathbf{f}_N - \mathbf{f}_\star $$
first, which is much simpler than the LMI constraint.
# Add the linear equality constraint
# ----------------------------------
@constraint(pep_model,
τ * c_f .* 𝐟[:,0] + sum(λ_i_ip1[i] .* (𝐟[:,i+1]-𝐟[:,i]) for i in 0:N-1)
+ sum(λ_star_i[i] .* (𝐟[:,i] - 𝐟_star) for i in 0:N)
.== 𝐟[:,N] - 𝐟_star
)

Now let us construct the giant SDP (LMI) constraint step by step. It has 6 summands, which we add one by one.
# Add the giant LMI constraint step by step
# ----------------------------------------

Term 1 is $\tau\, c_w\, \mathbf{w}_0 \odot \mathbf{w}_0$:
term_1 = c_w * τ * ⊙(𝐰_0,𝐰_0)
Term 2 is
$$ \frac{1}{2L} \left( \sum_{i=0}^{N-1} \lambda_{i,i+1}\, (\mathbf{g}_i - \mathbf{g}_{i+1}) \odot (\mathbf{g}_i - \mathbf{g}_{i+1}) + \sum_{i=0}^{N} \lambda_{\star,i}\, (\mathbf{g}_\star - \mathbf{g}_i) \odot (\mathbf{g}_\star - \mathbf{g}_i) \right). $$
term_2_part_1 = sum(λ_i_ip1[i] .* ⊙(𝐠[:,i]-𝐠[:,i+1],𝐠[:,i]-𝐠[:,i+1]) for i in 0:N-1)
term_2_part_2 = sum(λ_star_i[i] .* ⊙(𝐠_star - 𝐠[:,i],𝐠_star - 𝐠[:,i]) for i in 0:N)
term_2 = (term_2_part_1 + term_2_part_2) ./ (2*L)
Term 3 is $-\lambda_{\star,0}\, \mathbf{g}_0 \odot \mathbf{w}_0$.
term_3 = - λ_star_i[0] .* ⊙(𝐠[:,0],𝐰_0)
Term 4 is
$$ -\sum_{i=1}^{N-1} \lambda_{i,i+1}\, \mathbf{g}_i \odot \mathbf{w}_0 + \frac{1}{L} \sum_{i=1}^{N-1} \sum_{j=0}^{i-1} \alpha'_{i,j}\, \mathbf{g}_i \odot \mathbf{g}_j. $$
term_4_part_1 = - sum( λ_i_ip1[i] .* ⊙(𝐠[:,i],𝐰_0) for i in 1:N-1)
term_4_part_2 = (1/L) .*
sum( sum(α′[i,j] .* ⊙(𝐠[:,i],𝐠[:,j]) for j in 0:i-1) for i in 1:N-1)
term_4 = term_4_part_1 + term_4_part_2
Term 5 is
$$ -\left( \mathbf{g}_N \odot \mathbf{w}_0 - \frac{1}{L} \sum_{j=0}^{N-1} \alpha'_{N,j}\, \mathbf{g}_N \odot \mathbf{g}_j \right). $$
term_5 = - ( ⊙(𝐠[:,N],𝐰_0) - (1/L) .* sum(α′[N,j] .* ⊙(𝐠[:,N],𝐠[:,j]) for j in 0:N-1))
Okay, we are almost there. One more term. Term 6 is:
$$ \sum_{i=0}^{N-1} \lambda_{i,i+1}\, \mathbf{g}_{i+1} \odot \mathbf{w}_0 - \frac{1}{L} \sum_{i=1}^{N-1} \sum_{j=0}^{i-1} \alpha'_{i,j}\, \mathbf{g}_{i+1} \odot \mathbf{g}_j. $$
term_6_part_1 = sum(λ_i_ip1[i] .* ⊙(𝐠[:,i+1],𝐰_0) for i in 0:N-1)
term_6_part_2 = - (1/L) .*
sum( sum( α′[i,j] .* ⊙(𝐠[:,i+1],𝐠[:,j]) for j in 0:i-1) for i in 1:N-1)
term_6 = term_6_part_1 + term_6_part_2
Time to add all these terms to construct $S$:
# oof, okay constructed the terms for the LMI constraint, let us hope that there is no mistake and add them together
S_mat = term_1 + term_2 + term_3 + term_4 + term_5 + term_6
Let's add the LMI constraint $S \succeq 0$.
# add the LMI constraint now
@constraint(pep_model,
S_mat in PSDCone()
)
Time to solve the problem now!
# time to optimize!
# -----------------
optimize!(pep_model)
println("termination status =", termination_status(pep_model) )
The output is as follows.
# Problem
# Name :
# Objective sense : min
# Type : CONIC (conic optimization problem)
# Constraints : 34
# Cones : 0
# Scalar variables : 37
# Matrix variables : 1
# Integer variables : 0
#
# Optimizer started.
# Presolve started.
# Linear dependency checker started.
# Linear dependency checker terminated.
# Eliminator started.
# Freed constraints in eliminator : 5
# Eliminator terminated.
# Eliminator started.
# Freed constraints in eliminator : 0
# Eliminator terminated.
# Eliminator - tries : 2 time : 0.00
# Lin. dep. - tries : 1 time : 0.05
# Lin. dep. - number : 0
# Presolve terminated. Time: 0.08
# Problem
# Name :
# Objective sense : min
# Type : CONIC (conic optimization problem)
# Constraints : 34
# Cones : 0
# Scalar variables : 37
# Matrix variables : 1
# Integer variables : 0
#
# Optimizer - threads : 4
# Optimizer - solved problem : the primal
# Optimizer - Constraints : 29
# Optimizer - Cones : 1
# Optimizer - Scalar variables : 23 conic : 16
# Optimizer - Semi-definite variables: 1 scalarized : 28
# Factor - setup time : 0.02 dense det. time : 0.00
# Factor - ML order time : 0.00 GP order time : 0.00
# Factor - nonzeros before factor : 423 after factor : 435
# Factor - dense dim. : 0 flops : 1.21e+04
# ITE PFEAS DFEAS GFEAS PRSTATUS POBJ DOBJ MU TIME
# 0 1.0e+00 1.0e+00 1.0e+00 0.00e+00 0.000000000e+00 0.000000000e+00 1.0e+00 0.20
# 1 2.7e-01 2.7e-01 1.2e-01 2.14e+00 -2.378627944e-02 -6.117274954e-03 2.7e-01 0.28
# 2 1.2e-01 1.2e-01 2.9e-02 1.32e+00 7.175923608e-02 7.139302023e-02 1.2e-01 0.30
# 3 4.2e-02 4.2e-02 6.9e-03 1.05e+00 4.477758503e-02 4.729110878e-02 4.2e-02 0.30
# 4 2.3e-02 2.3e-02 2.9e-03 7.37e-01 2.880293993e-02 3.057913998e-02 2.3e-02 0.31
# 5 6.7e-03 6.7e-03 4.0e-04 9.85e-01 3.356646000e-02 3.302278094e-02 6.7e-03 0.33
# 6 2.5e-03 2.5e-03 1.1e-04 6.92e-01 2.291866424e-02 2.288414204e-02 2.5e-03 0.33
# 7 6.6e-04 6.6e-04 1.5e-05 8.71e-01 2.066303721e-02 2.066465445e-02 6.6e-04 0.33
# 8 9.8e-05 9.8e-05 8.8e-07 8.83e-01 1.883068878e-02 1.882786667e-02 9.8e-05 0.33
# 9 1.7e-06 1.7e-06 1.9e-09 9.93e-01 1.859217137e-02 1.859212284e-02 1.7e-06 0.33
# 10 4.0e-08 4.0e-08 7.3e-12 1.00e+00 1.858820828e-02 1.858820710e-02 4.0e-08 0.34
# 11 2.1e-09 2.1e-09 8.5e-14 1.00e+00 1.858813924e-02 1.858813917e-02 2.1e-09 0.34
# 12 7.9e-11 1.8e-10 6.3e-16 1.00e+00 1.858813673e-02 1.858813673e-02 7.9e-11 0.34
# Optimizer terminated. Time: 0.45
#
# termination status =OPTIMAL
We now collect the optimal values of our decision variables.
# Collect the decision variables
# -----------------------------
α′_opt = OffsetArray(value.(α′), 1:N, 0:N-1)
λ_opt_i_ip1 = OffsetVector(value.(λ_i_ip1), 0:N-1)
λ_opt_star_i = OffsetVector(value.(λ_star_i), 0:N)
τ_opt = value(τ)
The resultant α′_opt is as follows.
# α′_opt =
# [
# 0.314962 0.0 0.0 0.0 0.0
# 0.641151 0.722442 0.0 0.0 0.0
# 1.05006 1.38407 1.2547 0.0 0.0
# 1.54002 2.17685 2.32945 1.90951 0.0
# 1.92565 2.8008 3.17533 2.96989 2.07777
# ]
Next, we recover $\alpha_{i,j}$ from $\alpha'_{i,j}$ using the formula
$$ \alpha_{i,j} = \begin{cases} \alpha'_{i,j} / \lambda_{i,i+1}, & i \in [1:N-1],\; j \in [0:i-1], \\ \alpha'_{N,j}, & i = N,\; j \in [0:N-1]. \end{cases} $$
# Recover α_{i,j} from α′_{i,j}
# -----------------------------
α_opt = OffsetArray(zeros(size(α′_opt)), 1:N, 0:N-1)
for i in 1:N-1
for j in 0:i-1
α_opt[i,j] = (α′_opt[i,j] / λ_opt_i_ip1[i])
end
end
for j in 0:N-1
α_opt[N,j] = α′_opt[N,j]
end
The output for α_opt is as follows.
# α_opt
# =
# [
# 1.61803 0.0 0.0 0.0 0.0
# 1.79217 2.01939 0.0 0.0 0.0
# 1.86775 2.46185 2.23175 0.0 0.0
# 1.90789 2.69683 2.88589 2.36563 0.0
# 1.92565 2.8008 3.17533 2.96989 2.07777
# ]
We now recover $h_{i,j}$ from $\alpha_{i,j}$ using the formula
$$ h_{i,j} = \begin{cases} \alpha_{i,j}, & j = i-1, \\ \alpha_{i,j} - \alpha_{i-1,j}, & j \in [0:i-2], \end{cases} $$
where, in particular, $h_{1,0} = \alpha_{1,0}$.
# Recover h_{i,j} from α_{i,j}
h_opt = OffsetArray(zeros(size(α_opt)), 1:N, 0:N-1)
# set h(1,0)
h_opt[1,0] = α_opt[1,0]
for i in 2:N
for j in 0:i-1
if j == i-1
h_opt[i,j] = α_opt[i,j]
else
h_opt[i,j] = α_opt[i,j] - α_opt[i-1,j]
end
end
end
We have the following values for h_opt.
# h_opt
# =
# [
# 1.61803 0.0 0.0 0.0 0.0
# 0.174133 2.01939 0.0 0.0 0.0
# 0.0755813 0.442461 2.23175 0.0 0.0
# 0.0401385 0.234975 0.654138 2.36563 0.0
# 0.0177604 0.103971 0.289442 0.604262 2.07777
# ]
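As an extra check (mine, not in the original post), h_opt should match the analytical OGM step sizes of Kim and Fessler, which in the notation of this post are generated by the sequence $\theta_0 = 1$, $\theta_{i+1} = \frac{1 + \sqrt{1 + 4\theta_i^2}}{2}$ for $i \leq N-2$, and $\theta_N = \frac{1 + \sqrt{1 + 8\theta_{N-1}^2}}{2}$, and the optimal objective should match the OGM worst-case bound $1/(2\theta_N^2) \approx 0.0185881$ for $N = 5$ (compare with the POBJ column of the solver log).
# optional check against the analytical OGM coefficients (Kim & Fessler)
θ = OffsetVector(ones(N+1), 0:N)
for i in 0:N-1
    # the θ recursion; the final step uses the modified rule with the factor 8
    θ[i+1] = i == N-1 ? (1 + sqrt(1 + 8*θ[i]^2))/2 : (1 + sqrt(1 + 4*θ[i]^2))/2
end
# analytical OGM step sizes in h-form
h_ogm = OffsetArray(zeros(N, N), 1:N, 0:N-1)
h_ogm[1,0] = 1 + (2*θ[0] - 1)/θ[1]
for i in 1:N-1
    for j in 0:i-2
        h_ogm[i+1,j] = ((θ[i] - 1)/θ[i+1]) * h_ogm[i,j]
    end
    h_ogm[i+1,i-1] = ((θ[i] - 1)/θ[i+1]) * (h_ogm[i,i-1] - 1)
    h_ogm[i+1,i] = 1 + (2*θ[i] - 1)/θ[i+1]
end
println("max |h_opt - h_ogm| = ", maximum(abs(h_opt[i,j] - h_ogm[i,j]) for i in 1:N for j in 0:i-1))
println("|τ_opt - 1/(2θ_N²)|  = ", abs(τ_opt - 1/(2*θ[N]^2)))
Both quantities should be on the order of the solver tolerance.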
Putting everything in a compact function
Now, let us put everything in one function; it assumes that the packages and the helper functions e_i and ⊙ loaded earlier are available.
function full_pep_solver(N, L, μ, c_w, c_f)
# define all the bold vectors
# --------------------------
# define 𝐰_0
𝐰_0 = e_i(N+2, 1)
𝐰_star = zeros(N+2, 1)
𝐠_star = zeros(N+2,1)
# define 𝐠_0, 𝐠_1, …, 𝐠_N
# store 𝐠_0, 𝐠_1, …, 𝐠_N as the columns of a 0-indexed OffsetArray
𝐠 = OffsetArray(zeros(N+2, N+1), 1:N+2, 0:N)
# 𝐠= [𝐠_0 𝐠_1 𝐠_2 ... 𝐠_N]
for i in 0:N
𝐠[:,i] = e_i(N+2, i+2)
end
𝐟_star = zeros(N+1,1)
# time to define 𝐟_0, 𝐟_1, …, 𝐟_N
𝐟 = OffsetArray(zeros(N+1, N+1), 1:N+1, 0:N)
# 𝐟 = [𝐟_0, 𝐟_1, …, 𝐟_N]
for i in 0:N
𝐟[:,i] = e_i(N+1, i+1)
end
# Define JuMP model
# ----------------
pep_model = Model(optimizer_with_attributes(Mosek.Optimizer, "MSK_DPAR_INTPNT_CO_TOL_PFEAS" => 1e-10))
#define all the variables
# -------------------------------
# define α′ (can be typed as \alpha[TAB]\prime[TAB])
@variable(pep_model, α′[1:N, 0:N-1])
# define λ variables
@variable(pep_model, λ_i_ip1[0:N-1] >= 0)
# this defines (λ_{i,i+1})_{i∈[0:N-1]} in Julia indexing
@variable(pep_model, λ_star_i[0:N] >= 0)
# this defines (λ_{⋆,i})_{i∈[0:N]} in Julia indexing
# define τ
@variable(pep_model, τ >= 0)
# define objective
# ------------------
@objective(
pep_model,
Min,
τ
)
# Add the linear equality constraint
# ----------------------------------
@constraint(pep_model,
τ * c_f .* 𝐟[:,0] + sum(λ_i_ip1[i] .* (𝐟[:,i+1]-𝐟[:,i]) for i in 0:N-1)
+ sum(λ_star_i[i] .* (𝐟[:,i] - 𝐟_star) for i in 0:N)
.== 𝐟[:,N] - 𝐟_star
)
# Add the giant LMI constraint step by step
# ----------------------------------------
# Define all the terms one by one
term_1 = c_w * τ * ⊙(𝐰_0,𝐰_0)
term_2_part_1 = sum(λ_i_ip1[i] .* ⊙(𝐠[:,i]-𝐠[:,i+1],𝐠[:,i]-𝐠[:,i+1]) for i in 0:N-1)
term_2_part_2 = sum(λ_star_i[i] .* ⊙(𝐠_star - 𝐠[:,i],𝐠_star - 𝐠[:,i]) for i in 0:N)
term_2 = (term_2_part_1 + term_2_part_2) ./ (2*L)
term_3 = - λ_star_i[0] .* ⊙(𝐠[:,0],𝐰_0)
term_4_part_1 = - sum( λ_i_ip1[i] .* ⊙(𝐠[:,i],𝐰_0) for i in 1:N-1)
term_4_part_2 = (1/L) .*
sum( sum(α′[i,j] .* ⊙(𝐠[:,i],𝐠[:,j]) for j in 0:i-1) for i in 1:N-1)
term_4 = term_4_part_1 + term_4_part_2
term_5 = - ( ⊙(𝐠[:,N],𝐰_0) - (1/L) .* sum(α′[N,j] .* ⊙(𝐠[:,N],𝐠[:,j]) for j in 0:N-1))
term_6_part_1 = sum(λ_i_ip1[i] .* ⊙(𝐠[:,i+1],𝐰_0) for i in 0:N-1)
term_6_part_2 = - (1/L) .*
sum( sum( α′[i,j] .* ⊙(𝐠[:,i+1],𝐠[:,j]) for j in 0:i-1) for i in 1:N-1)
term_6 = term_6_part_1 + term_6_part_2
# oof, okay constructed the terms for the LMI constraint, let us hope that there is no mistake and add them together
S_mat = term_1 + term_2 + term_3 + term_4 + term_5 + term_6
# add the LMI constraint now
@constraint(pep_model,
S_mat in PSDCone()
)
# time to optimize!
# -----------------
optimize!(pep_model)
println("termination status =", termination_status(pep_model) )
α′_opt = OffsetArray(value.(α′), 1:N, 0:N-1)
λ_opt_i_ip1 = OffsetVector(value.(λ_i_ip1), 0:N-1)
λ_opt_star_i = OffsetVector(value.(λ_star_i), 0:N)
τ_opt = value(τ)
# Recover α_{i,j} from α′_{i,j}
# -----------------------------
α_opt = OffsetArray(zeros(size(α′_opt)), 1:N, 0:N-1)
for i in 1:N-1
for j in 0:i-1
α_opt[i,j] = (α′_opt[i,j] / λ_opt_i_ip1[i])
end
end
for j in 0:N-1
α_opt[N,j] = α′_opt[N,j]
end
# Recover h_{i,j} from α_{i,j}
# ----------------------------
h_opt = OffsetArray(zeros(size(α_opt)), 1:N, 0:N-1)
# set h(1,0)
h_opt[1,0] = α_opt[1,0]
# set the rest of the variables
for i in 2:N
for j in 0:i-1
if j == i-1
h_opt[i,j] = α_opt[i,j]
else
h_opt[i,j] = α_opt[i,j] - α_opt[i-1,j]
end
end
end
# return all the outputs
# ----------------------
return α′_opt, λ_opt_i_ip1, λ_opt_star_i, τ_opt, α_opt, h_opt
end
Time to run and test.
α′_opt, λ_opt_i_ip1, λ_opt_star_i, τ_opt, α_opt, h_opt = full_pep_solver(N, L, μ, c_w, c_f)
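For instance (a quick experiment of mine, not in the original post), we can sweep over a few horizons and watch the optimal $\tau$ decay at the OGM rate $1/(2\theta_N^2) = O(1/N^2)$. We take N_test ≥ 2, since several sums inside the function are empty for N = 1.
# quick sweep over N (my experiment)
for N_test in (2, 5, 10)
    _, _, _, τ_test, _, _ = full_pep_solver(N_test, L, μ, c_w, c_f)
    println("N = $N_test: τ_opt = $τ_test")
end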