# Accelerated first-order methods as fixed-step first-order methods: conversion between momentum form and standard form

Shuvomoy Das Gupta

August 06, 2021

In this blog, we discuss how to write accelerated first-order methods as fixed-step first-order methods.

## Converting between momentum-form and and standard form of fixed-step first-order method

Consider a $$L$$-smooth function. ​Fixed step first order methods for this function $$f$$ are algorithms of the form (called the standard form)

$x_{k}=x_{0}-\sum_{j=0}^{k-1}\frac{h_{k,j}}{L}\nabla f(x_{j}),\quad\textrm{(FSFOM)}$

​​​ where $$k\in\{1,\ldots,N\}$$ and $$x_0$$ is the initial point​​​.

Also, we have the following "momentum form" of the fixed-step first-order method, where $$i\in \{0,\ldots, N-1\}$$.

\begin{align*} \begin{array}{ll} y_{i+1} & =x_{i}-\frac{1}{L}\nabla f(x_{i})\\ x_{i+1} & =y_{i+1}+\zeta_{i+1}(y_{i+1}-y_{i})+\eta_{i+1}(y_{i+1}-x_{i}), \end{array}\quad(\textrm{MomentumForm)} \end{align*}

which we show to be equivalent to (FSFOM). To show that (MomentumForm) is in the form (FSFOM), we put, the iterative form $$y_{+1}$$​​ and $$y_{i}$$​​ in terms of the $$x$$​​ iterates in the second iterate. For simplification purpose, denote $$g_{i}=\nabla f(x_{i})$$​​​. We get:

\begin{align*} x_{i+1}=x_{i}+\zeta_{i+1}\left(x_{i}-x_{i-1}\right)-\frac{\left(\zeta_{i+1}+\eta_{i+1}+1\right)}{L}g_{i}+\frac{\zeta_{i+1}}{L}g_{i-1},\quad \textrm{(MOM-SIMP)} \end{align*}

where the Mathematica code for this is shown below.

Subscript[y, i + 1] = Subscript[x, i] -  Subscript[g, i]/L;
Subscript[y, i] = Subscript[x, i - 1] - Subscript[g, i - 1]/L;
Subscript[x, i + 1] =
Subscript[Eta], i + 1] (Subscript[y, i + 1] - Subscript[x, i]) + Subscript[\[Zeta], i + 1] (Subscript[y, i + 1] - Subscript[y, i]) + Subscript[y, i + 1]; term1 = Collect[Subscript[x, i + 1], {Subscript[x, i], Subscript[x, i] - Subscript[x, i - 1], Subscript[g, i]}, Simplify] (*Output: Subscript[x, i] + ( Subscript[g, -1 + i] Subscript[\[Zeta], 1 + i])/L + (-Subscript[x, -1 + i] + Subscript[x, i]) Subscript[\[Zeta], 1 + i] - ( Subscript[g, i] (1 + Subscript[\[Zeta], 1 + i] + Subscript[\[Eta], 1 + i]))/L *) Now from (FSFOM): \[\begin{align*} x_{i} & =x_{0}-\sum_{j=0}^{i-1}\frac{h_{i,j}}{L}g_{j},\\ x_{i-1} & =x_{0}-\sum_{j=0}^{i-2}\frac{h_{i-1,j}}{L}g_{j},\end{align*}

​​ which gives

\begin{align*} x_{i}-x_{i-1} & =-\sum_{j=0}^{i-2}\frac{(-h_{i-1,j}+h_{i,j})}{L}g_{j}-\frac{h_{i,i-1}}{L}g_{i-1}\quad \textrm{(Diff-x)} \end{align*}

and putting this in (MOM-SIMP) and then simplifying we get:

\begin{align*} & x_{i+1}\\ = & x_{i}-\zeta_{i+1}\sum_{j=0}^{i-2}\frac{\left(h_{i,j}-h_{i-1,j}\right)}{L}g_{j}-\frac{\zeta_{i+1}\left(h_{i,i-1}-1\right)}{L}g_{i-1}-\frac{\left(\zeta_{i+1}+\eta_{i+1}+1\right)}{L}g_{i}, \end{align*} \quad \textrm{(FSFOM-A)}

where the Mathematica code for the simplification is shown below:

termDif = -\!$$\*UnderoverscriptBox[\($Sum]$$, $$j = 0$$, $$i - 2$$] \*FractionBox[$$\*SubscriptBox[\(g$$, $$j$$]\ $$( \*SubscriptBox[\(h$$, $$i, j$$] - \*SubscriptBox[$$h$$, $$i - 1, j$$])\)\), $$L$$]\) - ( Subscript[g, i - 1] Subscript[h, i, i - 1])/L; term2 = term1 /. ((-Subscript[x, -1 + i] + Subscript[x, i])) -> termDif; term3 = Collect[term2, {Subscript[g, i], Subscript[g, i - 1]}, Simplify] Recall that, using (Diff-x) any (FSFOM) satisfying sequence will obey: \[x_{i+1}=x_{i}-\sum_{j=0}^{i-1}\frac{(h_{i+1,j}-h_{i,j})}{L}g_{j}-\frac{h_{i+1,i}}{L}g_{i}\quad \textrm{(Diff-x-2)}$

Note that (Diff-x-2) and (FSFOM-A) are in the same format now for a pattern matching. Comparing the terms part by part, we get the following recursive system for $$i\in [0:N-1]$$ ​

\begin{align*}\forall_{j\in[0:i-2]}\quad h_{i+1,j} & -h_{i,j}=\zeta_{i+1}\left(h_{i,j}-h_{i-1,j}\right)\\ h_{i+1,i-1}-h_{i,i-1} & =\zeta_{i+1}\left(h_{i,i-1}-1\right)\\ h_{i+1,i} & =\zeta_{i+1}+\eta_{i+1}+1. \end{align*}

with initial condition $$h_{1,k}=0$$ if $$k<0$$ and $$h_{0,k}=0$$ for all $$k$$. This system of equation gives us a way to compute $$\zeta,\eta$$​ from $$h$$​.

### Julia code to construct $$h$$ from $$\{\zeta, \eta\}$$ and back

The main system of equations is for this conversion process is:

\begin{align*} & h\equiv\{h_{i,j}\}_{i\in[1:N],j\in[0:i-1]}\\ & \forall_{i\in[0:N-1]}\forall_{j\in[0:i-2]}\quad h_{i+1,j}-h_{i,j}=\zeta_{i+1}\left(h_{i,j}-h_{i-1,j}\right)\\ & \forall_{i\in[0:N-1]}\quad h_{i+1,i-1}-h_{i,i-1}=\zeta_{i+1}\left(h_{i,i-1}-1\right)\\ & \forall_{i\in[0:N-1]}\quad h_{i+1,i}=\zeta_{i+1}+\eta_{i+1}+1\\ & h_{1,j}=0,\textrm{ if }j<0\\ & \forall_{j\in[0:i-1]}\quad h_{0,j}=0. \end{align*}

#### Code to construct $$h$$ from $$\{\zeta, \eta\}$$

Suppose we have $$\{\zeta, \eta\}$$ and we want to construct $$h$$ from that. So, the Julia code to do that is as follows:

## function to construct h from ζ, η

function construct_h_from_ζ_η(N, L, ζ, η; solution_method = :constraint)

# other option for solution_method is :penalty

# time to construct model

mod1 = Model(Gurobi.Optimizer)

@variable(mod1, h[i=0:N, j=-2:N] >= 0)

# we are just interested in h[i = 1:N, j = 0:i-1], rest should be set to zero

for i in 0:N
for j in -2:N
if !(i >= 1 && i <= N && j >= 0 && j <= i-1)
fix(h[i,j], 0.0; force = true)
end
end
end

if solution_method == :constraint

for i in 0:N-1
for j in 0:i-2
@constraint(mod1, h[i+1,j] - h[i,j] .== ζ[i+1]*(h[i,j] - h[i-1,j]) )
end
end

for i in 0:N-1
@constraint(mod1, h[i+1,i-1] - h[i,i-1] .== ζ[i+1]*(h[i,i-1] -1) )
end

for i in 0:N-1
@constraint(mod1, h[i+1,i] .== ζ[i+1] + η[i+1] + 1 )
end

@objective(mod1, Min, 0)

elseif solution_method == :penalty

term_1 = @expression(mod1, sum( (h[i+1,j] - h[i,j] - ζ[i+1]*(h[i,j] - h[i-1,j]))^2 for i in 0:N-1, j in 0:N-1 if j <= i-2) )

term_2 = @expression(mod1, sum( (h[i+1,i-1] - h[i,i-1] - ζ[i+1]*(h[i,i-1] -1))^2 for i in 0:N-1))

term_3 = @expression(mod1, sum( (h[i+1,i] - (ζ[i+1] + η[i+1] + 1) )^2 for i in 0:N-1) )

@objective(mod1, Min, term_1 + term_2 + term_3)

end

optimize!(mod1)

if termination_status(mod1) != OPTIMAL
@error "termination status is not optimal"
end

obj_val = objective_value(mod1)

@info "Value of h for N=$(N) with fitting error =$(obj_val)"

h_star = value.(h)

# h_star_compact = h[i=1:N, j = 0:N-1 if j <= i-1]
#
# @show h_star_compact

return h_star

end

Let us test the function. First, we generate a specific $$\zeta, \eta$$ to test for, which comes from OGM (see below about more details about OGM).

## Load the packages

using OffsetArrays, Gurobi, JuMP, LinearAlgebra

N = 5

# Generate θ for OGM

L = 1

R = 1

θ = Dict{Int64,Float64}()

θ[0] = 1

for i in 1:N
if i <= N-1
θ[i] = (1+sqrt(1+4*θ[i-1]^2))/2
elseif i == N
θ[i] = (1+sqrt(1+8*θ[i-1]^2))/2
end
end

ζ_OGM = OffsetVector(zeros(N), 1:N)

for i in 0:N-1
ζ_OGM[i+1] = (θ[i]-1)/θ[i+1]
end

@show ζ_OGM

# output:
# ζ_OGM = [0.0, 0.28175352512532087, 0.434042782780302, 0.5310638054044795, 0.4424791858537259]

η_OGM = OffsetVector(zeros(N), 1:N)

for i in 0:N-1
η_OGM[i+1] = θ[i]/θ[i+1]
end

@show η_OGM

# output:
# η_OGM = [0.6180339887498948, 0.7376403052281875, 0.7977067398993897, 0.8345650247944008, 0.6352906827290474]

Now let us test the function construct_h_from_ζ_η.

h_star_constraint = construct_h_from_ζ_η(N, L, ζ_OGM, η_OGM; solution_method = :constraint)

@show round.(h_star_constraint[1:N, 0:N-1], digits = 5)

# The output is:
# h_star_constraint[1:N, 0:N-1] =
# [
# 1.61803  0.0      0.0      0.0      0.0;
# 1.79217  2.01939  0.0      0.0      0.0;
# 1.86775  2.46185  2.23175  0.0      0.0;
# 1.90789  2.69683  2.88589  2.36563  0.0;
# 1.92565  2.8008   3.17533  2.96989  2.07777
# ]

h_star_penalty = construct_h_from_ζ_η(N, L, ζ_OGM, η_OGM; solution_method = :penalty)

@show round.(h_star_penalty[1:N, 0:N-1], digits = 5)

# The output is:
# h_star_penalty[1:N, 0:N-1] =
# [
# 1.61803  0.0      0.0      0.0      0.0
# 1.79217  2.01939  0.0      0.0      0.0
# 1.86775  2.46185  2.23175  0.0      0.0
# 1.90789  2.69683  2.88589  2.36563  0.0
# 1.92565  2.8008   3.17533  2.96989  2.07777
# ]

h_star = h_star_penalty # we will use this for another test below

#### Code to construct $$\{\zeta, \eta\}$$ from $$h$$

Now we consider the opposite direction. Now we want to construct $$\{\zeta, \eta\}$$ from a given $$h$$. The Julia code to do that is as follows:

function construct_ζ_η_from_h(N, L, h; fix_ζ_η = :true, ζ_ws = ζ_OGM, η_ws = η_OGM)

mod1 = Model(Gurobi.Optimizer)

@variable(mod1, ζ[i = 1:N])

@variable(mod1, η[i = 1:N])

if fix_ζ_η == :true
@info "fixing  ζ and η to that of OGM to check if there are multiple solutions"
fix.(ζ, ζ_ws; force=true)
fix.(η, η_ws; force = true)
end

term_1 = @expression(mod1, sum( (h[i+1,j] - h[i,j] - ζ[i+1]*(h[i,j] - h[i-1,j]))^2 for i in 0:N-1, j in 0:N-1 if j <= i-2) )

term_2 = @expression(mod1, sum( ( h[i+1,i-1] - h[i,i-1] - ζ[i+1]*(h[i,i-1] -1) )^2 for i in 0:N-1))

term_3 = @expression(mod1, sum( (h[i+1,i] - (ζ[i+1] + η[i+1] + 1) )^2 for i in 0:N-1) )

@objective(mod1, Min, term_1 + term_2 + term_3)

optimize!(mod1)

if termination_status(mod1) != OPTIMAL
@error "termination status is not optimal"
end

obj_val = objective_value(mod1)

@info "fitting error for N=$(N) to construct ζ, η from h =$(obj_val)"

ζ_star = value.(ζ)

η_star = value.(η)

return ζ_star, η_star

end

Let us now test this function.

ζ_star, η_star = construct_ζ_η_from_h(N, L, h_star; fix_ζ_η = :false)

# output: [ Info: fitting error for N=5 to construct ζ, η from h = -6.039613253960852e-14

@show ζ_star

# output: ζ_star = [6.352740751660148e-16, 0.2817535522830343, 0.43404278553274767, 0.5310638059557159, 0.442479185968013]

@show η_star

# output: η_star = [0.6180340040827577, 0.7376402784557785, 0.7977067372415494, 0.8345650242736313, 0.6352906826136708]

Let us compare now if the original $$\zeta, \eta$$ from OGM matches this reconstructed.

## Compare with original ζ, η coming from OGM

@show "difference between ζ_OGM and ζ_reconstructed is: $(norm(ζ_star - ζ_OGM))" # output: # "difference between ζ_OGM and ζ_reconstructed is:$(norm(ζ_star - ζ_OGM))" = "difference between ζ_OGM and ζ_reconstructed is: 2.7302642325329647e-8"

@show "difference between ζ_OGM and ζ_reconstructed is: $(norm(η_star - η_OGM))" # output: # "difference between ζ_OGM and ζ_reconstructed is:$(norm(η_star - η_OGM))" = "difference between ζ_OGM and ζ_reconstructed is: 3.0971070252469905e-8"

## Converting between auxiliary-format form and momentum-form

Recall that the momentum-form is written as:

\begin{align*} \begin{array}{ll} y_{i+1} & =x_{i}-\frac{1}{L}\nabla f(x_{i}), \quad (i)\\ x_{i+1} & =y_{i+1}+\zeta_{i+1}(y_{i+1}-y_{i})+\eta_{i+1}(y_{i+1}-x_{i}) \quad (ii), \end{array}\quad(\textrm{MomentumForm)} \end{align*}

Another common format that usually appears in proofs is the so-called auxiliary-format form:

\begin{align*} \begin{array}{ll} x_{i}=(1-\delta_{i})y_{i}+\delta_{i}z_{i},\quad(i)\\ y_{i+1}=x_{i}-\frac{1}{L}\nabla f(x_{i}),\quad(ii) \\ z_{i+1}=z_{i}+\gamma_{i}(y_{i+1}-x_{i}).\quad(iii) \end{array}\quad(\textrm{AuxForm)}\end{align*}

Now, the iterate $$x_{i+1}$$ would be:

\begin{align*} x_{i+1} & =(1-\delta_{i+1})y_{i+1}+\delta_{i+1}z_{i+1} \\ & =(1-\delta_{i+1})y_{i+1}+\delta_{i+1}\left(z_{i}+\gamma_{i}(y_{i+1}-x_{i})\right)\\ & =(1-\delta_{i+1})y_{i+1}+\delta_{i+1}\left(\frac{1}{\delta_{i}}x_{i}+\left(1-\frac{1}{\delta_{i}}\right)y_{i}+\gamma_{i}(y_{i+1}-x_{i})\right)\\ & =y_{i+1}+\left(\frac{\delta_{i+1}}{\delta_{i}}-\delta_{i+1}\right)\left(y_{i+1}-y_{i}\right)+\left(\gamma_{i}\delta_{i+1}-\frac{\delta_{i+1}}{\delta_{i}}\right)\left(y_{i+1}-x_{i}\right), \quad (\textrm{SimpX}) \end{align*}

where in the third line we have used the equation

$z_{i}=\frac{1}{\delta_{i}}x_{i}+\left(1-\frac{1}{\delta_{i}}\right)y_{i}.$

that comes from (AuxForm)(i). Comparing (SimpX) with (MomentumForm):

\begin{align*} \zeta_{i+1} & =\delta_{i+1}\left(\frac{1}{\delta_{i}}-1\right),\\ \eta_{i+1} & =\delta_{i+1}\left(\gamma_{i}-\frac{1}{\delta_{i}}\right). \end{align*}

Given $$\zeta, \eta$$ we can compute $$\delta, \gamma$$ as follows. Define: $$a_i = 1/\delta_i$$, then solve the linear system of equations with variables $$a_i, \gamma_i$$:

\begin{align*} & \zeta_{i+1}a_{i+i}=a_{i}-1, \\ & \eta_{i+1}a_{i+1}=\gamma_{i}-a_{i}. \end{align*}

Then we have $$\gamma_i$$ we compute $$\delta_i$$ by using $$\delta_i = 1/a_i$$.

Mathematica code.

CollectWRTVarList[expr_, vars_List] :=
Expand[Simplify[
expr /. Flatten[
Solve[# == ToString@#, First@Variables@#] & /@ vars]],
Alternatives @@ ToString /@ vars] /.
term1 = CollectWRTVarList[
Subscript[y,
1 + i] (1 - Subscript[Delta], 1 + i]) + ((-Subscript[x, i] + Subscript[y, 1 + i]) Subscript[\[Gamma], i] + Subscript[y, i] (1 - Subscript[inv\[Delta], i]) + Subscript[x, i] Subscript[inv\[Delta], i]) Subscript[\[Delta], 1 + i], {Subscript[y, 1 + i] - Subscript[y, i], Subscript[y, 1 + i] - Subscript[x, i]}] /. {(-Subscript[x, i] + Subscript[y, 1 + i]) -> fact1, (-Subscript[y, i] + Subscript[y, 1 + i]) -> fact2}; termFinal = Collect[term1, {fact1, fact2}] /. {fact1 -> (-Subscript[x, i] + Subscript[y, 1 + i]), fact2 -> (-Subscript[y, i] + Subscript[y, 1 + i]), Subscript[inv\[Delta], i] -> (1/Subscript[\[Delta], i]), Linv -> (1/L)} ## Example 1. FISTA FISTA in momentum format is \[ \begin{aligned} & x_{0}=y_{0},\quad(i)\\ & y_{i+1}=x_{i}-\frac{1}{L}\nabla f(x_{i})-\frac{1}{L}h^{\prime}(y_{i+1}),\quad i\in[0:N-1],\quad(ii)\\ & x_{i+1}=y_{i+1}+\zeta_{i+1}(y_{i+1}-y_{i})+\eta_{i+1}(y_{i+1}-x_{i}),\quad i\in[0:N-1].\quad(iii) \end{aligned}

FISTA in auxiliary iterate format:

$\begin{eqnarray*} & & z_{0}=x_{0}=y_{0},\quad(i)\\ & & x_{i}=(1-\delta_{i})y_{i}+\delta_{i}z_{i},\quad(ii)\\ & & y_{i+1}=x_{i}-\frac{1}{L}\nabla f(x_{i})-\frac{1}{L}h^{\prime}(y_{i+1}),\quad(iii)\\ & & z_{i+1}=z_{i}+\gamma_{i}(y_{i+1}-x_{i}).\quad(iv) \end{eqnarray*}$

From auxiliary format (ii) we have

$z_{i}=\frac{1}{\delta_{i}}x_{i}+\left(1-\frac{1}{\delta_{i}}\right)y_{i}$

At the next iteration we have:

\begin{align*} x_{i+1} & =(1-\delta_{i+1})y_{i+1}+\delta_{i+1}z_{i+1}\\ & =(1-\delta_{i+1})y_{i+1}+\delta_{i+1}\left(z_{i}+\gamma_{i}(y_{i+1}-x_{i})\right)\\ & =(1-\delta_{i+1})y_{i+1}+\delta_{i+1}\left(\frac{1}{\delta_{i}}x_{i}+\left(1-\frac{1}{\delta_{i}}\right)y_{i}+\gamma_{i}(y_{i+1}-x_{i})\right)\\ & =y_{i+1}+\left(\frac{\delta_{i+1}}{\delta_{i}}-\delta_{i+1}\right)\left(y_{i+1}-y_{i}\right)+\left(\gamma_{i}\delta_{i+1}-\frac{\delta_{i+1}}{\delta_{i}}\right)\left(y_{i+1}-x_{i}\right). \end{align*}

By pattern-matching we have the follwoing relationship:

\begin{align*} \zeta_{i+1} & =\delta_{i+1}\left(\frac{1}{\delta_{i}}-1\right),\\ \eta_{i+1} & =\delta_{i+1}\left(\gamma_{i}-\frac{1}{\delta_{i}}\right). \end{align*}

## Example 2. OGM

As our example, we consider the Optimized Gradient Method (OGM) due to Kim and Fessler. For a $$L$$-smooth convex function $$f$$, $$x_{0}\in\mathbf{R}^{d},\theta_{0}=1,$$ the algorithm is defined in its auxiliary form as

$\begin{array}{ll} y_{i+1} & =x_{i}-\frac{1}{L}\nabla f(x_{i})\\ z_{i+1} & =z_{i}-\frac{2\theta_{i}}{L}\nabla f(x_{i})\\ x_{i+1} & =\left(1-\frac{1}{\theta_{i+1}}\right)y_{i+1}+\frac{1}{\theta_{i+1}}z_{i+1}, \end{array}\quad(\textrm{OGM)}$

​ where $$i\in\{0,1,\ldots,N-1\}$$​.

First, note that in (OGM), gradient is evaluated at $$x_{i}$$​ iterates, so we will try to remove $$z_{i}$$​ iterates from (OGM), and write the last iterate as terms involving $$y_{i+1},y_{i},$$​ and $$x_{i}$$​. To that goal, we will write, $$z_{i+1}$$​ completely using $$y_{i},x_{i},$$​ and $$y_{i+1}$$​. From, the first iteration of (OGM),

$\frac{1}{L}\nabla f(x_{i})=x_{i}-y_{i+1},\quad(1)$

​ and putting (1) in the second iterate of (OGM), we have

\begin{align*} z_{i+1} & =z_{i}-\frac{2\theta_{i}}{L}\nabla f(x_{i})\\ & =z_{i}-(2\theta_{i})(x_{i}-y_{i+1})\\ & =z_{i}-2\theta_{i}x_{i}+2\theta_{i}y_{i+1}\quad(2).\end{align*}

​ The third iterate of (OGM) for index $$i$$​ will give:

ClearAll["Global*"];

Solve[y[i] (1 - 1/Theta][i]) + z[i]/\[Theta][i] == x[i], z[i]] (*Out[] = {{z[i]\[Rule]y[i]+x[i] \[Theta][i]-y[i] \[Theta][i]}}*) Collect[y[i] + x[i] \[Theta][i] - y[i] \[Theta][i], {x[i], y[i]}] (*Out[] = y[i] (1-\[Theta][i])+x[i] \[Theta][i]*) \[\begin{align*} x_{i} & =\left(1-\frac{1}{\theta_{i}}\right)y_{i}+\frac{1}{\theta_{i}}z_{i}\\ \Leftrightarrow z_{i} & =(1-\theta_{i})y_{i}+\theta_{i}x_{i},\end{align*}

​ and putting that in (2), we get:

z[i] - 2 Theta][i] x[i] + 2 \[Theta][i] y[i + 1] /. z[i] -> y[i] (1 - \[Theta][i]) + x[i] \[Theta][i] (*Out[] = y[i] (1-\[Theta][i])-x[i] \[Theta][i]+2 y[1+i] \[Theta][i]*) \[\begin{align*} z_{i+1} & =z_{i}-2\theta_{i}x_{i}+2\theta_{i}y_{i+1}\\ & =(1-\theta_{i})y_{i}-\theta_{i}x_{i}+2\theta_{i}y_{i+1}\quad(3),\end{align*}

​ and putting (3) into the third iterate of (OGM), we get

(*This code will collect terms with a specific patterns*)
(*Caution all the terms have to be scalrs, does not work with
table term such x[i] etc, but works with xi and so on*)
CollectWRTVarList[expr_, vars_List] :=
Expand[Simplify[
expr /. Flatten[
Solve[# == ToString@#, First@Variables@#] & /@ vars]],
Alternatives @@ ToString /@ vars] /.

term = (1 - 1/Theta][i + 1]) y[i + 1] + 1/\[Theta][i + 1] z[i + 1] /. z[i + 1] -> y[i] (1 - \[Theta][i]) - x[i] \[Theta][i] + 2 y[1 + i] \[Theta][i] // Simplify; (*Out[] = (-y[i] (-1+\[Theta][i])-x[i] \[Theta][i]+y[1+i] (-1+2 \ \[Theta][i]+\[Theta][1+i]))/\[Theta][1+i]*) CollectWRTVarList[ term, {y[i + 1] - y[i], {y[i + 1] - x[i]}}] /. {-y[i] + y[1 + i] -> t1, -x[i] + y[1 + i] -> t2 } (*Out[] = {y[1+i]-t1/\[Theta][1+i]+(t1 \[Theta][i])/\[Theta][1+i]+(t2 \ \[Theta][i])/\[Theta][1+i]}*) Collect[y[1 + i] - t1/\[Theta][1 + i] + ( t1 \[Theta][i])/\[Theta][1 + i] + ( t2 \[Theta][i])/\[Theta][1 + i], {t1, t2}, Simplify] /. {t1 -> -y[i] + y[1 + i], t2 -> -x[i] + y[1 + i]} (*Out[] = y[1+i]+((-y[i]+y[1+i]) \ (-1+\[Theta][i]))/\[Theta][1+i]+((-x[i]+y[1+i]) \ \[Theta][i])/\[Theta][1+i]*) \[\begin{align*} x_{i+1} & =\left(1-\frac{1}{\theta_{i+1}}\right)y_{i+1}+\frac{1}{\theta_{i+1}}z_{i+1}\\ = & y_{i+1}+\frac{\theta_{i}-1}{\theta_{i+1}}(y_{i+1}-y_{i})+\frac{\theta_{i}}{\theta_{i+1}}(y_{i+1}-x_{i}).\end{align*}

So, we have the following "momentum form" of (OGM):

$\begin{array}{ll} y_{i+1} & =x_{i}-\frac{1}{L}\nabla f(x_{i})\\ x_{i+1} & =y_{i+1}+\frac{\theta_{i}-1}{\theta_{i+1}}(y_{i+1}-y_{i})+\frac{\theta_{i}}{\theta_{i+1}}(y_{i+1}-x_{i}), \end{array}\quad(\textrm{MomentumOGM)}$
​ which we show to be equivalent to (FSFOM). To show that (MomentumOGM) is in the form (FSFOM), we put, the iterative form $$y_{+1}$$ and $$y_{i}$$ in terms of the $$x$$ iterates in the second iterate. For simplification purpose, denote $$g_{i}=\nabla f(x_{i})$$. We get:

Subscript[y, 1 + i] = Subscript[x, i] - Subscript[g, i]/L;

Subscript[y, i] = Subscript[x, i - 1] - Subscript[g, i - 1]/L;

Subscript[x, i + 1] =
Subscript[y,
i + 1] + ((Subscript[$Theta], i] - 1) (Subscript[y, i + 1] - Subscript[y, i]))/Subscript[\[Theta], i + 1] + ( Subscript[\[Theta], i] (Subscript[y, i + 1] - Subscript[x, i]))/ Subscript[\[Theta], i + 1]; Collect[Subscript[x, i + 1], {Subscript[x, i], Subscript[x, i] - Subscript[x, i - 1], Subscript[g, i]}, Simplify] (*Output[] = Subscript[x, i]+(Subscript[g, -1+i] (-1+Subscript[\ \[Theta], i]))/(L Subscript[\[Theta], 1+i])+((-Subscript[x, \ -1+i]+Subscript[x, i]) (-1+Subscript[\[Theta], \ i]))/Subscript[\[Theta], 1+i]-(Subscript[g, i] (-1+2 Subscript[\ \[Theta], i]+Subscript[\[Theta], 1+i]))/(L Subscript[\[Theta], 1+i])*) \[x_{i+1}=x_{i}+\frac{\left(\theta_{i}-1\right)\left(x_{i}-x_{i-1}\right)}{\theta_{i+1}}+\frac{g_{i-1}\left(\theta_{i}-1\right)}{L\theta_{i+1}}-\frac{g_{i}\left(2\theta_{i}+\theta_{i+1}-1\right)}{L\theta_{i+1}}\quad(1)$

Now from (FSFOM):

\begin{align*} x_{i} & =x_{0}-\sum_{j=0}^{i-1}\frac{h_{i,j}}{L}g_{j},\\ x_{i-1} & =x_{0}-\sum_{j=0}^{i-2}\frac{h_{i-1,j}}{L}g_{j},\end{align*}

​ which gives

\begin{align*} x_{i}-x_{i-1} & =-\sum_{j=0}^{i-2}\frac{(-h_{i-1,j}+h_{i,j})}{L}g_{j}-\frac{h_{i,i-1}}{L}g_{i-1}\quad(2)\end{align*}

​​ and putting this in (1) and then simplifying we get:

(*x[i+1]=*)
term1 = x[i] + (g[-1 + i] (-1 + Theta][i]))/( L \[Theta][ 1 + i]) + ((-x[-1 + i] + x[i]) (-1 + \[Theta][i]))/\[Theta][ 1 + i] - (g[i] (-1 + 2 \[Theta][i] + \[Theta][1 + i]))/( L \[Theta][1 + i]); (*x[i]-x[i-1]=*) termDif = (-h[i, i - 1]/L g[i - 1] - \!$$\*UnderoverscriptBox[\(\[Sum]$$, $$j = 0$$, $$i - 2$$]$$\*FractionBox[\((\(-h[i - 1, j]$$ + h[i, j])\), $$L$$] g[j]\)\)); term2 = term1 /. (-x[-1 + i] + x[i]) -> termDif; term3 = Collect[term2, {g[i], g[i - 1]}, Simplify]; term4 = term3 /. {h_[a_] -> Subscript[h, a], h[i, j] -> Subscript[h, i, j], h[-1 + i, j] -> Subscript[h, i - 1, j], h[i, -1 + i] -> Subscript[h, i, i - 1]} (*term4=Subscript[x, i]-(Subscript[g, i] (-1+2 Subscript[\[Theta], \ i]+Subscript[\[Theta], 1+i]))/(L Subscript[\[Theta], \ 1+i])-(Subscript[g, -1+i] (-1+Subscript[\[Theta], i]) \ (-1+Subscript[h, i,-1+i]))/(L Subscript[\[Theta], \ 1+i])-((-1+Subscript[\[Theta], i]) \!$$\*UnderoverscriptBox[\(\[Sum]$$, $$j = 0$$, $$\(-2$$ + i\)] \*FractionBox[$$\*SubscriptBox[\(g$$, $$j$$]\ $$(\(- \*SubscriptBox[\(h$$, $$\(-1$$ + i, j\)]\) + \*SubscriptBox[$$h$$, $$i, j$$])\)\), $$L$$]\))/Subscript[\[Theta], 1+i]*)` \[\begin{align*} x_{i+1} & =x_{i}-\frac{\left(\theta_{i}-1\right)}{\theta_{i+1}}\sum_{j=0}^{i-2}\frac{\left(h_{i,j}-h_{i-1,j}\right)}{L}g_{j}\\ & -\frac{\left(\theta_{i}-1\right)\left(h_{i,i-1}-1\right)}{L\theta_{i+1}}g_{i-1}-\frac{\left(2\theta_{i}+\theta_{i+1}-1\right)}{L\theta_{i+1}}g_{i}.\quad(\textrm{FSFOM-OGM})\end{align*}

​ Recall that, using (2) any (FSFOM) satisfying sequence will obey:

$x_{i+1}=x_{i}-\sum_{j=0}^{i-1}\frac{(h_{i+1,j}-h_{i,j})}{L}g_{j}-\frac{h_{i+1,i}}{L}g_{i}\quad(3)$

​ Note that (3) and (FSFOM-OGM) are in the same format now for a pattern matching. Comparing the terms part by part, we get the following recursive system:

\begin{align*} \forall_{j\in[0:i-2]}\quad h_{i+1,j} & -h_{i,j}=\frac{\left(\theta_{i}-1\right)}{\theta_{i+1}}\left(h_{i,j}-h_{i-1,j}\right)\\ h_{i+1,i-1}-h_{i,i-1} & =\frac{\left(\theta_{i}-1\right)\left(h_{i,i-1}-1\right)}{\theta_{i+1}}\\ h_{i+1,i} & =\frac{\left(2\theta_{i}+\theta_{i+1}-1\right)}{\theta_{i+1}},\end{align*}

​ with initial condition $$h_{1,j}=0$$​ for $$j<0$$​ and $$h_{0,j}=0$$​ for all $$j$$​.