Constructing an interpolated function for the class of smooth strongly convex functions

Shuvomoy Das Gupta

December 1, 2021

In this blog, we study constructing an interpolated smooth and strongly convex function from a set of points due to Yoel Drori and Adrien Taylor from[4].


Table of contents


Notation and notions.

All norms are 2-norm in this blog. A function f:RdRf:\mathbf{R}^{d}\to\mathbf{R} is LL-smooth convex if and only

(x,yRd)f(y)f(x)+f(x)yx+12Lf(x)f(y)2.\left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}.

On the other hand, a function is μ\mu​-strongly convex if and only if

(x,yRd)f(y)f(x)+f(x)yx+μ2xy2(SCVX)\left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{\mu}{2}\|x-y\|^{2}\qquad (\text{SCVX})

​ where f()f^{\prime}(\cdot)​ denotes a subgradient of ff​ at ()(\cdot)​.

On Rd,\mathbf{R}^{d}, the set of all LL-smooth convex functions is denoted by F0,L(Rd)\mathcal{F}_{0,L}(\mathbf{R}^{d}), the set of all μ\mu-strongly convex functions is denoted by Fμ,(Rd)\mathcal{F}_{\mu,\infty}(\mathbf{R}^{d}), and the set of all μ\mu-strongly convex and LL-smooth functions is denoted by Fμ,L(Rd)\mathcal{F}_{\mu,L}(\mathbf{R}^{d}). Finally the set of all lower-semicontinuous, proper, and convex functions is denoted by F0,(Rd)\mathcal{F}_{0,\infty}(\mathbf{R}^{d}).

Helper results.

Alternative characterization of smooth convex functions

An alternative characterization of fF0,L(Rd)f\in\mathcal{F}_{0,L}(\mathbf{R}^{d}), that we will use multiple times, is as follows.

Theorem 1: Alternative characterization of smooth convex functions[1, Definition 2.6, Theorem 2.27, Theorem 2.28]
Theorem

For a function f:RdRf:\mathbf{R}^{d}\to\mathbf{R}​ the following statements are equivalent:

(a) fF0,L(Rd)f\in\mathcal{F}_{0,L}(\mathbf{R}^{d})​.

(b)ff​ is convex and it satisfies (x,yRd)f(y)f(x)+f(x)yx+L2xy2.\left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\leq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{L}{2}\|x-y\|^{2}.

(c) ff is convex and satisfies (x,yRd)f(x)f(y)Lxy\left(\forall x,y\in\mathbf{R}^{d}\right)\quad\|\nabla f(x)-\nabla f(y)\|\leq L\|x-y\|.

(d) L22fF0,(Rd)\frac{L}{2}\|\cdot\|^{2}-f\in\mathcal{F}_{0,\infty}(\mathbf{R}^{d})​​.

In (b)(b) and (c)(c) of the theorem above, saying that ff is convex is necessary, as the inequalities only in (b),(c)(b),(c) are not sufficient to establish LL-smooth convexity.

We now record the notion of an interpolable function.

Definition 1: Interpolable function.
Definition

Suppose we are given the set of triplets {(xi,gi,fi)}iIRd×Rd×R\{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} where II is a finite index set. Let F(Rd)\mathcal{F}(\mathbf{R}^d) be some class of functions. Then the set {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i} ) \}_{i\in I} is F(Rd)\mathcal{F}(\mathbf{R}^{d})-interpolable if and only if there exists a function fF(Rd)f\in\mathcal{F}(\mathbf{R}^{d}) such that for all iIi\in I we have fi=f(xi)f_{i}=f(x_{i}) and gif(xi)g_{i} \in \partial f(x_{i}).

Equivalence between smooth strongly convex and smooth convex function class

To establish our main result, we will use the following equivalence result.

Lemma 1: Equivalence between function classes
Lemma

We have fFμ,L(Rd)L22fF0,Lμ(Rd)f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d})\Leftrightarrow\frac{L}{2}\|\cdot\|^{2}-f\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}).

Proof.

First, we prove fFμ,L(Rd)L22fF0,Lμ(Rd)f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d})\Rightarrow\frac{L}{2}\|\cdot\|^{2}-f\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d})​​. Define hL22fh\coloneqq\frac{L}{2}\|\cdot\|^{2}-f​​, so f=L22hf=\frac{L}{2}\|\cdot\|^{2}-h​​ and f(x)=Lxh(x)\nabla f(x)=Lx-\nabla h(x)​​. Clearly, fF0,L(Rd)f\in\mathcal{F}_{0,L}(\mathbf{R}^{d})​​ also, so due to Theorem 1(a),(d),(a),(d),​​we have hF0,(Rd)h\in\mathcal{F}_{0,\infty}(\mathbf{R}^{d})​​. Because ff​​ is μ\mu​​-strongly convex and differentiable, for any x,yRdx,y\in\mathbf{R}^{d}​​ we have from (SCVX)(\text{SCVX}):

f(y)f(x)+f(x)yx+μ2xy2L2y2h(y)L2x2h(x)+Lxh(x)yx+μ2xy2L2y2+h(y)L2x2+h(x)Lxh(x)yxμ2xy2h(y)L2y2L2x2+h(x)Lxyx+h(x)yxμ2xy2=L2y2L2x2+h(x)Lxy+Lx2+h(x)yxμ2xy2=L2(y2+x22xy)+h(x)+h(x)yxμ2xy2=h(x)+h(x)yx+L2xy2μ2xy2=h(x)+h(x)yx+Lμ2xy2.\begin{aligned} & f(y)\geq f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & \frac{L}{2}\|y\|^{2}-h(y)\geq\frac{L}{2}\|x\|^{2}-h(x)+\left\langle Lx-\nabla h(x)\mid y-x\right\rangle +\frac{\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & -\frac{L}{2}\|y\|^{2}+h(y)\leq-\frac{L}{2}\|x\|^{2}+h(x)-\left\langle Lx-\nabla h(x)\mid y-x\right\rangle -\frac{\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & h(y)\leq\frac{L}{2}\|y\|^{2}-\frac{L}{2}\|x\|^{2}+h(x)-L\left\langle x\mid y-x\right\rangle +\left\langle \nabla h(x)\mid y-x\right\rangle -\frac{\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=\frac{L}{2}\|y\|^{2}-\frac{L}{2}\|x\|^{2}+h(x)-L\left\langle x\mid y\right\rangle +L\|x\|^{2}+\left\langle \nabla h(x)\mid y-x\right\rangle -\frac{\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=\frac{L}{2}\left(\|y\|^{2}+\|x\|^{2}-2\left\langle x\mid y\right\rangle \right)+h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle -\frac{\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle +\frac{L}{2}\|x-y\|^{2}-\frac{\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle +\frac{L-\mu}{2}\|x-y\|^{2}.\end{aligned}

So we have proven that hh is convex and for any x,yRd,x,y\in\mathbf{R}^{d},

h(y)h(x)+h(x)yx+Lμ2xy2h(y)\leq h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle +\frac{L-\mu}{2}\|x-y\|^{2}

which this is equivalent to saying that hF0,Lμ(Rd)h\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) due to Theorem 1(a),(b)(a),(b).

Next, we prove hL22fF0,Lμ(Rd)fFμ,L(Rd)h\coloneqq\frac{L}{2}\|\cdot\|^{2}-f\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d})\Rightarrow f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d}). Due to Theorem 1(a),(b)(a),(b), we have: hh convex and for any x,yRd,x,y\in\mathbf{R}^{d},

h(y)h(x)+h(x)yx+Lμ2xy2L2y2f(y)L2x2f(x)+Lxf(x)yx+Lμ2xy2L2y2+f(y)L2x2+f(x)Lxf(x)yxLμ2xy2f(y)f(x)+f(x)yx+L2y2L2x2Lxy+Lx2Lμ2xy2=f(x)+f(x)yx+L2(y2+x22xy)undefined=xy2Lμ2xy2=f(x)+f(x)yx+μ2xy2,\begin{aligned} & h(y)\leq h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle +\frac{L-\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & \frac{L}{2}\|y\|^{2}-f(y)\leq\frac{L}{2}\|x\|^{2}-f(x)+\left\langle Lx-\nabla f(x)\mid y-x\right\rangle +\frac{L-\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & -\frac{L}{2}\|y\|^{2}+f(y)\geq-\frac{L}{2}\|x\|^{2}+f(x)-\left\langle Lx-\nabla f(x)\mid y-x\right\rangle -\frac{L-\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & f(y)\geq f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{L}{2}\|y\|^{2}-\frac{L}{2}\|x\|^{2}-L\left\langle x\mid y\right\rangle +L\|x\|^{2}-\frac{L-\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{L}{2}\underbrace{\left(\|y\|^{2}+\|x\|^{2}-2\left\langle x\mid y\right\rangle \right)}_{=\|x-y\|^{2}}-\frac{L-\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{\mu}{2}\|x-y\|^{2},\end{aligned}

i.e., we have shown that for all x,yRdx,y\in\mathbf{R}^{d}​​​

f(y)f(x)+f(x)yx+μ2xy2,f(y)\geq f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{\mu}{2}\|x-y\|^{2},

​​​ hence by defintion ff​​​ is μ\mu​​​-strongly convex. Finally, we have

f(x)f(y)=Lxh(x)Ly+h(y)=L(xy)+(h(y)h(x))a)Lxy+h(y)h(x)b)Lxy+(Lμ)xy=Lxy,\begin{aligned} \|\nabla f(x)-\nabla f(y)\| & =\|Lx-\nabla h(x)-Ly+\nabla h(y)\|\\ & =\|L(x-y)+(\nabla h(y)-\nabla h(x))\|\\ & \overset{a)}{\leq}L\|x-y\|+\|\nabla h(y)-\nabla h(x)\|\\ & \overset{b)}{\leq}L\|x-y\|+(L-\mu)\|x-y\|\\ & =L\|x-y\|,\end{aligned}

where a)a) follows from triangle inequallity and b)b) follows from Theorem 1(a),(c)(a),(c). So, ff is LL-smooth besides being μ\mu-strongly convex, i.e., fFμ,L(Rd)f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d}). This completes the proof. 

Interpolation equivalence for smooth strongly convex and smooth convex functions

Also, we are going to use the following interpolation result.

Theorem 2: Interpolation equivalence for smooth strongly convex and smooth convex functions
Theorem

Suppose we are given the set of triplets set of triplets {(xi,gi,fi)}iIRd×Rd×R\{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} where II​ is a finite index set. Then the following are equivalent.

(a) {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​ is Fμ,L(Rd)\mathcal{F}_{\mu,L}(\mathbf{R}^{d})​​-interpolable.

(b) {(xundefinedi,gundefinedi,fundefinedi)}iI{(xi,Lxigi,L2xi2fi)}iI\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\coloneqq\{(x_{i},Lx_{i}-g_{i},\frac{L}{2}\|x_{i}\|^{2}-f_{i})\}_{i\in I}​ is F0,Lμ(Rd)\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d})​​-interpolable.

Proof.

First, we prove (a)(b).(a)\Rightarrow(b).​ If (a)(a)​ holds, then by definition there exists a function fFμ,L(Rd)f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d})​ that satisfies for all iIi\in I: f(xi)=fif(x_{i})=f_{i}​ and f(xi)=gi\nabla f(x_{i})=g_{i}​. If we define, h=(L/2)2f,h=(L/2)\|\cdot\|^{2}-f,​ then hF0,Lμ(Rd)h\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d})​ due to Lemma 1, and for all iIi\in I​ we have h(xi)=L2xi2f(xi)=L2xi2fih(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-f(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-f_{i}​ and h(xi)=Lxif(xi)=Lxigi\nabla h(x_{i})=Lx_{i}-\nabla f(x_{i})=Lx_{i}-g_{i}​. This proves (b)(b)​.

Next, we prove (b)(a)(b)\Rightarrow(a)​​. If (b)(b)​​ holds, there exists a function hF0,Lμ(Rd)h\in\mathcal{F}_{0,L-\mu }(\mathbf{R}^{d})​​ that satisfies for all iIi\in I: h(xi)=L2xi2fih(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-f_{i}​​ and h(xi)=Lxigi\nabla h(x_{i})=Lx_{i}-g_{i}​​. If we define, f=(L/2)2hf=(L/2)\|\cdot\|^{2}-h​​, then fFμ,L(Rd)f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d})​​ due to Lemma 1, and for all iIi\in I​​ we have f(xi)=L2xi2h(xi)=L2xi2(L2xi2fi)=fif(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-h(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-(\frac{L}{2}\|x_{i}\|^{2}-f_{i})=f_{i}​​ and f(xi)=Lxih(xi)=Lxi(Lxigi)=gi.\nabla f(x_{i})=Lx_{i}-\nabla h(x_{i})=Lx_{i}-(Lx_{i}-g_{i})=g_{i}.​​​ This proves (a). 

Drori's interpolated function for smooth convex function class

Finally, we present the following interpolation result due to Yoel Drori from[3]. I showed a detailed proof of this result in the previous blog post: here.

Theorem 3: Interpolation of smooth convex functions.
Theorem

If {(xundefinedi,gundefinedi,fundefinedi)}iIRd×Rd×R\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R}​ is F0,L(Rd)\mathcal{F}_{0,L}(\mathbf{R}^{d})​-interpolable with L>0L>0​, then one interpolated function fundefinedF0,L(Rd)\widetilde{f}\in\mathcal{F}_{0,L}(\mathbf{R}^{d})​ that interpolates {(xundefinedi,gundefinedi,fundefinedi)}iI\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}​ is:

fundefined(y)=minαΔ[L2yiIαi(xundefinedi1Lgundefinedi)2+iIαi(fundefinedi12Lgundefinedi2)],\widetilde{f}(y)=\min_{\alpha\in\Delta}\left[\frac{L}{2}\|y-\sum_{i\in I}\alpha_{i}\left(\widetilde{x}_{i}-\frac{1}{L}\widetilde{g}_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\widetilde{f}_{i}-\frac{1}{2L}\|\widetilde{g}_{i}\|^{2}\right)\right],

​ where Δ={βRIβ0,iIβi=1}.\Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}.

Main result.

Now we are in a position to state our main result due to Yoel Drori and Adrien Taylor from[4, Theorem 1] followed by its proof. Their proof is direct and the proof we present here is based on interpolation argument.

Theorem 4: Interpolation of smooth strongly convex functions
Theorem

If {(xi,gi,fi)}iIRd×Rd×R\{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R}​ is Fμ,L(Rd)\mathcal{F}_{\mu,L}(\mathbf{R}^{d})​-interpolable with L>0L>0​, then one interpolation function fFμ,L(Rd)f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d})​ that interpolates {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​ is:

f(y):=maxαΔ[L2y2Lμ2y1Lμiαi(giμxi)2+iIαi(fi+12(Lμ)giLxi2L2xi2)],\begin{align*} f(y) & :=\max_{\alpha\in\Delta}\Big[\frac{L}{2}\|y\|^{2}-\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}\\ & \quad +\sum_{i\in I}\alpha_{i}(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2})\Big], \end{align*}

where Δ={βRIβ0,iIβi=1}.\Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}.

Proof. The proof sketch is as follows comprising two steps.

  1. First, we find an interpolated function fundefinedF0,Lμ(Rd)\widetilde{f}\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) that interpolates {(xundefinedi,gundefinedi,fundefinedi)}iI{(xi,Lxigi,L2xi2fi)}iI\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\coloneqq\{(x_{i},Lx_{i}-g_{i},\frac{L}{2}\|x_{i}\|^{2}-f_{i})\}_{i\in I} using Theorem 3.

  1. Then due to the proof of Theorem 2, the function f=(L/2)2fundefinedf=(L/2)\|\cdot\|^{2}-\widetilde{f} will be in Fμ,L(Rd)\mathcal{F}_{\mu,L}(\mathbf{R}^{d}) and will interpolate {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}.

(1) From Theorem 3 recall that, if {(xundefinedi,gundefinedi,fundefinedi)}iIRd×Rd×R\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R}​ is F0,Lμ(Rd)\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d})​-interpolable, then one interpolation function fundefinedF0,L(Rd)\widetilde{f}\in\mathcal{F}_{0,L}(\mathbf{R}^{d})​ that interpolates {(xundefinedi,gundefinedi,fundefinedi)}iI\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}​ is:

fundefined(y)=minαΔ[Lμ2yiIαi(xundefinedi1Lμgundefinedi)2+iIαi(fundefinedi12(Lμ)gundefinedi2)],\widetilde{f}(y)=\min_{\alpha\in\Delta}\left[\frac{L-\mu}{2}\|y-\sum_{i\in I}\alpha_{i}\left(\widetilde{x}_{i}-\frac{1}{L-\mu}\widetilde{g}_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\widetilde{f}_{i}-\frac{1}{2(L-\mu)}\|\widetilde{g}_{i}\|^{2}\right)\right],

where Δ={βRIβ0,iIβi=1}.\Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}.​​ In our setup, we have {(xundefinedi,gundefinedi,fundefinedi)}iI{(xi,Lxigi,L2xi2fi)}iI\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\coloneqq\{(x_{i},Lx_{i}-g_{i},\frac{L}{2}\|x_{i}\|^{2}-f_{i})\}_{i\in I}​​, so lets put that in the last equation:

fundefined(y)=minαΔ[Lμ2yiIαi(xi1Lμ(Lxigi)undefined=xi(1LLμ)+1Lμgi=μLμxi+1Lμgi=1Lμ(giμxi))2+iIαi(L2xi2fi12(Lμ)Lxigi2)]=minαΔ[Lμ2y1LμiIαi(giμxi)2iIαi(fi+12(Lμ)giLxi2L2xi2)].\begin{aligned} \widetilde{f}(y) & =\min_{\alpha\in\Delta}\left[\frac{L-\mu}{2}\|y-\sum_{i\in I}\alpha_{i}\Big(\underbrace{x_{i}-\frac{1}{L-\mu}(Lx_{i}-g_{i})}_{=x_{i}(1-\frac{L}{L-\mu})+\frac{1}{L-\mu}g_{i}=-\frac{\mu}{L-\mu}x_{i}+\frac{1}{L-\mu}g_{i}=\frac{1}{L-\mu}(g_{i}-\mu x_{i})}\Big)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\frac{L}{2}\|x_{i}\|^{2}-f_{i}-\frac{1}{2(L-\mu)}\|Lx_{i}-g_{i}\|^{2}\right)\right]\\ & =\min_{\alpha\in\Delta}\left[\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i\in I}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}-\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2}\right)\right].\end{aligned}

(2) Hence, f=(L/2)2fundefinedf=(L/2)\|\cdot\|^{2}-\widetilde{f}​​​, which will be in Fμ,L(Rd)\mathcal{F}_{\mu,L}(\mathbf{R}^{d})​​​ and will interpolate {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​​ has the following form:

f(y)=(L/2)y2fundefined(y)=(L/2)y2minαΔ[Lμ2y1LμiIαi(giμxi)2iIαi(fi+12(Lμ)giLxi2L2xi2)]=a)(L/2)y2+maxαΔ[Lμ2y1LμiIαi(giμxi)2+iIαi(fi+12(Lμ)giLxi2L2xi2)]=maxαΔ[(L/2)y2Lμ2y1LμiIαi(giμxi)2+iIαi(fi+12(Lμ)giLxi2L2xi2)],\begin{aligned} f(y) & =(L/2)\|y\|^{2}-\widetilde{f}(y)\\ & =(L/2)\|y\|^{2}-\min_{\alpha\in\Delta}\left[\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i\in I}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}-\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2}\right)\right]\\ & \overset{a)}{=}(L/2)\|y\|^{2}+\max_{\alpha\in\Delta}\left[-\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i\in I}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2}\right)\right]\\ & =\max_{\alpha\in\Delta}\left[(L/2)\|y\|^{2}-\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i\in I}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2}\right)\right],\end{aligned}

where a)a)​​​ uses min()=max()\min(\cdot)=-\max(-\cdot)​​​. This completes the proof. ■

References.

[1] Taylor, Adrien B. Convex interpolation and performance estimation of first-order methods for convex optimization. Diss. Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2017.

[2] R Tyrell Rockafellar. Convex Analysis. Princeton University Press, 1996.

[3] Drori, Yoel. The exact information-based complexity of smooth convex minimization. Journal of Complexity 39 (2017): 1-16.

[4] Drori, Yoel and Taylor, Adrien, 2021. On the oracle complexity of smooth strongly convex minimization. arXiv preprint arXiv:2101.09740.