Properties of smooth nonconvex interpolated functions

Shuvomoy Das Gupta

November 24, 2021

In this blog, we study properties of ρ\rho-smooth nonconvex function that is interpolated from a set of points. This result is due to Yoel Drori and Ohad Shamir from [1, Theorem 7].


Table of contents


Notation and notions

All norms are 2-norm in this blog.

A differentiable function f:RdRf:\mathbf{R}^{d}\to\mathbf{R}​ is LL​-smooth convex if and only

(x,yRd)f(y)f(x)+f(x)yx+12Lf(x)f(y)2.\left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}.

A differentiable function f:RdRf:\mathbf{R}^{d}\to\mathbf{R}​ is ρ\rho​-smooth nonconvex if and only if

(x,yRd)ρ2xy2f(x)+f(x)yxf(y)ρ2xy2.\left(\forall x,y\in\mathbf{R}^{d}\right)\quad-\frac{\rho}{2}\|x-y\|^{2}\leq f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle -f(y)\leq\frac{\rho}{2}\|x-y\|^{2}.

The set of all LL​-smooth convex functions on Rd\mathbf{R}^{d}​ is denoted by F0,L(Rd)\mathcal{F}_{0,L}(\mathbf{R}^{d})​ and the set of all ρ\rho​-smooth nonconvex functions is denoted by Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​.

A few helper results

We first record the following result from [3, Lemma 2.53].

Relationship between smooth convex and nonconvex functions

Lemma 1: Relation between smooth convex and nonconvex functions.
Lemma

For any f:RdRf:\mathbf{R}^{d}\to\mathbf{R}, we have

fFρ,ρ(Rd)f+ρ22F0,2ρ(Rd).f\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})\Leftrightarrow f+\frac{\rho}{2}\|\cdot\|^{2}\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d}).

Interpolable function

Next, we present the definition of an interpolable function.

Definition 1: Interpolable function.
Definition

Suppose we are given the set of triplets {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​ where II​​ is a finite index set and xiRd,giRd,x_{i}\in\mathbf{R}^{d},g_{i}\in\mathbf{R}^{d},​​ and fiR.f_{i}\in\mathbf{R}.​​ Let F(Rd)\mathcal{F}(\mathbf{R}^{d})​​ be a set of functions on Rd\mathbf{R}^{d}​​. Then the set {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​ is F(Rd)\mathcal{F}(\mathbf{R}^{d})​​-interpolable if and only if there exists a function fF(Rd)f\in\mathcal{F}(\mathbf{R}^{d})​​ such that for all iIi\in I​​ we have fi=f(xi)f_{i}=f(x_{i})​​ and gif(xi)g_{i} \in \partial f(x_{i})​​. We say that fF(Rd)f\in\mathcal{F}(\mathbf{R}^{d})​​ is an interpolated function that interpolates the set {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​.

Interpolation condition for smooth nonconvex functions

Next, we record the following result that follows from [1, Theorem 3.10] regarding interpolated functions on Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}). Note that, the published version of the paper [1] contains a typo in this theorem, which is corrected in a subsequent arxiv update available at https://arxiv.org/pdf/1512.07516.pdf.

Theorem 1: Interpolation condition for smooth nonconvex functions.
Theorem

Let ρ>0\rho>0​, and consider the set of triplets {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​, where II​ is a finite index set. Then {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​ is Fρ,ρ\mathcal{F}_{-\rho,\rho}​ interpolable if and only if for all i,jIi,j\in I​ we have

fifj+gjxixj+12ρgigj2ρ4(xixj)1ρ(gigj)2.f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\left\Vert (x_{i}-x_{j})-\frac{1}{\rho}(g_{i}-g_{j})\right\Vert ^{2}.

Proof.

Suppose, for all i,jIi,j\in I​​ we have

fifj+gjxixj+12ρgigj2ρ4(xixj)1ρ(gigj)2fifj+gjxixj+12ρgigj2ρ4(xixj2+1ρ2gigj221ρxixjgigj)fifj+gjxixj+12ρgigj2ρ4xixj2ρ41ρ2ρgigj2+2ρ4ρ2xixjgigjfifj+gjxixj+12ρgigj2ρ4xixj214ρgigj2+12gigjxixjfifj+gj+12gi12gjxixj+14ρgigj2ρ4xixj2fifj+12gj+12gixixj+14ρgigj2ρ4xixj2fifj+12gi+gjxixj+14ρgigj2ρ4xixj2.(1)\begin{aligned} & f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\left\Vert (x_{i}-x_{j})-\frac{1}{\rho}(g_{i}-g_{j})\right\Vert ^{2}\\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\left(\|x_{i}-x_{j}\|^{2}+\frac{1}{\rho^{2}}\|g_{i}-g_{j}\|^{2}-2\frac{1}{\rho}\left\langle x_{i}-x_{j}\mid g_{i}-g_{j}\right\rangle \right)\\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}-\frac{\cancel{\rho}}{4}\frac{1}{\cancel{\rho^{2}}\rho}\|g_{i}-g_{j}\|^{2}+\frac{\cancel{2\rho}}{\cancel{4\rho}2}\left\langle x_{i}-x_{j}\mid g_{i}-g_{j}\right\rangle \\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}-\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}+\frac{1}{2}\left\langle g_{i}-g_{j}\mid x_{i}-x_{j}\right\rangle \\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle g_{j}+\frac{1}{2}g_{i}-\frac{1}{2}g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}\\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle \frac{1}{2}g_{j}+\frac{1}{2}g_{i}\mid x_{i}-x_{j}\right\rangle +\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}\\ \Leftrightarrow & f_{i}\geq f_{j}+\frac{1}{2}\left\langle g_{i}+g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}.\quad\quad\quad(1)\end{aligned}

​​But from [1, Theorem 3.10], we have: {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​​ is Fρ,ρ\mathcal{F}_{-\rho,\rho}​​​ interpolable if and only if for all i,jIi,j\in I​​​ we have

fifj+12gi+gjxixj+14ρgigj2ρ4xixj2.f_{i}\geq f_{j}+\frac{1}{2}\left\langle g_{i}+g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}.

​​​ Using this along with (1) completes the proof. 

Minimal curvature addition and interpolation

Next, we have the following equivalence in interpolation conditions between functions in F0,2ρ(Rd)\mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) and Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) via adding minimal curvature addition to the later function class.

Theorem 2: Minimal curvature addition and interpolation.
Theorem

Consider a set of triplets {(xi,gi,fi)}iIRd×Rd×R\{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} and let ρ>0\rho>0. Then the following are equivalent.

(i) {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I} is Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})-interpolable.

(ii) {(xi,gi+ρxi,fi+ρ2xi2)}iI\{(x_{i},g_{i}+\rho x_{i},f_{i}+\frac{\rho}{2}\|x_{i}\|^{2})\}_{i\in I} is F0,2ρ(Rd)\mathcal{F}_{0,2\rho}(\mathbf{R}^{d})-interpolable.

Proof.

The proof to (i)(ii)(i)\Rightarrow(ii): It follows from Lemma 1 that if there exists a function fFρ,ρ(Rd)f\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​​ interpolating the set {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​, then hf+(ρ/2)2h\coloneqq f+(\rho/2)\|\cdot\|^{2}​​ would satisfy hF0,2ρ(Rd)h\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d})​​ and for all iIi\in I​​ we would have

h(xi)=f(xi)+ρ2xi2=fi+ρ2xi2,h(x_{i})=f(x_{i})+\frac{\rho}{2}\|x_{i}\|^{2}=f_{i}+\frac{\rho}{2}\|x_{i}\|^{2},

​​ and

h(xi)=f(xi)+ρxi=gi+ρxi.\nabla h(x_{i})=\nabla f(x_{i})+\rho x_{i}=g_{i}+\rho x_{i}.

​​ Hence, the set {(xi,gi+ρxi,fi+(ρ/2)xi2)}iI\{(x_{i},g_{i}+\rho x_{i},f_{i}+(\rho/2)\|x_{i}\|^{2})\}_{i\in I}​​ is interpolated by the function hF0,2ρ(Rd)h\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d})​​.

The proof to (ii)(i)(ii)\Rightarrow(i): If {(xi,gi+ρxi,fi+ρ2xi2)}iI\{(x_{i},g_{i}+\rho x_{i},f_{i}+\frac{\rho}{2}\|x_{i}\|^{2})\}_{i\in I}​​ is F0,2ρ(Rd)\mathcal{F}_{0,2\rho}(\mathbf{R}^{d})​​-interpolable, then there exists a hF0,2ρ(Rd)h\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d})​​, such that for all iIi\in I​​ we would have

h(xi)=fi+ρ2xi2fi=h(xi)ρ2xi2\begin{aligned} h(x_{i}) & =f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}\\ \Leftrightarrow & f_{i}=h(x_{i})-\frac{\rho}{2}\|x_{i}\|^{2}\end{aligned}

​​and

h(xi)=f(xi)+ρxi=gi+ρxigi=h(xi)ρxi.\begin{aligned} \nabla h(x_{i}) & =\nabla f(x_{i})+\rho x_{i}=g_{i}+\rho x_{i}\\ \Leftrightarrow & g_{i}=\nabla h(x_{i})-\rho x_{i}.\end{aligned}

​​ So, from the last two equations, we see that fh(ρ/2)2Fρ,ρ(Rd)f\coloneqq h-(\rho/2)\|\cdot\|^{2}\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​​ interpolates {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​. So, {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​ is Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​​-interpolable.

Interpolation of smooth convex functions due to Yoel Drori

Finally, we present the following interpolation result due to Yoel Drori from [4]. I showed a detailed proof of this result in the previous blog post: here.

Theorem 3: Interpolation of smooth convex functions.
Theorem

If {(xundefinedi,gundefinedi,fundefinedi)}iIRd×Rd×R\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R}​​​​​ is F0,L(Rd)\mathcal{F}_{0,L}(\mathbf{R}^{d})​​​​​-interpolable with L>0L>0​​​​​, then one interpolated function fundefinedF0,L(Rd)\widetilde{f}\in\mathcal{F}_{0,L}(\mathbf{R}^{d})​​​​​ that interpolates {(xundefinedi,gundefinedi,fundefinedi)}iI\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}​​​​​ is:

fundefined(y)=minαΔ[L2yiIαi(xundefinedi1Lgundefinedi)2+iIαi(fundefinedi12Lgundefinedi2)],\widetilde{f}(y)=\min_{\alpha\in\Delta}\left[\frac{L}{2}\|y-\sum_{i\in I}\alpha_{i}\left(\widetilde{x}_{i}-\frac{1}{L}\widetilde{g}_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\widetilde{f}_{i}-\frac{1}{2L}\|\widetilde{g}_{i}\|^{2}\right)\right],

​​​​​ where Δ={βRIβ0,iIβi=1}.\Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}.​​​​​

Main result

Now we are in a position to state our main result due to Yoel Drori and Ohad Shamir from [1, Theorem 7] followed by its proof.

Theorem 4: Properties of interpolated function for smooth nonconvex function class
Theorem

Let the set {(xi,gi,fi)}iIRd×Rd×R\{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R}​​​ be Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​​​-interpolable, where L>0L>0​​​ and II​​​ is a finite index set. Define:

iargminiI{fi12ρgi2}, i^{\star}\in\underset{i\in I}{\textrm{argmin}}\{f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\},

hence,

fi12ρgi2=miniI{fi12ρgi2}. f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}=\min_{i\in I}\{f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\}.

Then the following are equivalent.

(i) The set of triplets {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I} is Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})-interpolable.

(ii) There exists a fFρ,ρ(Rd)f\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​​​ that interpolates {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​​ and any global minimizer xx^\star​​​ of ff​​​ (i.e., xargminxRdf(x)x^\star \in \textrm{argmin}_{x \in \mathbf{R}^d} f(x)​​​) is characterized by

f(x)=f(xi1Lgi)fi12ρgi2.(3)f(x^\star) = f(x_{i^{\star}}-\frac{1}{L}g_{i^{\star}}) \leq f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}.\quad\quad\quad(3)

​​​​

Proof to Theorem 4.

First, we prove (ii)(i)(ii)\Rightarrow(i)​. If (ii)(ii)​ holds, then obviously ff​ interpolates {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​, hence {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​ is Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​-interpolable by Definition 1.

Next, we prove (i)(ii)(i)\Rightarrow(ii)​​. We want construct an interpolated function fFρ,ρ(Rd)f\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​​ satisfying (ii)(ii)​​. We have {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​ is Fρ,ρ(Rd)\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​​-interpolable, but due to Theorem 2, this is equivalent to saying {(xi,gi+ρxi,fi+ρ2xi2)}iI\{(x_{i},g_{i}+\rho x_{i},f_{i}+\frac{\rho}{2}\|x_{i}\|^{2})\}_{i\in I}​​ is F0,2ρ(Rd)\mathcal{F}_{0,2\rho}(\mathbf{R}^{d})​​-interpolable. Now, in Theorem 3, setting

{(xundefinedi,gundefinedi,fundefinedi)}iI{(xi,gi+ρxi,fi+ρ2xi2)}iI, and L2ρ,\begin{aligned} \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I} & \coloneqq\{(x_{i},g_{i}+\rho x_{i},f_{i}+\frac{\rho}{2}\|x_{i}\|^{2})\}_{i\in I},\textrm{ and }\\ L & \coloneqq2\rho,\end{aligned}

we have the function fundefinedF0,2ρ(Rd)\widetilde{f}\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d})​​​ interpolating {(xundefinedi,gundefinedi,fundefinedi)}iI\{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}​​​ defined by:

fundefined(y)=minαΔ[L2yiIαi(xundefinedi1Lgundefinedi)2+iIαi(fundefinedi12Lgundefinedi2)]=minαΔ[ρyiIαi(xi12ρ(gi+ρxi))2+iIαi(fi+ρ2xi214ρgi+ρxi2)],\begin{aligned} \widetilde{f}(y) & =\min_{\alpha\in\Delta}\left[\frac{L}{2}\|y-\sum_{i\in I}\alpha_{i}\left(\widetilde{x}_{i}-\frac{1}{L}\widetilde{g}_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\widetilde{f}_{i}-\frac{1}{2L}\|\widetilde{g}_{i}\|^{2}\right)\right]\\ & =\min_{\alpha\in\Delta}\left[\rho\left\Vert y-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{2\rho}(g_{i}+\rho x_{i})\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}-\frac{1}{4\rho}\|g_{i}+\rho x_{i}\|^{2}\right)\right],\end{aligned}

where Δ={βRIβ0,iIβi=1}.\Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}.​​

Now, due to the proof (ii)(i)(ii)\Rightarrow(i)​​​ of Theorem 2, the function ffundefined(ρ/2)2Fρ,ρ(Rd)f\coloneqq \widetilde{f}-(\rho/2)\|\cdot\|^{2}\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})​​​ will interpolate {(xi,gi,fi)}iI\{(x_{i},g_{i},f_{i})\}_{i\in I}​​​. Hence, we have:

f(x)=fundefined(x)(ρ/2)x2=minαΔ[ρxiIαi(xi12ρ(gi+ρxi))2+iIαi(fi+ρ2xi214ρgi+ρxi2)]ρ2x2=minαΔ[ρ2x2+ρxiIαi(xi12ρ(gi+ρxi))2+iIαi(fi+ρ2xi214ρgi+ρxi2)]=minαΔ[ρ2xiIαi(xi1ρgi)2ρ4iIαi(xi1ρgi)2+iIαi(fi12ρgi2+ρ4xi1ρgi2)],\begin{aligned} f(x) & =\widetilde{f}(x)-(\rho/2)\|x\|^{2}\\ & =\min_{\alpha\in\Delta}\left[\rho\left\Vert x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{2\rho}(g_{i}+\rho x_{i})\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}-\frac{1}{4\rho}\|g_{i}+\rho x_{i}\|^{2}\right)\right]-\frac{\rho}{2}\|x\|^{2}\\ & =\min_{\alpha\in\Delta}\left[-\frac{\rho}{2}\|x\|^{2}+\rho\left\Vert x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{2\rho}(g_{i}+\rho x_{i})\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}-\frac{1}{4\rho}\|g_{i}+\rho x_{i}\|^{2}\right)\right]\\ & =\min_{\alpha\in\Delta}\left[\frac{\rho}{2}\left\Vert x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{\rho}g_{i}\right)\right\Vert ^{2}-\frac{\rho}{4}\left\Vert \sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{\rho}g_{i}\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}\right)\right],\end{aligned}

where the last line follows from the following algebraic simplification (click to expand)

image-20211124085922308

Hence, we have the interpolated function ff satisfying:

f(x)=minαΔ[ρ2xiIαi(xi1ρgi)2ρ4iIαi(xi1ρgi)2+iIαi(fi12ρgi2+ρ4xi1ρgi2)](4)a)minαΔρ4iIαi(xi1ρgi)2+iIαi(fi12ρgi2+ρ4xi1ρgi2)b)minαΔρ4iIαixi1ρgi2+iIαi(fi12ρgi2+ρ4xi1ρgi2)=c)minαΔρ4iIαixi1ρgi2+iIαi(fi12ρgi2)+ρ4iIαixi1ρgi2=minαΔiIαi(fi12ρgi2)=d)miniI{fi12ρgi2}=fi12ρgi2\begin{aligned} & f(x)\\ = & \min_{\alpha\in\Delta}\Big[\frac{\rho}{2}\|x-\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}-\frac{\rho}{4}\|\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}\\ & +\sum_{i\in I}\alpha_{i}(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2})\Big]\quad\quad\quad(4)\\ \overset{a)}{\geq} & \min_{\alpha\in\Delta}-\frac{\rho}{4}\left\Vert \sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{\rho}g_{i}\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}\right)\\ \overset{b)}{\geq} & \min_{\alpha\in\Delta}-\frac{\rho}{4}\sum_{i\in I}\alpha_{i}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}\right)\\ \overset{c)}{=} & \min_{\alpha\in\Delta}\cancel{-\frac{\rho}{4}\sum_{i\in I}\alpha_{i}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\right)+\cancel{\frac{\rho}{4}\sum_{i\in I}\alpha_{i}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}}\\ = & \min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\right)\\ \overset{d)}{=} & \min_{i\in I}\{f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\}\\ = & f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2} \end{aligned}

where in a)a) we construct a smaller term by removing the non-negative term (ρ/2)xiIαi(xi(1/ρ)gi)2(\rho/2)\left\Vert x-\sum_{i\in I}\alpha_{i}\left(x_{i}-(1/\rho)g_{i}\right)\right\Vert ^{2}​​​ from the previous line, b)b)​​​ uses Jensen's inequality

iIλiyi2iIλiyi2:where λ0,iIλi=1,\|\sum_{i\in I}\lambda_{i}y_{i}\|^{2}\leq\sum_{i\in I}\lambda_{i}\|y_{i}\|^{2}:\textrm{where }\lambda\geq0,\sum_{i\in I}\lambda_{i}=1,

c)c)​​​ just expands and cancels, and d)d)​​​ uses the identity miniIai=minαΔiIαiai.\min_{i\in I}a_{i}=\min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}a_{i}.​​​ So, we have shown that, for all xRdx\in\mathbf{R}^{d}​​​ we have

f(x)fi12ρgi2,(5)f(x)\geq f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2},\quad\quad\quad(5)

​​​ i.e., fi(1/2ρ)gi2f_{i^{\star}}-(1/2\rho)\|g_{i^{\star}}\|^{2}​​​ is a global lower bound for the interpolated function ff​​​. So, if we can show that some point xx^{\star}​​​ satisfies

f(x)fi12ρgi2,f(x^{\star})\leq f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2},

​​​ then xx^{\star}​​​ must be the global minimizer of ff​​​. Now, if we set αei\alpha\coloneqq e_{i^{\star}}​​​(ii^{\star}​​​-th unit vector with its ii^{\star}​​​-th component 1 and rest 00​​​) and xxi(1/ρ)gix\coloneqq x_{i^{\star}}-(1/\rho)g_{i^{\star}}​​​ in (4),(4),​​​ we have an upper bound of ff​​​, i.e.,

f(xi(1/ρ)gi)=minαΔ[ρ2xi(1/ρ)giiIαi(xi1ρgi)2ρ4iIαi(xi1ρgi)2+iIαi(fi12ρgi2+ρ4xi1ρgi2)][ρ2xi(1/ρ)giiIαi(xi1ρgi)2ρ4iIαi(xi1ρgi)2+iIαi(fi12ρgi2+ρ4xi1ρgi2)]α=ei=ρ2(xi(1/ρ)gi)(xi1ρgi)2ρ4(xi1ρgi)2+(fi12ρgi2+ρ4xi1ρgi2)=ρ4(xi1ρgi)2+(fi12ρgi2)+ρ4xi1ρgi2=fi12ρgi2(6).\begin{aligned} & f(x_{i^{\star}}-(1/\rho)g_{i^{\star}})\\ = & \min_{\alpha\in\Delta}\Big[\frac{\rho}{2}\|x_{i^{\star}}-(1/\rho)g_{i^{\star}}-\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}-\frac{\rho}{4}\|\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}+\sum_{i\in I}\alpha_{i}(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2})\Big]\\ \leq & \Big[\frac{\rho}{2}\|x_{i^{\star}}-(1/\rho)g_{i^{\star}}-\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}-\frac{\rho}{4}\|\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}+\sum_{i\in I}\alpha_{i}(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2})\Big]_{\alpha=e_{i^{\star}}}\\ = & \frac{\rho}{2}\|\cancel{(x_{i^{\star}}-(1/\rho)g_{i^{\star}})}-\cancel{(x_{i^{\star}}-\frac{1}{\rho}g_{i^{\star}})}\|^{2}-\frac{\rho}{4}\|(x_{i^{\star}}-\frac{1}{\rho}g_{i^{\star}})\|^{2}+(f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}+\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2})\\ = & \cancel{-\frac{\rho}{4}\|(x_{i^{\star}}-\frac{1}{\rho}g_{i^{\star}})\|^{2}}+(f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2})+\cancel{\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2}}\\ = & f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}\quad\quad\quad(6).\end{aligned}

So the lower bound on ff​ in (5)(5)​ is achieved by xi(1/ρ)gix_{i^{\star}}-(1/\rho)g_{i^{\star}}​. Hence, any global minimizer xx^\star​ of ff​ (i.e., xargminxRdf(x)x^\star \in \textrm{argmin}_{x \in \mathbf{R}^d} f(x)​) is characterized by

f(x)=f(xi1Lgi)fi12ρgi2,f(x^\star) = f(x_{i^{\star}}-\frac{1}{L}g_{i^{\star}}) \leq f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2},

and this completes the proof​ ■

References

[1] Drori, Y., & Shamir, O. (2020, November). The complexity of finding stationary points with stochastic gradient descent. In International Conference on Machine Learning (pp. 2658-2667). PMLR. Link

[2] Adrien B Taylor, Julien M Hendrickx, and Fran¸cois Glineur. Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313, 2017. Link

[3] Taylor, Adrien B. Convex interpolation and performance estimation of first-order methods for convex optimization. Diss. Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2017. Link

[4] Drori, Yoel. The exact information-based complexity of smooth convex minimization. Journal of Complexity 39 (2017): 1-16. Link