Constructing an interpolated function for the class of smooth convex functions

Shuvomoy Das Gupta

November 22, 2021

In this blog, we study how to construct an interpolated function for the class of $L$-smooth convex functions, a result due to Yoel Drori [3].

Notation and notions. All norms in this blog are 2-norms. A function $f$ is $L$-smooth convex if and only if

\left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}.

On the other hand, a function $f$ is $\mu$-strongly convex if and only if

\left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle f^{\prime}(x)\mid y-x\rangle+\frac{\mu}{2}\|y-x\|^{2},

where $f^{\prime}(\cdot)$ denotes a subgradient of $f$ at $(\cdot)$. The set of all $L$-smooth convex functions on $\mathbf{R}^{d}$ is denoted by $\mathcal{F}_{0,L}(\mathbf{R}^{d})$, and the set of all $\mu$-strongly convex functions is denoted by $\mathcal{F}_{\mu,\infty}(\mathbf{R}^{d})$.
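As a quick numerical sanity check of the two inequalities above, consider the toy choice $f(x)=\frac{1}{2}x^{\top}Ax$ with $A$ positive definite (an illustrative example, not from the post), which is $L$-smooth with $L=\lambda_{\max}(A)$ and $\mu$-strongly convex with $\mu=\lambda_{\min}(A)$:

```python
# Check the smoothness and strong-convexity inequalities on random points
# for f(x) = 0.5 x^T A x, an illustrative example (not from the post).
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + np.eye(4)          # symmetric positive definite
L = np.linalg.eigvalsh(A).max()  # smoothness constant of f
mu = np.linalg.eigvalsh(A).min() # strong-convexity constant of f

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

for _ in range(100):
    x, y = rng.standard_normal(4), rng.standard_normal(4)
    gx, gy = grad(x), grad(y)
    # L-smooth convexity inequality
    assert f(y) >= f(x) + gx @ (y - x) + np.dot(gx - gy, gx - gy) / (2 * L) - 1e-9
    # mu-strong convexity inequality
    assert f(y) >= f(x) + gx @ (y - x) + mu / 2 * np.dot(y - x, y - x) - 1e-9
print("both inequalities hold on all sampled pairs")
```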




Sion's minimax theorem

We are going to use the famous result due to Maurice Sion.

Theorem 1: Sion's minimax theorem

Let $X$ be a compact convex set (over which the minimization will be performed) and let $Y$ be a convex set in $\mathbf{R}^{d}$ (over which the supremum will be computed). Suppose $g:X\times Y\to\mathbf{R}$ satisfies the following properties: (i) $g(x,\cdot)$ is upper-semicontinuous and quasi-concave on $Y$ for all $x\in X$, and (ii) $g(\cdot,y)$ is lower-semicontinuous and quasi-convex on $X$ for all $y\in Y$. Then we have

\sup_{y\in Y}\min_{x\in X}g(x,y)=\min_{x\in X}\sup_{y\in Y}g(x,y).
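A classic special case is a bilinear payoff $g(x,y)=x^{\top}My$ with $x$ and $y$ ranging over probability simplices: $g$ is linear in each argument and the simplices are compact and convex, so Sion's hypotheses hold and the two sides of the identity agree (this is von Neumann's minimax theorem). A small numerical illustration, computing both sides as linear programs with scipy (not from the post):

```python
# Verify sup-min == min-sup for a random bilinear game over simplices.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 4))
m, n = M.shape

# min_x max_y x^T M y == min over x in the simplex of max_j (M^T x)_j
res_min = linprog(
    c=np.r_[np.zeros(m), 1.0],                       # minimize v
    A_ub=np.c_[M.T, -np.ones(n)], b_ub=np.zeros(n),  # (M^T x)_j <= v
    A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
    bounds=[(0, None)] * m + [(None, None)],
)

# max_y min_x x^T M y == max over y in the simplex of min_i (M y)_i
res_max = linprog(
    c=np.r_[np.zeros(n), -1.0],                      # maximize w
    A_ub=np.c_[-M, np.ones(m)], b_ub=np.zeros(m),    # (M y)_i >= w
    A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
    bounds=[(0, None)] * n + [(None, None)],
)

assert abs(res_min.fun - (-res_max.fun)) < 1e-6
print("minimax value:", res_min.fun)
```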

Interpolable function

First, we start with the definition of an interpolable function.

Definition 1: Interpolable function.

Suppose we are given a set of triplets $\{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R}$, where $I$ is a finite index set, and let $\mathcal{F}(\mathbf{R}^{d})$ be some class of functions. Then the set $\{(x_{i},g_{i},f_{i})\}_{i\in I}$ is $\mathcal{F}(\mathbf{R}^{d})$-interpolable if and only if there exists a function $f\in\mathcal{F}(\mathbf{R}^{d})$ such that for all $i\in I$ we have $f_{i}=f(x_{i})$ and $g_{i}\in\partial f(x_{i})$.

Main result (due to Yoel Drori)

Next, we prove the following result due to Yoel Drori [3].

Theorem 2: Interpolation of smooth convex functions.

If $\{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R}$ is $\mathcal{F}_{0,L}(\mathbf{R}^{d})$-interpolable, then one interpolated function $f\in\mathcal{F}_{0,L}(\mathbf{R}^{d})$ that can be constructed from $\{(x_{i},g_{i},f_{i})\}_{i\in I}$ is:

f(x)=\min_{\alpha\in\Delta}\left[\frac{L}{2}\Big\|x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{L}g_{i}\right)\Big\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\right)\right],

where $\Delta=\{\bar{\alpha}\in\mathbf{R}^{\vert I\vert}\mid\bar{\alpha}\geq0,\ \sum_{i\in I}\bar{\alpha}_{i}=1\}$.
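Before the proof, here is a numerical sketch of the theorem on toy data (illustrative only): we generate interpolable triplets from a known $L$-smooth convex quadratic, evaluate the formula above by minimizing over the simplex $\Delta$ with scipy's SLSQP solver, and check that the constructed $f$ reproduces the data points.

```python
# Build f(x) from the theorem's formula and verify f(x_i) = f_i on toy data.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
d, N = 3, 5
B = rng.standard_normal((d, d))
A = B @ B.T
L = np.linalg.eigvalsh(A).max()          # f0 below is L-smooth convex

f0 = lambda x: 0.5 * x @ A @ x
xs = rng.standard_normal((N, d))         # points x_i
gs = xs @ A                              # gradients g_i of f0 (A symmetric)
fs = np.array([f0(x) for x in xs])       # values f_i

def f_interp(x):
    centers = xs - gs / L                # x_i - (1/L) g_i
    consts = fs - 0.5 * np.sum(gs**2, axis=1) / L
    def obj(a):
        z = x - a @ centers
        return 0.5 * L * z @ z + a @ consts
    res = minimize(
        obj, np.full(N, 1.0 / N), method="SLSQP",
        bounds=[(0, 1)] * N,
        constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0},
    )
    return res.fun

# f interpolates the data: f(x_i) = f_i for every i
for xi, fi in zip(xs, fs):
    assert abs(f_interp(xi) - fi) < 1e-4
print("interpolation verified at all data points")
```

Note that at $x=x_i$ the choice $\alpha=e_i$ already attains the value $f_i$, and the theorem guarantees the minimum cannot go below it.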


Proof sketch. We use the following chain of logic:

(a) $\{(x_{i},g_{i},f_{i})\}_{i\in I}$ is interpolated by $f\in\mathcal{F}_{0,L}(\mathbf{R}^{d})$

$\Leftrightarrow$

(b) $\{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I}\coloneqq\{(g_{i},x_{i},\left\langle x_{i}\mid g_{i}\right\rangle -f_{i})\}_{i\in I}$ is interpolated by $h=f^{*}\in\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d})$, where $f^{*}$ denotes the conjugate function of $f$.

So, we work backwards from (b). First, we construct an $h$ that interpolates $\{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I}$. Then we construct $f=h^{*}$, which will interpolate $\{(x_{i},g_{i},f_{i})\}_{i\in I}$.


Now we start the proof.

Proof

The set $\{(x_{i},g_{i},f_{i})\}_{i\in I}$ is $\mathcal{F}_{0,L}(\mathbf{R}^{d})$-interpolable if and only if $\{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I}\coloneqq\{(g_{i},x_{i},\left\langle x_{i}\mid g_{i}\right\rangle -f_{i})\}_{i\in I}$ is $\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d})$-interpolable; this follows from [1, Lemma 3.7]. Also, if $\{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I}$ is $\mathcal{F}_{\mu,\infty}(\mathbf{R}^{d})$-interpolable, then one such interpolated function $h\in\mathcal{F}_{\mu,\infty}(\mathbf{R}^{d})$ would be:

\begin{aligned} h(\bar{x}) & =\max_{i\in I}\left\{\bar{f}_{i}+\left\langle \bar{g}_{i}\mid\bar{x}-\bar{x}_{i}\right\rangle +\frac{\mu}{2}\|\bar{x}-\bar{x}_{i}\|^{2}\right\}\\ & =\max_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}\left\{\bar{f}_{i}+\left\langle \bar{g}_{i}\mid\bar{x}-\bar{x}_{i}\right\rangle +\frac{\mu}{2}\|\bar{x}-\bar{x}_{i}\|^{2}\right\},\end{aligned}

where we have used the fact that $\max_{i\in I}a_{i}=\max_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}a_{i}$.

So, if $\{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I}\coloneqq\{(g_{i},x_{i},\left\langle x_{i}\mid g_{i}\right\rangle -f_{i})\}_{i\in I}$ is $\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d})$-interpolable, then one such interpolated function $h\in\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d})$ would be

\begin{aligned} h(\bar{x}) & =\max_{i\in I}\left\{ \bar{f}_{i}+\left\langle \bar{g}_{i}\mid\bar{x}-\bar{x}_{i}\right\rangle +\frac{1}{2L}\|\bar{x}-\bar{x}_{i}\|^{2}\right\} \\ & =\max_{i\in I}\left\{ \left\langle x_{i}\mid g_{i}\right\rangle -f_{i}+\left\langle x_{i}\mid\bar{x}-g_{i}\right\rangle +\frac{1}{2L}\|\bar{x}-g_{i}\|^{2}\right\} \\ & =\max_{i\in I}\left\{ \cancel{\left\langle x_{i}\mid g_{i}\right\rangle }-f_{i}+\left\langle x_{i}\mid\bar{x}\right\rangle -\cancel{\left\langle x_{i}\mid g_{i}\right\rangle }+\frac{1}{2L}\|\bar{x}-g_{i}\|^{2}\right\} \\ & =\max_{i\in I}\left\{ -f_{i}+\left\langle x_{i}\mid\bar{x}\right\rangle +\frac{1}{2L}\|\bar{x}-g_{i}\|^{2}\right\} \\ & =-\min_{i\in I}\left\{ f_{i}-\left\langle x_{i}\mid\bar{x}\right\rangle -\frac{1}{2L}\|\bar{x}-g_{i}\|^{2}\right\} ,\quad\quad\quad(1)\end{aligned}

where in the last line we have used $\max(\cdot)=-\min(-\cdot)$.
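The max-formula for $h$ can be checked numerically on toy data (not from the post): generating triplets from an $L$-smooth convex quadratic and transforming them as above, $h(\bar{x}_{i})$ should return $\bar{f}_{i}$ exactly.

```python
# Check that h (the max formula with mu = 1/L) interpolates the
# transformed triplets (g_i, x_i, <x_i|g_i> - f_i) on toy quadratic data.
import numpy as np

rng = np.random.default_rng(3)
d, N = 3, 5
B = rng.standard_normal((d, d))
A = B @ B.T
L = np.linalg.eigvalsh(A).max()

xs = rng.standard_normal((N, d))
gs = xs @ A                              # gradients of 0.5 x^T A x
fs = 0.5 * np.sum(xs * gs, axis=1)      # values of 0.5 x^T A x

# transformed data: x_bar = g, g_bar = x, f_bar = <x|g> - f
xb, gb = gs, xs
fb = np.sum(xs * gs, axis=1) - fs

def h(y):
    # max over i of f_bar_i + <g_bar_i | y - x_bar_i> + |y - x_bar_i|^2 / (2L)
    return np.max(fb + gb @ y - np.sum(gb * xb, axis=1)
                  + np.sum((y - xb)**2, axis=1) / (2 * L))

for xbi, fbi in zip(xb, fb):
    assert abs(h(xbi) - fbi) < 1e-7      # h interpolates the function values
print("h interpolates the transformed data")
```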

But we are looking for an interpolated function in $\mathcal{F}_{0,L}(\mathbf{R}^{d})$, not in $\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d})$. How do we go from $\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d})$ to $\mathcal{F}_{0,L}(\mathbf{R}^{d})$? To that end, we use the following results:

(i) $f\in\mathcal{F}_{0,L}(\mathbf{R}^{d})$ if and only if $f$'s conjugate function $f^{*}(\cdot)=\sup_{y\in\mathbf{R}^{d}}\left[-f(y)+\left\langle \cdot\mid y\right\rangle \right]$ lies in $\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d})$ [1, Theorem 2.34].

(ii) for any lower-semicontinuous, proper, and convex function $f$, we have $f=f^{**}$ [2, Theorem 12.2].
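Facts (i) and (ii) can be illustrated on the quadratic $f(x)=\frac{1}{2}x^{\top}Ax$ (a toy choice, not from the post), whose conjugate is known in closed form to be $f^{*}(s)=\frac{1}{2}s^{\top}A^{-1}s$: one can check that $f^{*}$ is $(1/L)$-strongly convex when $f$ is $L$-smooth, and that Fenchel-Young holds with equality at $s=\nabla f(u)$.

```python
# Numerically illustrate facts (i) and (ii) on a quadratic example.
import numpy as np

rng = np.random.default_rng(4)
B = rng.standard_normal((3, 3))
A = B @ B.T + np.eye(3)
Ainv = np.linalg.inv(A)
L = np.linalg.eigvalsh(A).max()          # smoothness constant of f

f = lambda x: 0.5 * x @ A @ x
fstar = lambda s: 0.5 * s @ Ainv @ s     # conjugate of f, in closed form

for _ in range(50):
    u, v = rng.standard_normal(3), rng.standard_normal(3)
    # (i): f* is (1/L)-strongly convex:
    # f*(u) >= f*(v) + <grad f*(v) | u - v> + (1/(2L)) |u - v|^2
    gv = Ainv @ v
    assert fstar(u) >= fstar(v) + gv @ (u - v) + (u - v) @ (u - v) / (2 * L) - 1e-9
    # (ii), via Fenchel-Young with equality at s = grad f(u) = A u:
    # f(u) + f*(A u) = <u | A u>
    assert abs(f(u) + fstar(A @ u) - u @ A @ u) < 1e-7
print("conjugate facts verified on the quadratic example")
```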

Due to (i) and (ii), to find the interpolated function in $\mathcal{F}_{0,L}(\mathbf{R}^{d})$, all we have to do is compute the conjugate function of $h$. In other words, the desired interpolated function in $\mathcal{F}_{0,L}(\mathbf{R}^{d})$ would be

\begin{aligned} f(x) & =h^{*}(x)\\ & =\sup_{y\in\mathbf{R}^{d}}\left[-h(y)+\left\langle x\mid y\right\rangle \right]\\ & \overset{(1)}{=}\sup_{y\in\mathbf{R}^{d}}\left[-\left[-\min_{i\in I}\left\{ f_{i}-\left\langle x_{i}\mid y\right\rangle -\frac{1}{2L}\|y-g_{i}\|^{2}\right\} \right]+\left\langle x\mid y\right\rangle \right]\\ & =\sup_{y\in\mathbf{R}^{d}}\left[\min_{i\in I}\left\{ f_{i}-\left\langle x_{i}\mid y\right\rangle -\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x\mid y\right\rangle \right\} \right]\\ & =\sup_{y\in\mathbf{R}^{d}}\left[\min_{i\in I}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right]\\ & =\sup_{y\in\mathbf{R}^{d}}\left[\min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right],\quad\quad\quad(2)\end{aligned}

where in the last line we have used the fact that $\min_{i\in I}a_{i}=\min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}a_{i}$. Now, in (2), denote the inner function by

p(y,\alpha)=\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\},

which is continuous and concave in the maximizing variable $y$, and continuous and convex (in fact, linear) in the minimizing variable $\alpha$. Finally, the minimizing set $\Delta$ is compact and convex, and the maximizing set $\mathbf{R}^{d}$ is convex. Hence, applying Sion's minimax theorem, we have:

\begin{aligned} f(x) & =\sup_{y\in\mathbf{R}^{d}}\left[\min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right]\\ & =\min_{\alpha\in\Delta}\left[\sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right].\quad\quad\quad(3)\end{aligned}

Let us focus on the inner maximization problem, which is an unconstrained concave maximization problem. By taking the derivative with respect to $y$ and setting it equal to zero, we can compute the maximizer as follows:

\begin{aligned} & \nabla_{y}\left[\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right]=0\\ \Leftrightarrow\; & \sum_{i\in I}\alpha_{i}\left\{ -\frac{1}{L}(y-g_{i})+x-x_{i}\right\} =\sum_{i\in I}\alpha_{i}\left(-\frac{1}{L}y+\frac{1}{L}g_{i}+x-x_{i}\right)=0\\ \Leftrightarrow\; & \sum_{i\in I}\alpha_{i}\left(\frac{1}{L}g_{i}+x-x_{i}\right)=\frac{1}{L}y\underbrace{\sum_{i\in I}\alpha_{i}}_{=1}=\frac{1}{L}y\\ \Leftrightarrow\; & y^{\star}=L\sum_{i\in I}\alpha_{i}\left((x-x_{i})+\frac{1}{L}g_{i}\right)\\ \therefore\; y^{\star} & =L\Big(x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big)\quad\quad\quad(4) \end{aligned}
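A quick numerical check (toy numbers, not from the post) that $y^{\star}$ in (4) is a stationary point of the concave inner objective, and hence its global maximizer:

```python
# Verify that the closed-form y* from (4) maximizes the inner objective.
import numpy as np

rng = np.random.default_rng(5)
d, N, L = 3, 4, 2.0
xs, gs = rng.standard_normal((N, d)), rng.standard_normal((N, d))
fs = rng.standard_normal(N)
x = rng.standard_normal(d)
alpha = rng.random(N); alpha /= alpha.sum()   # a point of the simplex

def inner(y):
    # sum_i alpha_i * (f_i - |y - g_i|^2 / (2L) + <x - x_i | y>)
    return sum(a * (fi - np.dot(y - gi, y - gi) / (2 * L) + (x - xi) @ y)
               for a, xi, gi, fi in zip(alpha, xs, gs, fs))

y_star = L * (x - alpha @ (xs - gs / L))      # formula (4)

# the gradient of the inner objective vanishes at y*
grad = sum(a * (-(y_star - gi) / L + x - xi)
           for a, xi, gi in zip(alpha, xs, gs))
assert np.linalg.norm(grad) < 1e-10

# and random perturbations of y* can only decrease the value
for _ in range(20):
    assert inner(y_star + 0.1 * rng.standard_normal(d)) <= inner(y_star) + 1e-12
print("y* verified as the inner maximizer")
```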

Putting the value of $y^{\star}$ from (4) into the inner maximization problem, we have:

\begin{aligned} & \sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\underbrace{\|y-g_{i}\|^{2}}_{=\|y\|^{2}+\|g_{i}\|^{2}-2\langle y\mid g_{i}\rangle}+\left\langle x-x_{i}\mid y\right\rangle \Big)\\ & =\sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\left\{ \|y\|^{2}+\|g_{i}\|^{2}-2\langle y\mid g_{i}\rangle\right\} +\left\langle x-x_{i}\mid y\right\rangle \Big)\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)+\sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\Big(-\frac{1}{2L}\|y\|^{2}+\Big\langle \frac{1}{L}g_{i}\mid y\Big\rangle +\left\langle x-x_{i}\mid y\right\rangle \Big)\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)+\sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\Big(-\frac{1}{2L}\|y\|^{2}+\Big\langle x-x_{i}+\frac{1}{L}g_{i}\mid y\Big\rangle \Big)\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)+\sum_{i\in I}\alpha_{i}\Big(-\frac{1}{2L}\|y^{\star}\|^{2}+\Big\langle x-x_{i}+\frac{1}{L}g_{i}\mid y^{\star}\Big\rangle \Big)\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)-\frac{1}{2L}\|y^{\star}\|^{2}+\Big\langle \underbrace{\sum_{i\in I}\alpha_{i}\Big(x-x_{i}+\frac{1}{L}g_{i}\Big)}_{=\frac{1}{L}y^{\star}}\mid y^{\star}\Big\rangle \\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)-\frac{L}{2}\Big\|x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big\|^{2}+L\Big\|x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big\|^{2}\\ & =\frac{L}{2}\Big\|x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{L}g_{i}\right)\Big\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\right).\quad\quad\quad(5) \end{aligned}

Using (5) in (3), we arrive at the desired interpolated function $f\in\mathcal{F}_{0,L}(\mathbf{R}^{d})$:

f(x)=\min_{\alpha\in\Delta}\left[\frac{L}{2}\Big\|x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{L}g_{i}\right)\Big\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\right)\right],

thus completing the proof.

References

[1] Taylor, Adrien B. Convex interpolation and performance estimation of first-order methods for convex optimization. PhD thesis, Université catholique de Louvain, Louvain-la-Neuve, Belgium, 2017.

[2] Rockafellar, R. Tyrrell. Convex Analysis. Princeton University Press, 1996.

[3] Drori, Yoel. The exact information-based complexity of smooth convex minimization. Journal of Complexity 39 (2017): 1-16.