f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 1 2 L ∥ ∇ f ( x ) − ∇ f ( y ) ∥ 2 . \left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}. ( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 L 1 ∥∇ f ( x ) − ∇ f ( y ) ∥ 2 . On the other hand, a function is μ \mu μ -strongly convex if and only if
( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + μ 2 ∥ f ′ ( x ) − f ′ ( y ) ∥ 2 , \left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{\mu}{2}\|f^{\prime}(x)-f^{\prime}(y)\|^{2}, ( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 μ ∥ f ′ ( x ) − f ′ ( y ) ∥ 2 , where f ′ ( ⋅ ) f^{\prime}(\cdot) f ′ ( ⋅ ) denotes a subgradient of f f f at ( ⋅ ) (\cdot) ( ⋅ ) . The set of all L L L -smooth convex functions on R d \mathbf{R}^{d} R d is denoted by F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) , and the set of all μ \mu μ -strongly convex functions is denoted by F μ , ∞ ( R d ) \mathcal{F}_{\mu,\infty}(\mathbf{R}^{d}) F μ , ∞ ( R d ) .
Table of contents
We are going to use the famous result due to Maurice Sion .
Let
X X X be a compact convex set (over which minimization will be performed) and
Y Y Y be a convex set in
R d \mathbf{R}^{d} R d (over which supremum will be computed). Suppose
g : X × Y → R g:X\times Y\to\mathbf{R} g : X × Y → R satisfies the following properties: (i)
g ( x , ⋅ ) g(x,\cdot) g ( x , ⋅ ) is upper-semicontinuous and quasi-concave on
Y Y Y for all
x ∈ X x\in X x ∈ X , and (ii)
g ( ⋅ , y ) g(\cdot,y) g ( ⋅ , y ) is lower-semicontinuous and quasi-convex on
X X X for all
y ∈ Y y\in Y y ∈ Y . Then we have
sup y ∈ Y min x ∈ X g ( x , y ) = min x ∈ X sup y ∈ Y g ( x , y ) . \sup_{y\in Y}\min_{x\in X}g(x,y)=\min_{x\in X}\sup_{y\in Y}g(x,y). y ∈ Y sup x ∈ X min g ( x , y ) = x ∈ X min y ∈ Y sup g ( x , y ) .
First, we start with the definition of an interpolable function.
Suppose we are given the set of triplets
{ ( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R \{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R where
I I I is a finite index set. Let
F ( R d ) \mathcal{F}(\mathbf{R}^d) F ( R d ) be some class of functions. Then the set
{ ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i} ) \}_{i\in I} {( x i , g i , f i ) } i ∈ I is
F ( R d ) \mathcal{F}(\mathbf{R}^{d}) F ( R d ) -interpolable if and only if there exists a function
f ∈ F ( R d ) f\in\mathcal{F}(\mathbf{R}^{d}) f ∈ F ( R d ) such that for all
i ∈ I i\in I i ∈ I we have
f i = f ( x i ) f_{i}=f(x_{i}) f i = f ( x i ) and
g i ∈ ∂ f ( x i ) g_{i} \in \partial f(x_{i}) g i ∈ ∂ f ( x i ) .
Next, we prove the following result due to Yoel Drori from [3]
.
If
{ ( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R \{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R is
F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) -interpolable, then one interpolated function
f ∈ F 0 , L ( R d ) f\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d ) that can be constructed from
{ ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is:
f ( x ) = min α ∈ Δ [ L 2 ∥ x − ∑ i ∈ I α i ( x i − 1 L g i ) ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) ] , f(x)=\min_{\alpha\in\Delta}\left[\frac{L}{2}\|x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{L}g_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\right)\right], f ( x ) = α ∈ Δ min [ 2 L ∥ x − i ∈ I ∑ α i ( x i − L 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) ] , where Δ = { α ˉ ∈ R ∣ I ∣ ∣ α ˉ ≥ 0 , ∑ i ∈ I α ˉ i = 1 } . \Delta=\{\bar{\alpha}\in\mathbf{R}^{\vert I\vert}\mid\bar{\alpha}\geq0,\sum_{i\in I}\bar{\alpha}_{i}=1\}. Δ = { α ˉ ∈ R ∣ I ∣ ∣ α ˉ ≥ 0 , ∑ i ∈ I α ˉ i = 1 } .
Proof sketch. We use the following chain of logic:
(a) { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I interpolated by f ∈ F 0 , L ( R d ) f\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d )
⇔ \Leftrightarrow ⇔ (b) { ( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I ≔ { ( g i , x i , ⟨ x i ∣ g i ⟩ − f i ) } i ∈ I \{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I}\coloneqq\{(g_{i},x_{i},\left\langle x_{i}\mid g_{i}\right\rangle -f_{i})\}_{i\in I} {( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I : = {( g i , x i , ⟨ x i ∣ g i ⟩ − f i ) } i ∈ I interpolated by h = f ∗ ∈ F 1 / L , ∞ h=f^{*}\in\mathcal{F}_{1/L,\infty} h = f ∗ ∈ F 1/ L , ∞ , where f ∗ f^* f ∗ denotes conjugate function of f f f .
So, we start backwards from (b). First construct h h h that interpolates { ( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I \{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I} {( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I . Then construct f = h ∗ , f=h^{*}, f = h ∗ , which will be interpolate { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I .
Now we start the proof.
The set { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) -interpolable if and only if { ( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I ≔ { ( g i , x i , ⟨ x i ∣ g i ⟩ − f i ) } i ∈ I \{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I}\coloneqq\{(g_{i},x_{i},\left\langle x_{i}\mid g_{i}\right\rangle -f_{i})\}_{i\in I} {( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I : = {( g i , x i , ⟨ x i ∣ g i ⟩ − f i ) } i ∈ I is F 1 / L , ∞ ( R d ) \mathcal{F}_{1/L,\infty}(\mathbf{R}^{d}) F 1/ L , ∞ ( R d ) -interpolable, which is from [1, Lemma 3.7]
. Also, if { ( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I \{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I} {( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I is F μ , ∞ ( R d ) \mathcal{F}_{\mu,\infty}(\mathbf{R}^{d}) F μ , ∞ ( R d ) -interpolable then one such interpolated function h ∈ F μ , ∞ ( R d ) h\in\mathcal{F}_{\mu,\infty}(\mathbf{R}^{d}) h ∈ F μ , ∞ ( R d ) would be:
h ( x ˉ ) = max i ∈ I { f ˉ i + ⟨ g ˉ i ∣ x ˉ − x ˉ i ⟩ + μ 2 ∥ x ˉ − x ˉ i ∥ 2 } \begin{aligned} h(\bar{x}) & =\max_{i\in I}\{\bar{f}_{i}+\left\langle \bar{g}_{i}\mid\bar{x}-\bar{x}_{i}\right\rangle +\frac{\mu}{2}\|\bar{x}-\bar{x}_{i}\|^{2}\}\end{aligned} h ( x ˉ ) = i ∈ I max { f ˉ i + ⟨ g ˉ i ∣ x ˉ − x ˉ i ⟩ + 2 μ ∥ x ˉ − x ˉ i ∥ 2 } where we have used the fact that max i ∈ I a i = max α ∈ Δ ∑ i ∈ I α i a i . \max_{i\in I}a_{i}=\max_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}a_{i}. max i ∈ I a i = max α ∈ Δ ∑ i ∈ I α i a i .
So, if { ( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I ≔ { ( g i , x i , ⟨ x i ∣ g i ⟩ − f i ) } i ∈ I \{(\bar{x}_{i},\bar{g}_{i},\bar{f}_{i})\}_{i\in I}\coloneqq\{(g_{i},x_{i},\left\langle x_{i}\mid g_{i}\right\rangle -f_{i})\}_{i\in I} {( x ˉ i , g ˉ i , f ˉ i ) } i ∈ I : = {( g i , x i , ⟨ x i ∣ g i ⟩ − f i ) } i ∈ I is F 1 / L , ∞ ( R d ) \mathcal{F}_{1/L,\infty}(\mathbf{R}^{d}) F 1/ L , ∞ ( R d ) -interpolable, then one such interpolated function h ∈ F 1 / L , ∞ ( R d ) h\in\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d}) h ∈ F 1/ L , ∞ ( R d ) would be
h ( x ˉ ) = max i ∈ I { f ˉ i + ⟨ g ˉ i ∣ x ˉ − x ˉ i ⟩ + 1 2 L ∥ x ˉ − x ˉ i ∥ 2 } = max i ∈ I { ⟨ x i ∣ g i ⟩ − f i + ⟨ x i ∣ x ˉ − g i ⟩ + 1 2 L ∥ x ˉ − g i ∥ 2 } = max i ∈ I { ⟨ x i ∣ g i ⟩ − f i + ⟨ x i ∣ x ˉ ⟩ − ⟨ x i ∣ g i ⟩ + 1 2 L ∥ x ˉ − g i ∥ 2 } = max i ∈ I { − f i + ⟨ x i ∣ x ˉ ⟩ + 1 2 L ∥ x ˉ − g i ∥ 2 } = − min i ∈ I { f i − ⟨ x i ∣ x ˉ ⟩ − 1 2 L ∥ x ˉ − g i ∥ 2 } , ( 1 ) \begin{aligned} h(\bar{x}) & =\max_{i\in I}\left\{ \bar{f}_{i}+\left\langle \bar{g}_{i}\mid\bar{x}-\bar{x}_{i}\right\rangle +\frac{1}{2L}\|\bar{x}-\bar{x}_{i}\|^{2}\right\} \\ & =\max_{i\in I}\left\{ \left\langle x_{i}\mid g_{i}\right\rangle -f_{i}+\left\langle x_{i}\mid\bar{x}-g_{i}\right\rangle +\frac{1}{2L}\|\bar{x}-g_{i}\|^{2}\right\} \\ & =\max_{i\in I}\left\{ \cancel{\left\langle x_{i}\mid g_{i}\right\rangle }-f_{i}+\left\langle x_{i}\mid\bar{x}\right\rangle -\cancel{\left\langle x_{i}\mid g_{i}\right\rangle }+\frac{1}{2L}\|\bar{x}-g_{i}\|^{2}\right\} \\ & =\max_{i\in I}\left\{ -f_{i}+\left\langle x_{i}\mid\bar{x}\right\rangle +\frac{1}{2L}\|\bar{x}-g_{i}\|^{2}\right\} \\ & =-\min_{i\in I}\left\{ f_{i}-\left\langle x_{i}\mid\bar{x}\right\rangle -\frac{1}{2L}\|\bar{x}-g_{i}\|^{2}\right\} ,\quad\quad\quad(1)\end{aligned} h ( x ˉ ) = i ∈ I max { f ˉ i + ⟨ g ˉ i ∣ x ˉ − x ˉ i ⟩ + 2 L 1 ∥ x ˉ − x ˉ i ∥ 2 } = i ∈ I max { ⟨ x i ∣ g i ⟩ − f i + ⟨ x i ∣ x ˉ − g i ⟩ + 2 L 1 ∥ x ˉ − g i ∥ 2 } = i ∈ I max { ⟨ x i ∣ g i ⟩ − f i + ⟨ x i ∣ x ˉ ⟩ − ⟨ x i ∣ g i ⟩ + 2 L 1 ∥ x ˉ − g i ∥ 2 } = i ∈ I max { − f i + ⟨ x i ∣ x ˉ ⟩ + 2 L 1 ∥ x ˉ − g i ∥ 2 } = − i ∈ I min { f i − ⟨ x i ∣ x ˉ ⟩ − 2 L 1 ∥ x ˉ − g i ∥ 2 } , ( 1 ) where in the last line we have used max ( ⋅ ) = − min ( − ⋅ ) . \max(\cdot)=-\min(-\cdot). max ( ⋅ ) = − min ( − ⋅ ) .
But, we are looking for an interpolated function that is in F 0 , L ( R d ) , \mathcal{F}_{0,L}(\mathbf{R}^{d}), F 0 , L ( R d ) , not F 1 / L , ∞ ( R d ) \mathcal{F}_{1/L,\infty}(\mathbf{R}^{d}) F 1/ L , ∞ ( R d ) . How to go from F 1 / L , ∞ ( R d ) \mathcal{F}_{1/L,\infty}(\mathbf{R}^{d}) F 1/ L , ∞ ( R d ) to F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) ? To that goal, we use the following results:
(i) f ∈ F 0 , L ( R d ) f\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d ) if and only if f f f 's conjugate function f ∗ ( ⋅ ) = sup y ∈ R d [ − f ( y ) + ⟨ ⋅ ∣ y ⟩ ] ∈ F 1 / L , ∞ ( R d ) f^{*}(\cdot)=\sup_{y\in\mathbf{R}^{d}}\left[-f(y)+\left\langle \cdot\mid y\right\rangle \right]\in\mathcal{F}_{1/L,\infty}(\mathbf{R}^{d}) f ∗ ( ⋅ ) = sup y ∈ R d [ − f ( y ) + ⟨ ⋅ ∣ y ⟩ ] ∈ F 1/ L , ∞ ( R d ) [1, Theorem 2.34]
.
(ii) for any lower-semicontinuous, proper, and convex function f f f , we have f = f ∗ ∗ f=f^{**} f = f ∗∗ [2, Theorem 12.2]
.
Due to (i) and (ii), to find the interpolated function in F 0 , L ( R d ) , \mathcal{F}_{0,L}(\mathbf{R}^{d}), F 0 , L ( R d ) , all we have to do is to compute the conjugate function of h h h . In other words, the desired interpolated function in F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) would be
f ( x ) = h ∗ ( x ) = sup y ∈ R d [ − h ( y ) + ⟨ x ∣ y ⟩ ] = ( 1 ) sup y ∈ R d [ − [ − min i ∈ I { f i − ⟨ x i ∣ y ⟩ − 1 2 L ∥ y − g i ∥ 2 } ] + ⟨ x ∣ y ⟩ ] = sup y ∈ R d [ min i ∈ I { f i − ⟨ x i ∣ y ⟩ − 1 2 L ∥ y − g i ∥ 2 + ⟨ x ∣ y ⟩ } ] = sup y ∈ R d [ min i ∈ I { f i − 1 2 L ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] = sup y ∈ R d [ min α ∈ Δ ∑ i ∈ I α i { f i − 1 2 L ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] , ( 2 ) \begin{aligned} f(x) & =h^{*}(x)\\ & =\sup_{y\in\mathbf{R}^{d}}\left[-h(y)+\left\langle x\mid y\right\rangle \right]\\ & \overset{(1)}{=}\sup_{y\in\mathbf{R}^{d}}\left[-\left[-\min_{i\in I}\left\{ f_{i}-\left\langle x_{i}\mid y\right\rangle -\frac{1}{2L}\|y-g_{i}\|^{2}\right\} \right]+\left\langle x\mid y\right\rangle \right]\\ & =\sup_{y\in\mathbf{R}^{d}}\left[\min_{i\in I}\left\{ f_{i}-\left\langle x_{i}\mid y\right\rangle -\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x\mid y\right\rangle \right\} \right]\\ & =\sup_{y\in\mathbf{R}^{d}}\left[\min_{i\in I}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right]\\ & =\sup_{y\in\mathbf{R}^{d}}\left[\min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right],\quad\quad\quad(2)\end{aligned} f ( x ) = h ∗ ( x ) = y ∈ R d sup [ − h ( y ) + ⟨ x ∣ y ⟩ ] = ( 1 ) y ∈ R d sup [ − [ − i ∈ I min { f i − ⟨ x i ∣ y ⟩ − 2 L 1 ∥ y − g i ∥ 2 } ] + ⟨ x ∣ y ⟩ ] = y ∈ R d sup [ i ∈ I min { f i − ⟨ x i ∣ y ⟩ − 2 L 1 ∥ y − g i ∥ 2 + ⟨ x ∣ y ⟩ } ] = y ∈ R d sup [ i ∈ I min { f i − 2 L 1 ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] = y ∈ R d sup [ α ∈ Δ min i ∈ I ∑ α i { f i − 2 L 1 ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] , ( 2 ) where in the last line we have used the fact that min i ∈ I a i = min α ∈ Δ ∑ i ∈ I α i a i \min_{i\in I}a_{i}=\min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}a_{i} min i ∈ I a i = min α ∈ Δ ∑ i ∈ I α i a i . Now, in (2), denote the inner function by
p ( y , α ) = ∑ i ∈ I α i { f i − 1 2 L ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } , p(y,\alpha)=\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} , p ( y , α ) = i ∈ I ∑ α i { f i − 2 L 1 ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } , which is continuous and concave with respect to the maximizing variable y y y and continuous and convex (in fact linear) with respect to the minimizing variable α \alpha α . Finally, Δ \Delta Δ –-the minimizing set–-is compact and convex, and R d \mathbf{R}^{d} R d –-the maximizing set–-is convex. Hence, applying Sion's minimax theorem we have:
f ( x ) = sup y ∈ R d [ min α ∈ Δ ∑ i ∈ I α i { f i − 1 2 L ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] = min α ∈ Δ [ sup y ∈ R d ∑ i ∈ I α i { f i − 1 2 L ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] . ( 3 ) \begin{aligned} & f(x)\\ = & \sup_{y\in\mathbf{R}^{d}}\left[\min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right]\\ = & \min_{\alpha\in\Delta}\left[\sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right].\quad\quad\quad(3)\end{aligned} = = f ( x ) y ∈ R d sup [ α ∈ Δ min i ∈ I ∑ α i { f i − 2 L 1 ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] α ∈ Δ min [ y ∈ R d sup i ∈ I ∑ α i { f i − 2 L 1 ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] . ( 3 ) Let us focus on the inner maximization problem, which is a convex optimization problem. So, by taking derivative with respect to y y y and then setting that equal to zero, we can compute the maximizer as follows:
∇ y [ ∑ i ∈ I α i { f i − 1 2 L ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] = 0 ⇔ ∑ i ∈ I α i { − 1 2 1 L × 2 × ( y − g i ) + x − x i } = ∑ i ∈ I α i ( − 1 L y + 1 L g i + x − x i ) = 0 ⇔ ∑ i ∈ I α i ( 1 L g i + x − x i ) = ∑ i ∈ I α i ( 1 L y ) = 1 L y ∑ i ∈ I α i undefined = 1 = 1 L y ⇔ ∑ i ∈ I α i ( 1 L g i + x − x i ) = 1 L y ⇔ y ⋆ = L ( ∑ i ∈ I α i { ( x − x i ) + 1 L g i } ) ∴ y ⋆ = L ( x − ∑ i ∈ I α i ( x i − 1 L g i ) ) ( 4 ) \begin{aligned} & \nabla_{y}\left[\sum_{i\in I}\alpha_{i}\left\{ f_{i}-\frac{1}{2L}\|y-g_{i}\|^{2}+\left\langle x-x_{i}\mid y\right\rangle \right\} \right]=0\\ \Leftrightarrow & \sum_{i\in I}\alpha_{i}\left\{ -\frac{1}{\cancel{2}}\frac{1}{L}\times\cancel{2}\times(y-g_{i})+x-x_{i}\right\} =\sum_{i\in I}\alpha_{i}\left(-\frac{1}{L}y+\frac{1}{L}g_{i}+x-x_{i}\right)=0\\ \Leftrightarrow & \sum_{i\in I}\alpha_{i}\left(\frac{1}{L}g_{i}+x-x_{i}\right)=\sum_{i\in I}\alpha_{i}\left(\frac{1}{L}y\right)=\frac{1}{L}y\underbrace{\sum_{i\in I}\alpha_{i}}_{=1}=\frac{1}{L}y\\ \Leftrightarrow & \sum_{i\in I}\alpha_{i}\left(\frac{1}{L}g_{i}+x-x_{i}\right)=\frac{1}{L}y\\ \Leftrightarrow & y^{\star}=L\Big(\sum_{i\in I}\alpha_{i}\{(x-x_{i})+\frac{1}{L}g_{i}\}\Big)\\ \therefore y^{\star} & =L\Big(x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big)\quad\quad\quad(4) \end{aligned} ⇔ ⇔ ⇔ ⇔ ∴ y ⋆ ∇ y [ i ∈ I ∑ α i { f i − 2 L 1 ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ } ] = 0 i ∈ I ∑ α i { − 2 1 L 1 × 2 × ( y − g i ) + x − x i } = i ∈ I ∑ α i ( − L 1 y + L 1 g i + x − x i ) = 0 i ∈ I ∑ α i ( L 1 g i + x − x i ) = i ∈ I ∑ α i ( L 1 y ) = L 1 y = 1 i ∈ I ∑ α i = L 1 y i ∈ I ∑ α i ( L 1 g i + x − x i ) = L 1 y y ⋆ = L ( i ∈ I ∑ α i {( x − x i ) + L 1 g i } ) = L ( x − i ∈ I ∑ α i ( x i − L 1 g i ) ) ( 4 ) Putting the value of y y y in (4) in the inner maximization problem, we have:
sup y ∈ R d ∑ i ∈ I α i ( f i − 1 2 L ∥ y − g i ∥ 2 undefined = ∥ y ∥ 2 + ∥ g i ∥ 2 − 2 ⟨ y ∣ g i ⟩ + ⟨ x − x i ∣ y ⟩ ) = sup y ∈ R d ∑ i ∈ I α i ( f i − 1 2 L { ∥ y ∥ 2 + ∥ g i ∥ 2 − 2 ⟨ y ∣ g i ⟩ } + ⟨ x − x i ∣ y ⟩ ) = ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) + sup y ∈ R d ∑ i ∈ I α i ( − 1 2 L ∥ y ⋆ ∥ 2 + ⟨ y ⋆ ∣ 1 L g i ⟩ + ⟨ x − x i ∣ y ⋆ ⟩ ) = ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) + sup y ∈ R d ∑ i ∈ I α i ( − 1 2 L ∥ y ⋆ ∥ 2 + ⟨ x − x i + 1 L g i ∣ y ⋆ ⟩ ) = ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) + ∑ i ∈ I α i ( − 1 2 L ∥ y ⋆ ∥ 2 + ⟨ x − x i + 1 L g i ∣ y ⋆ ⟩ ) = ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) − 1 2 L ∥ y ⋆ ∥ 2 + ∑ i ∈ I α i ⟨ x − x i + 1 L g i ∣ y ⋆ ⟩ = ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) − 1 2 L ∥ L ( x − ∑ i ∈ I α i ( x i − 1 L g i ) ) ∥ 2 + ⟨ ( x − ∑ i ∈ I α i ( x i − 1 L g i ) ) ∣ L ( x − ∑ i ∈ I α i ( x i − 1 L g i ) ) ⟩ = ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) − L 2 ∥ x − ∑ i ∈ I α i ( x i − 1 L g i ) ) ∥ 2 + L ∥ x − ∑ i ∈ I α i ( x i − 1 L g i ) ) ∥ 2 = ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) + L 2 ∥ x − ∑ i ∈ I α i ( x i − 1 L g i ) ) ∥ 2 = L 2 ∥ x − ∑ i ∈ I α i ( x i − 1 L g i ) ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) . ( 5 ) \begin{aligned} & \sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\underbrace{\|y-g_{i}\|^{2}}_{=\|y\|^{2}+\|g_{i}\|^{2}-2\langle y\mid g_{i}\rangle}+\left\langle x-x_{i}\mid y\right\rangle \Big)\\ & =\sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\{\|y\|^{2}+\|g_{i}\|^{2}-2\langle y\mid g_{i}\rangle\}+\left\langle x-x_{i}\mid y\right\rangle \Big)\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)+\sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\Big(-\frac{1}{2L}\|y^{\star}\|^{2}+\langle y^{\star}\mid\frac{1}{L}g_{i}\rangle+\left\langle x-x_{i}\mid y^{\star}\right\rangle \Big)\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)+\sup_{y\in\mathbf{R}^{d}}\sum_{i\in I}\alpha_{i}\Big(-\frac{1}{2L}\|y^{\star}\|^{2}+\left\langle x-x_{i}+\frac{1}{L}g_{i}\mid y^{\star}\right\rangle \Big)\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)+\sum_{i\in I}\alpha_{i}\Big(-\frac{1}{2L}\|y^{\star}\|^{2}+\left\langle x-x_{i}+\frac{1}{L}g_{i}\mid y^{\star}\right\rangle \Big)\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)-\frac{1}{2L}\|y^{\star}\|^{2}+\sum_{i\in I}\alpha_{i}\left\langle x-x_{i}+\frac{1}{L}g_{i}\mid y^{\star}\right\rangle \\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)-\frac{1}{2L}\|L\Big(x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big)\|^{2}\\ & +\left\langle \Big(x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big)\mid L\Big(x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big)\right\rangle \\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)-\frac{L}{2}\|x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big)\|^{2}+L\|x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big)\|^{2}\\ & =\sum_{i\in I}\alpha_{i}\Big(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\Big)+\frac{L}{2}\|x-\sum_{i\in I}\alpha_{i}\big(x_{i}-\frac{1}{L}g_{i}\big)\Big)\|^{2}\\ & =\frac{L}{2}\|x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{L}g_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\right).\quad\quad\quad(5) \end{aligned} y ∈ R d sup i ∈ I ∑ α i ( f i − 2 L 1 = ∥ y ∥ 2 + ∥ g i ∥ 2 − 2 ⟨ y ∣ g i ⟩ ∥ y − g i ∥ 2 + ⟨ x − x i ∣ y ⟩ ) = y ∈ R d sup i ∈ I ∑ α i ( f i − 2 L 1 { ∥ y ∥ 2 + ∥ g i ∥ 2 − 2 ⟨ y ∣ g i ⟩} + ⟨ x − x i ∣ y ⟩ ) = i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) + y ∈ R d sup i ∈ I ∑ α i ( − 2 L 1 ∥ y ⋆ ∥ 2 + ⟨ y ⋆ ∣ L 1 g i ⟩ + ⟨ x − x i ∣ y ⋆ ⟩ ) = i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) + y ∈ R d sup i ∈ I ∑ α i ( − 2 L 1 ∥ y ⋆ ∥ 2 + ⟨ x − x i + L 1 g i ∣ y ⋆ ⟩ ) = i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) + i ∈ I ∑ α i ( − 2 L 1 ∥ y ⋆ ∥ 2 + ⟨ x − x i + L 1 g i ∣ y ⋆ ⟩ ) = i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) − 2 L 1 ∥ y ⋆ ∥ 2 + i ∈ I ∑ α i ⟨ x − x i + L 1 g i ∣ y ⋆ ⟩ = i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) − 2 L 1 ∥ L ( x − i ∈ I ∑ α i ( x i − L 1 g i ) ) ∥ 2 + ⟨ ( x − i ∈ I ∑ α i ( x i − L 1 g i ) ) ∣ L ( x − i ∈ I ∑ α i ( x i − L 1 g i ) ) ⟩ = i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) − 2 L ∥ x − i ∈ I ∑ α i ( x i − L 1 g i ) ) ∥ 2 + L ∥ x − i ∈ I ∑ α i ( x i − L 1 g i ) ) ∥ 2 = i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) + 2 L ∥ x − i ∈ I ∑ α i ( x i − L 1 g i ) ) ∥ 2 = 2 L ∥ x − i ∈ I ∑ α i ( x i − L 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) . ( 5 ) Using (5) in (3), we arrive at the desired interpolated function f ∈ F 0 , L ( R d ) : f\in\mathcal{F}_{0,L}(\mathbf{R}^{d}): f ∈ F 0 , L ( R d ) :
f ( x ) = min α ∈ Δ [ L 2 ∥ x − ∑ i ∈ I α i ( x i − 1 L g i ) ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 L ∥ g i ∥ 2 ) ] , f(x)=\min_{\alpha\in\Delta}\left[\frac{L}{2}\|x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{L}g_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2L}\|g_{i}\|^{2}\right)\right], f ( x ) = α ∈ Δ min [ 2 L ∥ x − i ∈ I ∑ α i ( x i − L 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) ] , thus completing the proof.
[1]
Taylor, Adrien B. Convex interpolation and performance estimation of first-order methods for convex optimization. Diss. Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2017.
[2]
Rockafellar, Tyrell. Convex Analysis. Princeton University Press, 1996.
[3]
Drori, Yoel. The exact information-based complexity of smooth convex minimization. Journal of Complexity 39 (2017): 1-16.