Table of contents
All norms are 2-norm in this blog.
A differentiable function f : R d → R f:\mathbf{R}^{d}\to\mathbf{R} f : R d → R is L L L -smooth convex if and only
( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 1 2 L ∥ ∇ f ( x ) − ∇ f ( y ) ∥ 2 . \left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}. ( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 L 1 ∥∇ f ( x ) − ∇ f ( y ) ∥ 2 . A differentiable function f : R d → R f:\mathbf{R}^{d}\to\mathbf{R} f : R d → R is ρ \rho ρ -smooth nonconvex if and only if
( ∀ x , y ∈ R d ) − ρ 2 ∥ x − y ∥ 2 ≤ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ − f ( y ) ≤ ρ 2 ∥ x − y ∥ 2 . \left(\forall x,y\in\mathbf{R}^{d}\right)\quad-\frac{\rho}{2}\|x-y\|^{2}\leq f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle -f(y)\leq\frac{\rho}{2}\|x-y\|^{2}. ( ∀ x , y ∈ R d ) − 2 ρ ∥ x − y ∥ 2 ≤ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ − f ( y ) ≤ 2 ρ ∥ x − y ∥ 2 . The set of all L L L -smooth convex functions on R d \mathbf{R}^{d} R d is denoted by F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) and the set of all ρ \rho ρ -smooth nonconvex functions is denoted by F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) .
We first record the following result from [3, Lemma 2.53]
.
For any
f : R d → R f:\mathbf{R}^{d}\to\mathbf{R} f : R d → R , we have
f ∈ F − ρ , ρ ( R d ) ⇔ f + ρ 2 ∥ ⋅ ∥ 2 ∈ F 0 , 2 ρ ( R d ) . f\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d})\Leftrightarrow f+\frac{\rho}{2}\|\cdot\|^{2}\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d}). f ∈ F − ρ , ρ ( R d ) ⇔ f + 2 ρ ∥ ⋅ ∥ 2 ∈ F 0 , 2 ρ ( R d ) .
Next, we present the definition of an interpolable function.
Suppose we are given the set of triplets
{ ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I where
I I I is a finite index set and
x i ∈ R d , g i ∈ R d , x_{i}\in\mathbf{R}^{d},g_{i}\in\mathbf{R}^{d}, x i ∈ R d , g i ∈ R d , and
f i ∈ R . f_{i}\in\mathbf{R}. f i ∈ R . Let
F ( R d ) \mathcal{F}(\mathbf{R}^{d}) F ( R d ) be a set of functions on
R d \mathbf{R}^{d} R d . Then the set
{ ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is
F ( R d ) \mathcal{F}(\mathbf{R}^{d}) F ( R d ) -interpolable if and only if there exists a function
f ∈ F ( R d ) f\in\mathcal{F}(\mathbf{R}^{d}) f ∈ F ( R d ) such that for all
i ∈ I i\in I i ∈ I we have
f i = f ( x i ) f_{i}=f(x_{i}) f i = f ( x i ) and
g i ∈ ∂ f ( x i ) g_{i} \in \partial f(x_{i}) g i ∈ ∂ f ( x i ) . We say that
f ∈ F ( R d ) f\in\mathcal{F}(\mathbf{R}^{d}) f ∈ F ( R d ) is an interpolated function that interpolates the set
{ ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I .
Next, we record the following result that follows from [1, Theorem 3.10]
regarding interpolated functions on F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) . Note that, the published version of the paper [1]
contains a typo in this theorem, which is corrected in a subsequent arxiv update available at https://arxiv.org/pdf/1512.07516.pdf .
Let
ρ > 0 \rho>0 ρ > 0 , and consider the set of triplets
{ ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I , where
I I I is a finite index set. Then
{ ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is
F − ρ , ρ \mathcal{F}_{-\rho,\rho} F − ρ , ρ interpolable if and only if for all
i , j ∈ I i,j\in I i , j ∈ I we have
f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 1 2 ρ ∥ g i − g j ∥ 2 − ρ 4 ∥ ( x i − x j ) − 1 ρ ( g i − g j ) ∥ 2 . f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\left\Vert (x_{i}-x_{j})-\frac{1}{\rho}(g_{i}-g_{j})\right\Vert ^{2}. f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 2 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ∥ ∥ ( x i − x j ) − ρ 1 ( g i − g j ) ∥ ∥ 2 .
Proof.
Click to show proof
Suppose, for all i , j ∈ I i,j\in I i , j ∈ I we have
f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 1 2 ρ ∥ g i − g j ∥ 2 − ρ 4 ∥ ( x i − x j ) − 1 ρ ( g i − g j ) ∥ 2 ⇔ f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 1 2 ρ ∥ g i − g j ∥ 2 − ρ 4 ( ∥ x i − x j ∥ 2 + 1 ρ 2 ∥ g i − g j ∥ 2 − 2 1 ρ ⟨ x i − x j ∣ g i − g j ⟩ ) ⇔ f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 1 2 ρ ∥ g i − g j ∥ 2 − ρ 4 ∥ x i − x j ∥ 2 − ρ 4 1 ρ 2 ρ ∥ g i − g j ∥ 2 + 2 ρ 4 ρ 2 ⟨ x i − x j ∣ g i − g j ⟩ ⇔ f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 1 2 ρ ∥ g i − g j ∥ 2 − ρ 4 ∥ x i − x j ∥ 2 − 1 4 ρ ∥ g i − g j ∥ 2 + 1 2 ⟨ g i − g j ∣ x i − x j ⟩ ⇔ f i ≥ f j + ⟨ g j + 1 2 g i − 1 2 g j ∣ x i − x j ⟩ + 1 4 ρ ∥ g i − g j ∥ 2 − ρ 4 ∥ x i − x j ∥ 2 ⇔ f i ≥ f j + ⟨ 1 2 g j + 1 2 g i ∣ x i − x j ⟩ + 1 4 ρ ∥ g i − g j ∥ 2 − ρ 4 ∥ x i − x j ∥ 2 ⇔ f i ≥ f j + 1 2 ⟨ g i + g j ∣ x i − x j ⟩ + 1 4 ρ ∥ g i − g j ∥ 2 − ρ 4 ∥ x i − x j ∥ 2 . ( 1 ) \begin{aligned} & f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\left\Vert (x_{i}-x_{j})-\frac{1}{\rho}(g_{i}-g_{j})\right\Vert ^{2}\\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\left(\|x_{i}-x_{j}\|^{2}+\frac{1}{\rho^{2}}\|g_{i}-g_{j}\|^{2}-2\frac{1}{\rho}\left\langle x_{i}-x_{j}\mid g_{i}-g_{j}\right\rangle \right)\\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}-\frac{\cancel{\rho}}{4}\frac{1}{\cancel{\rho^{2}}\rho}\|g_{i}-g_{j}\|^{2}+\frac{\cancel{2\rho}}{\cancel{4\rho}2}\left\langle x_{i}-x_{j}\mid g_{i}-g_{j}\right\rangle \\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{2\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}-\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}+\frac{1}{2}\left\langle g_{i}-g_{j}\mid x_{i}-x_{j}\right\rangle \\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle g_{j}+\frac{1}{2}g_{i}-\frac{1}{2}g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}\\ \Leftrightarrow & f_{i}\geq f_{j}+\left\langle \frac{1}{2}g_{j}+\frac{1}{2}g_{i}\mid x_{i}-x_{j}\right\rangle +\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}\\ \Leftrightarrow & f_{i}\geq f_{j}+\frac{1}{2}\left\langle g_{i}+g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}.\quad\quad\quad(1)\end{aligned} ⇔ ⇔ ⇔ ⇔ ⇔ ⇔ f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 2 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ∥ ∥ ( x i − x j ) − ρ 1 ( g i − g j ) ∥ ∥ 2 f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 2 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ( ∥ x i − x j ∥ 2 + ρ 2 1 ∥ g i − g j ∥ 2 − 2 ρ 1 ⟨ x i − x j ∣ g i − g j ⟩ ) f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 2 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ∥ x i − x j ∥ 2 − 4 ρ ρ 2 ρ 1 ∥ g i − g j ∥ 2 + 4 ρ 2 2 ρ ⟨ x i − x j ∣ g i − g j ⟩ f i ≥ f j + ⟨ g j ∣ x i − x j ⟩ + 2 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ∥ x i − x j ∥ 2 − 4 ρ 1 ∥ g i − g j ∥ 2 + 2 1 ⟨ g i − g j ∣ x i − x j ⟩ f i ≥ f j + ⟨ g j + 2 1 g i − 2 1 g j ∣ x i − x j ⟩ + 4 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ∥ x i − x j ∥ 2 f i ≥ f j + ⟨ 2 1 g j + 2 1 g i ∣ x i − x j ⟩ + 4 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ∥ x i − x j ∥ 2 f i ≥ f j + 2 1 ⟨ g i + g j ∣ x i − x j ⟩ + 4 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ∥ x i − x j ∥ 2 . ( 1 ) But from [1, Theorem 3.10]
, we have: { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is F − ρ , ρ \mathcal{F}_{-\rho,\rho} F − ρ , ρ interpolable if and only if for all i , j ∈ I i,j\in I i , j ∈ I we have
f i ≥ f j + 1 2 ⟨ g i + g j ∣ x i − x j ⟩ + 1 4 ρ ∥ g i − g j ∥ 2 − ρ 4 ∥ x i − x j ∥ 2 . f_{i}\geq f_{j}+\frac{1}{2}\left\langle g_{i}+g_{j}\mid x_{i}-x_{j}\right\rangle +\frac{1}{4\rho}\|g_{i}-g_{j}\|^{2}-\frac{\rho}{4}\|x_{i}-x_{j}\|^{2}. f i ≥ f j + 2 1 ⟨ g i + g j ∣ x i − x j ⟩ + 4 ρ 1 ∥ g i − g j ∥ 2 − 4 ρ ∥ x i − x j ∥ 2 . Using this along with (1) completes the proof.
■
Next, we have the following equivalence in interpolation conditions between functions in F 0 , 2 ρ ( R d ) \mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) F 0 , 2 ρ ( R d ) and F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) via adding minimal curvature addition to the later function class.
Consider a set of triplets
{ ( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R \{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R and let
ρ > 0 \rho>0 ρ > 0 . Then the following are equivalent.
(i) { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) -interpolable.
(ii) { ( x i , g i + ρ x i , f i + ρ 2 ∥ x i ∥ 2 ) } i ∈ I \{(x_{i},g_{i}+\rho x_{i},f_{i}+\frac{\rho}{2}\|x_{i}\|^{2})\}_{i\in I} {( x i , g i + ρ x i , f i + 2 ρ ∥ x i ∥ 2 ) } i ∈ I is F 0 , 2 ρ ( R d ) \mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) F 0 , 2 ρ ( R d ) -interpolable.
Proof.
Click to show proof
The proof to ( i ) ⇒ ( i i ) (i)\Rightarrow(ii) ( i ) ⇒ ( ii ) : It follows from Lemma 1 that if there exists a function f ∈ F − ρ , ρ ( R d ) f\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) f ∈ F − ρ , ρ ( R d ) interpolating the set { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I , then h ≔ f + ( ρ / 2 ) ∥ ⋅ ∥ 2 h\coloneqq f+(\rho/2)\|\cdot\|^{2} h : = f + ( ρ /2 ) ∥ ⋅ ∥ 2 would satisfy h ∈ F 0 , 2 ρ ( R d ) h\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) h ∈ F 0 , 2 ρ ( R d ) and for all i ∈ I i\in I i ∈ I we would have
h ( x i ) = f ( x i ) + ρ 2 ∥ x i ∥ 2 = f i + ρ 2 ∥ x i ∥ 2 , h(x_{i})=f(x_{i})+\frac{\rho}{2}\|x_{i}\|^{2}=f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}, h ( x i ) = f ( x i ) + 2 ρ ∥ x i ∥ 2 = f i + 2 ρ ∥ x i ∥ 2 , and
∇ h ( x i ) = ∇ f ( x i ) + ρ x i = g i + ρ x i . \nabla h(x_{i})=\nabla f(x_{i})+\rho x_{i}=g_{i}+\rho x_{i}. ∇ h ( x i ) = ∇ f ( x i ) + ρ x i = g i + ρ x i . Hence, the set { ( x i , g i + ρ x i , f i + ( ρ / 2 ) ∥ x i ∥ 2 ) } i ∈ I \{(x_{i},g_{i}+\rho x_{i},f_{i}+(\rho/2)\|x_{i}\|^{2})\}_{i\in I} {( x i , g i + ρ x i , f i + ( ρ /2 ) ∥ x i ∥ 2 ) } i ∈ I is interpolated by the function h ∈ F 0 , 2 ρ ( R d ) h\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) h ∈ F 0 , 2 ρ ( R d ) .
The proof to ( i i ) ⇒ ( i ) (ii)\Rightarrow(i) ( ii ) ⇒ ( i ) : If { ( x i , g i + ρ x i , f i + ρ 2 ∥ x i ∥ 2 ) } i ∈ I \{(x_{i},g_{i}+\rho x_{i},f_{i}+\frac{\rho}{2}\|x_{i}\|^{2})\}_{i\in I} {( x i , g i + ρ x i , f i + 2 ρ ∥ x i ∥ 2 ) } i ∈ I is F 0 , 2 ρ ( R d ) \mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) F 0 , 2 ρ ( R d ) -interpolable, then there exists a h ∈ F 0 , 2 ρ ( R d ) h\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) h ∈ F 0 , 2 ρ ( R d ) , such that for all i ∈ I i\in I i ∈ I we would have
h ( x i ) = f i + ρ 2 ∥ x i ∥ 2 ⇔ f i = h ( x i ) − ρ 2 ∥ x i ∥ 2 \begin{aligned} h(x_{i}) & =f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}\\ \Leftrightarrow & f_{i}=h(x_{i})-\frac{\rho}{2}\|x_{i}\|^{2}\end{aligned} h ( x i ) ⇔ = f i + 2 ρ ∥ x i ∥ 2 f i = h ( x i ) − 2 ρ ∥ x i ∥ 2 and
∇ h ( x i ) = ∇ f ( x i ) + ρ x i = g i + ρ x i ⇔ g i = ∇ h ( x i ) − ρ x i . \begin{aligned} \nabla h(x_{i}) & =\nabla f(x_{i})+\rho x_{i}=g_{i}+\rho x_{i}\\ \Leftrightarrow & g_{i}=\nabla h(x_{i})-\rho x_{i}.\end{aligned} ∇ h ( x i ) ⇔ = ∇ f ( x i ) + ρ x i = g i + ρ x i g i = ∇ h ( x i ) − ρ x i . So, from the last two equations, we see that f ≔ h − ( ρ / 2 ) ∥ ⋅ ∥ 2 ∈ F − ρ , ρ ( R d ) f\coloneqq h-(\rho/2)\|\cdot\|^{2}\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) f : = h − ( ρ /2 ) ∥ ⋅ ∥ 2 ∈ F − ρ , ρ ( R d ) interpolates { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I . So, { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) -interpolable.
■
Finally, we present the following interpolation result due to Yoel Drori from [4]
. I showed a detailed proof of this result in the previous blog post: here .
If
{ ( x undefined i , g undefined i , f undefined i ) } i ∈ I ⊆ R d × R d × R \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R is
F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) -interpolable with
L > 0 L>0 L > 0 , then one interpolated function
f undefined ∈ F 0 , L ( R d ) \widetilde{f}\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d ) that interpolates
{ ( x undefined i , g undefined i , f undefined i ) } i ∈ I \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is:
f undefined ( y ) = min α ∈ Δ [ L 2 ∥ y − ∑ i ∈ I α i ( x undefined i − 1 L g undefined i ) ∥ 2 + ∑ i ∈ I α i ( f undefined i − 1 2 L ∥ g undefined i ∥ 2 ) ] , \widetilde{f}(y)=\min_{\alpha\in\Delta}\left[\frac{L}{2}\|y-\sum_{i\in I}\alpha_{i}\left(\widetilde{x}_{i}-\frac{1}{L}\widetilde{g}_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\widetilde{f}_{i}-\frac{1}{2L}\|\widetilde{g}_{i}\|^{2}\right)\right], f ( y ) = α ∈ Δ min [ 2 L ∥ y − i ∈ I ∑ α i ( x i − L 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) ] , where Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } . \Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}. Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } .
Now we are in a position to state our main result due to Yoel Drori and Ohad Shamir from [1, Theorem 7]
followed by its proof.
Let the set
{ ( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R \{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R be
F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) -interpolable, where
L > 0 L>0 L > 0 and
I I I is a finite index set. Define:
i ⋆ ∈ argmin i ∈ I { f i − 1 2 ρ ∥ g i ∥ 2 } , i^{\star}\in\underset{i\in I}{\textrm{argmin}}\{f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\}, i ⋆ ∈ i ∈ I argmin { f i − 2 ρ 1 ∥ g i ∥ 2 } , hence,
f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 = min i ∈ I { f i − 1 2 ρ ∥ g i ∥ 2 } . f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}=\min_{i\in I}\{f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\}. f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 = i ∈ I min { f i − 2 ρ 1 ∥ g i ∥ 2 } . Then the following are equivalent.
(i) The set of triplets { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) -interpolable.
(ii) There exists a f ∈ F − ρ , ρ ( R d ) f\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) f ∈ F − ρ , ρ ( R d ) that interpolates { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I and any global minimizer x ⋆ x^\star x ⋆ of f f f (i.e., x ⋆ ∈ argmin x ∈ R d f ( x ) x^\star \in \textrm{argmin}_{x \in \mathbf{R}^d} f(x) x ⋆ ∈ argmin x ∈ R d f ( x ) ) is characterized by
f ( x ⋆ ) = f ( x i ⋆ − 1 L g i ⋆ ) ≤ f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 . ( 3 ) f(x^\star) = f(x_{i^{\star}}-\frac{1}{L}g_{i^{\star}}) \leq f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}.\quad\quad\quad(3) f ( x ⋆ ) = f ( x i ⋆ − L 1 g i ⋆ ) ≤ f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 . ( 3 )
Proof to Theorem 4 .
First, we prove ( i i ) ⇒ ( i ) (ii)\Rightarrow(i) ( ii ) ⇒ ( i ) . If ( i i ) (ii) ( ii ) holds, then obviously f f f interpolates { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I , hence { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) -interpolable by Definition 1 .
Next, we prove ( i ) ⇒ ( i i ) (i)\Rightarrow(ii) ( i ) ⇒ ( ii ) . We want construct an interpolated function f ∈ F − ρ , ρ ( R d ) f\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) f ∈ F − ρ , ρ ( R d ) satisfying ( i i ) (ii) ( ii ) . We have { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is F − ρ , ρ ( R d ) \mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) F − ρ , ρ ( R d ) -interpolable, but due to Theorem 2 , this is equivalent to saying { ( x i , g i + ρ x i , f i + ρ 2 ∥ x i ∥ 2 ) } i ∈ I \{(x_{i},g_{i}+\rho x_{i},f_{i}+\frac{\rho}{2}\|x_{i}\|^{2})\}_{i\in I} {( x i , g i + ρ x i , f i + 2 ρ ∥ x i ∥ 2 ) } i ∈ I is F 0 , 2 ρ ( R d ) \mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) F 0 , 2 ρ ( R d ) -interpolable. Now, in Theorem 3 , setting
{ ( x undefined i , g undefined i , f undefined i ) } i ∈ I ≔ { ( x i , g i + ρ x i , f i + ρ 2 ∥ x i ∥ 2 ) } i ∈ I , and L ≔ 2 ρ , \begin{aligned} \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I} & \coloneqq\{(x_{i},g_{i}+\rho x_{i},f_{i}+\frac{\rho}{2}\|x_{i}\|^{2})\}_{i\in I},\textrm{ and }\\ L & \coloneqq2\rho,\end{aligned} {( x i , g i , f i ) } i ∈ I L : = {( x i , g i + ρ x i , f i + 2 ρ ∥ x i ∥ 2 ) } i ∈ I , and : = 2 ρ , we have the function f undefined ∈ F 0 , 2 ρ ( R d ) \widetilde{f}\in\mathcal{F}_{0,2\rho}(\mathbf{R}^{d}) f ∈ F 0 , 2 ρ ( R d ) interpolating { ( x undefined i , g undefined i , f undefined i ) } i ∈ I \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I defined by:
f undefined ( y ) = min α ∈ Δ [ L 2 ∥ y − ∑ i ∈ I α i ( x undefined i − 1 L g undefined i ) ∥ 2 + ∑ i ∈ I α i ( f undefined i − 1 2 L ∥ g undefined i ∥ 2 ) ] = min α ∈ Δ [ ρ ∥ y − ∑ i ∈ I α i ( x i − 1 2 ρ ( g i + ρ x i ) ) ∥ 2 + ∑ i ∈ I α i ( f i + ρ 2 ∥ x i ∥ 2 − 1 4 ρ ∥ g i + ρ x i ∥ 2 ) ] , \begin{aligned} \widetilde{f}(y) & =\min_{\alpha\in\Delta}\left[\frac{L}{2}\|y-\sum_{i\in I}\alpha_{i}\left(\widetilde{x}_{i}-\frac{1}{L}\widetilde{g}_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\widetilde{f}_{i}-\frac{1}{2L}\|\widetilde{g}_{i}\|^{2}\right)\right]\\ & =\min_{\alpha\in\Delta}\left[\rho\left\Vert y-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{2\rho}(g_{i}+\rho x_{i})\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}-\frac{1}{4\rho}\|g_{i}+\rho x_{i}\|^{2}\right)\right],\end{aligned} f ( y ) = α ∈ Δ min [ 2 L ∥ y − i ∈ I ∑ α i ( x i − L 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) ] = α ∈ Δ min ⎣ ⎡ ρ ∥ ∥ y − i ∈ I ∑ α i ( x i − 2 ρ 1 ( g i + ρ x i ) ) ∥ ∥ 2 + i ∈ I ∑ α i ( f i + 2 ρ ∥ x i ∥ 2 − 4 ρ 1 ∥ g i + ρ x i ∥ 2 ) ⎦ ⎤ ,
where Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } . \Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}. Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } .
Now, due to the proof ( i i ) ⇒ ( i ) (ii)\Rightarrow(i) ( ii ) ⇒ ( i ) of Theorem 2 , the function f ≔ f undefined − ( ρ / 2 ) ∥ ⋅ ∥ 2 ∈ F − ρ , ρ ( R d ) f\coloneqq \widetilde{f}-(\rho/2)\|\cdot\|^{2}\in\mathcal{F}_{-\rho,\rho}(\mathbf{R}^{d}) f : = f − ( ρ /2 ) ∥ ⋅ ∥ 2 ∈ F − ρ , ρ ( R d ) will interpolate { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I . Hence, we have:
f ( x ) = f undefined ( x ) − ( ρ / 2 ) ∥ x ∥ 2 = min α ∈ Δ [ ρ ∥ x − ∑ i ∈ I α i ( x i − 1 2 ρ ( g i + ρ x i ) ) ∥ 2 + ∑ i ∈ I α i ( f i + ρ 2 ∥ x i ∥ 2 − 1 4 ρ ∥ g i + ρ x i ∥ 2 ) ] − ρ 2 ∥ x ∥ 2 = min α ∈ Δ [ − ρ 2 ∥ x ∥ 2 + ρ ∥ x − ∑ i ∈ I α i ( x i − 1 2 ρ ( g i + ρ x i ) ) ∥ 2 + ∑ i ∈ I α i ( f i + ρ 2 ∥ x i ∥ 2 − 1 4 ρ ∥ g i + ρ x i ∥ 2 ) ] = min α ∈ Δ [ ρ 2 ∥ x − ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 − ρ 4 ∥ ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 ρ ∥ g i ∥ 2 + ρ 4 ∥ x i − 1 ρ g i ∥ 2 ) ] , \begin{aligned} f(x) & =\widetilde{f}(x)-(\rho/2)\|x\|^{2}\\ & =\min_{\alpha\in\Delta}\left[\rho\left\Vert x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{2\rho}(g_{i}+\rho x_{i})\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}-\frac{1}{4\rho}\|g_{i}+\rho x_{i}\|^{2}\right)\right]-\frac{\rho}{2}\|x\|^{2}\\ & =\min_{\alpha\in\Delta}\left[-\frac{\rho}{2}\|x\|^{2}+\rho\left\Vert x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{2\rho}(g_{i}+\rho x_{i})\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{\rho}{2}\|x_{i}\|^{2}-\frac{1}{4\rho}\|g_{i}+\rho x_{i}\|^{2}\right)\right]\\ & =\min_{\alpha\in\Delta}\left[\frac{\rho}{2}\left\Vert x-\sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{\rho}g_{i}\right)\right\Vert ^{2}-\frac{\rho}{4}\left\Vert \sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{\rho}g_{i}\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}\right)\right],\end{aligned} f ( x ) = f ( x ) − ( ρ /2 ) ∥ x ∥ 2 = α ∈ Δ min ⎣ ⎡ ρ ∥ ∥ x − i ∈ I ∑ α i ( x i − 2 ρ 1 ( g i + ρ x i ) ) ∥ ∥ 2 + i ∈ I ∑ α i ( f i + 2 ρ ∥ x i ∥ 2 − 4 ρ 1 ∥ g i + ρ x i ∥ 2 ) ⎦ ⎤ − 2 ρ ∥ x ∥ 2 = α ∈ Δ min ⎣ ⎡ − 2 ρ ∥ x ∥ 2 + ρ ∥ ∥ x − i ∈ I ∑ α i ( x i − 2 ρ 1 ( g i + ρ x i ) ) ∥ ∥ 2 + i ∈ I ∑ α i ( f i + 2 ρ ∥ x i ∥ 2 − 4 ρ 1 ∥ g i + ρ x i ∥ 2 ) ⎦ ⎤ = α ∈ Δ min ⎣ ⎡ 2 ρ ∥ ∥ x − i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ ∥ 2 − 4 ρ ∥ ∥ i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ ∥ 2 + i ∈ I ∑ α i ( f i − 2 ρ 1 ∥ g i ∥ 2 + 4 ρ ∥ ∥ x i − ρ 1 g i ∥ ∥ 2 ) ⎦ ⎤ ,
where the last line follows from the following algebraic simplification (click to expand)
Click to expand
Hence, we have the interpolated function f f f satisfying:
f ( x ) = min α ∈ Δ [ ρ 2 ∥ x − ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 − ρ 4 ∥ ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 ρ ∥ g i ∥ 2 + ρ 4 ∥ x i − 1 ρ g i ∥ 2 ) ] ( 4 ) ≥ a ) min α ∈ Δ − ρ 4 ∥ ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 ρ ∥ g i ∥ 2 + ρ 4 ∥ x i − 1 ρ g i ∥ 2 ) ≥ b ) min α ∈ Δ − ρ 4 ∑ i ∈ I α i ∥ x i − 1 ρ g i ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 ρ ∥ g i ∥ 2 + ρ 4 ∥ x i − 1 ρ g i ∥ 2 ) = c ) min α ∈ Δ − ρ 4 ∑ i ∈ I α i ∥ x i − 1 ρ g i ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 ρ ∥ g i ∥ 2 ) + ρ 4 ∑ i ∈ I α i ∥ x i − 1 ρ g i ∥ 2 = min α ∈ Δ ∑ i ∈ I α i ( f i − 1 2 ρ ∥ g i ∥ 2 ) = d ) min i ∈ I { f i − 1 2 ρ ∥ g i ∥ 2 } = f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 \begin{aligned} & f(x)\\ = & \min_{\alpha\in\Delta}\Big[\frac{\rho}{2}\|x-\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}-\frac{\rho}{4}\|\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}\\ & +\sum_{i\in I}\alpha_{i}(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2})\Big]\quad\quad\quad(4)\\ \overset{a)}{\geq} & \min_{\alpha\in\Delta}-\frac{\rho}{4}\left\Vert \sum_{i\in I}\alpha_{i}\left(x_{i}-\frac{1}{\rho}g_{i}\right)\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}\right)\\ \overset{b)}{\geq} & \min_{\alpha\in\Delta}-\frac{\rho}{4}\sum_{i\in I}\alpha_{i}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}\right)\\ \overset{c)}{=} & \min_{\alpha\in\Delta}\cancel{-\frac{\rho}{4}\sum_{i\in I}\alpha_{i}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}}+\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\right)+\cancel{\frac{\rho}{4}\sum_{i\in I}\alpha_{i}\left\Vert x_{i}-\frac{1}{\rho}g_{i}\right\Vert ^{2}}\\ = & \min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}\left(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\right)\\ \overset{d)}{=} & \min_{i\in I}\{f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}\}\\ = & f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2} \end{aligned} = ≥ a ) ≥ b ) = c ) = = d ) = f ( x ) α ∈ Δ min [ 2 ρ ∥ x − i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ 2 − 4 ρ ∥ i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 ρ 1 ∥ g i ∥ 2 + 4 ρ ∥ x i − ρ 1 g i ∥ 2 ) ] ( 4 ) α ∈ Δ min − 4 ρ ∥ ∥ i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ ∥ 2 + i ∈ I ∑ α i ( f i − 2 ρ 1 ∥ g i ∥ 2 + 4 ρ ∥ ∥ x i − ρ 1 g i ∥ ∥ 2 ) α ∈ Δ min − 4 ρ i ∈ I ∑ α i ∥ ∥ x i − ρ 1 g i ∥ ∥ 2 + i ∈ I ∑ α i ( f i − 2 ρ 1 ∥ g i ∥ 2 + 4 ρ ∥ ∥ x i − ρ 1 g i ∥ ∥ 2 ) α ∈ Δ min − 4 ρ i ∈ I ∑ α i ∥ ∥ x i − ρ 1 g i ∥ ∥ 2 + i ∈ I ∑ α i ( f i − 2 ρ 1 ∥ g i ∥ 2 ) + 4 ρ i ∈ I ∑ α i ∥ ∥ x i − ρ 1 g i ∥ ∥ 2 α ∈ Δ min i ∈ I ∑ α i ( f i − 2 ρ 1 ∥ g i ∥ 2 ) i ∈ I min { f i − 2 ρ 1 ∥ g i ∥ 2 } f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 where in a ) a) a ) we construct a smaller term by removing the non-negative term ( ρ / 2 ) ∥ x − ∑ i ∈ I α i ( x i − ( 1 / ρ ) g i ) ∥ 2 (\rho/2)\left\Vert x-\sum_{i\in I}\alpha_{i}\left(x_{i}-(1/\rho)g_{i}\right)\right\Vert ^{2} ( ρ /2 ) ∥ ∥ x − ∑ i ∈ I α i ( x i − ( 1/ ρ ) g i ) ∥ ∥ 2 from the previous line, b ) b) b ) uses Jensen's inequality
∥ ∑ i ∈ I λ i y i ∥ 2 ≤ ∑ i ∈ I λ i ∥ y i ∥ 2 : where λ ≥ 0 , ∑ i ∈ I λ i = 1 , \|\sum_{i\in I}\lambda_{i}y_{i}\|^{2}\leq\sum_{i\in I}\lambda_{i}\|y_{i}\|^{2}:\textrm{where }\lambda\geq0,\sum_{i\in I}\lambda_{i}=1, ∥ i ∈ I ∑ λ i y i ∥ 2 ≤ i ∈ I ∑ λ i ∥ y i ∥ 2 : where λ ≥ 0 , i ∈ I ∑ λ i = 1 , c ) c) c ) just expands and cancels, and d ) d) d ) uses the identity min i ∈ I a i = min α ∈ Δ ∑ i ∈ I α i a i . \min_{i\in I}a_{i}=\min_{\alpha\in\Delta}\sum_{i\in I}\alpha_{i}a_{i}. min i ∈ I a i = min α ∈ Δ ∑ i ∈ I α i a i . So, we have shown that, for all x ∈ R d x\in\mathbf{R}^{d} x ∈ R d we have
f ( x ) ≥ f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 , ( 5 ) f(x)\geq f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2},\quad\quad\quad(5) f ( x ) ≥ f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 , ( 5 ) i.e. , f i ⋆ − ( 1 / 2 ρ ) ∥ g i ⋆ ∥ 2 f_{i^{\star}}-(1/2\rho)\|g_{i^{\star}}\|^{2} f i ⋆ − ( 1/2 ρ ) ∥ g i ⋆ ∥ 2 is a global lower bound for the interpolated function f f f . So, if we can show that some point x ⋆ x^{\star} x ⋆ satisfies
f ( x ⋆ ) ≤ f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 , f(x^{\star})\leq f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}, f ( x ⋆ ) ≤ f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 , then x ⋆ x^{\star} x ⋆ must be the global minimizer of f f f . Now, if we set α ≔ e i ⋆ \alpha\coloneqq e_{i^{\star}} α : = e i ⋆ (i ⋆ i^{\star} i ⋆ -th unit vector with its i ⋆ i^{\star} i ⋆ -th component 1 and rest 0 0 0 ) and x ≔ x i ⋆ − ( 1 / ρ ) g i ⋆ x\coloneqq x_{i^{\star}}-(1/\rho)g_{i^{\star}} x : = x i ⋆ − ( 1/ ρ ) g i ⋆ in ( 4 ) , (4), ( 4 ) , we have an upper bound of f f f , i.e. ,
f ( x i ⋆ − ( 1 / ρ ) g i ⋆ ) = min α ∈ Δ [ ρ 2 ∥ x i ⋆ − ( 1 / ρ ) g i ⋆ − ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 − ρ 4 ∥ ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 ρ ∥ g i ∥ 2 + ρ 4 ∥ x i − 1 ρ g i ∥ 2 ) ] ≤ [ ρ 2 ∥ x i ⋆ − ( 1 / ρ ) g i ⋆ − ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 − ρ 4 ∥ ∑ i ∈ I α i ( x i − 1 ρ g i ) ∥ 2 + ∑ i ∈ I α i ( f i − 1 2 ρ ∥ g i ∥ 2 + ρ 4 ∥ x i − 1 ρ g i ∥ 2 ) ] α = e i ⋆ = ρ 2 ∥ ( x i ⋆ − ( 1 / ρ ) g i ⋆ ) − ( x i ⋆ − 1 ρ g i ⋆ ) ∥ 2 − ρ 4 ∥ ( x i ⋆ − 1 ρ g i ⋆ ) ∥ 2 + ( f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 + ρ 4 ∥ x i − 1 ρ g i ∥ 2 ) = − ρ 4 ∥ ( x i ⋆ − 1 ρ g i ⋆ ) ∥ 2 + ( f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 ) + ρ 4 ∥ x i − 1 ρ g i ∥ 2 = f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 ( 6 ) . \begin{aligned} & f(x_{i^{\star}}-(1/\rho)g_{i^{\star}})\\ = & \min_{\alpha\in\Delta}\Big[\frac{\rho}{2}\|x_{i^{\star}}-(1/\rho)g_{i^{\star}}-\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}-\frac{\rho}{4}\|\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}+\sum_{i\in I}\alpha_{i}(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2})\Big]\\ \leq & \Big[\frac{\rho}{2}\|x_{i^{\star}}-(1/\rho)g_{i^{\star}}-\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}-\frac{\rho}{4}\|\sum_{i\in I}\alpha_{i}(x_{i}-\frac{1}{\rho}g_{i})\|^{2}+\sum_{i\in I}\alpha_{i}(f_{i}-\frac{1}{2\rho}\|g_{i}\|^{2}+\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2})\Big]_{\alpha=e_{i^{\star}}}\\ = & \frac{\rho}{2}\|\cancel{(x_{i^{\star}}-(1/\rho)g_{i^{\star}})}-\cancel{(x_{i^{\star}}-\frac{1}{\rho}g_{i^{\star}})}\|^{2}-\frac{\rho}{4}\|(x_{i^{\star}}-\frac{1}{\rho}g_{i^{\star}})\|^{2}+(f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}+\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2})\\ = & \cancel{-\frac{\rho}{4}\|(x_{i^{\star}}-\frac{1}{\rho}g_{i^{\star}})\|^{2}}+(f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2})+\cancel{\frac{\rho}{4}\|x_{i}-\frac{1}{\rho}g_{i}\|^{2}}\\ = & f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}\quad\quad\quad(6).\end{aligned} = ≤ = = = f ( x i ⋆ − ( 1/ ρ ) g i ⋆ ) α ∈ Δ min [ 2 ρ ∥ x i ⋆ − ( 1/ ρ ) g i ⋆ − i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ 2 − 4 ρ ∥ i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 ρ 1 ∥ g i ∥ 2 + 4 ρ ∥ x i − ρ 1 g i ∥ 2 ) ] [ 2 ρ ∥ x i ⋆ − ( 1/ ρ ) g i ⋆ − i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ 2 − 4 ρ ∥ i ∈ I ∑ α i ( x i − ρ 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 ρ 1 ∥ g i ∥ 2 + 4 ρ ∥ x i − ρ 1 g i ∥ 2 ) ] α = e i ⋆ 2 ρ ∥ ( x i ⋆ − ( 1/ ρ ) g i ⋆ ) − ( x i ⋆ − ρ 1 g i ⋆ ) ∥ 2 − 4 ρ ∥ ( x i ⋆ − ρ 1 g i ⋆ ) ∥ 2 + ( f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 + 4 ρ ∥ x i − ρ 1 g i ∥ 2 ) − 4 ρ ∥ ( x i ⋆ − ρ 1 g i ⋆ ) ∥ 2 + ( f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 ) + 4 ρ ∥ x i − ρ 1 g i ∥ 2 f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 ( 6 ) . So the lower bound on f f f in ( 5 ) (5) ( 5 ) is achieved by x i ⋆ − ( 1 / ρ ) g i ⋆ x_{i^{\star}}-(1/\rho)g_{i^{\star}} x i ⋆ − ( 1/ ρ ) g i ⋆ . Hence, any global minimizer x ⋆ x^\star x ⋆ of f f f (i.e., x ⋆ ∈ argmin x ∈ R d f ( x ) x^\star \in \textrm{argmin}_{x \in \mathbf{R}^d} f(x) x ⋆ ∈ argmin x ∈ R d f ( x ) ) is characterized by
f ( x ⋆ ) = f ( x i ⋆ − 1 L g i ⋆ ) ≤ f i ⋆ − 1 2 ρ ∥ g i ⋆ ∥ 2 , f(x^\star) = f(x_{i^{\star}}-\frac{1}{L}g_{i^{\star}}) \leq f_{i^{\star}}-\frac{1}{2\rho}\|g_{i^{\star}}\|^{2}, f ( x ⋆ ) = f ( x i ⋆ − L 1 g i ⋆ ) ≤ f i ⋆ − 2 ρ 1 ∥ g i ⋆ ∥ 2 , and this completes the proof ■
[1]
Drori, Y., & Shamir, O. (2020, November). The complexity of finding stationary points with stochastic gradient descent. In International Conference on Machine Learning (pp. 2658-2667). PMLR. Link
[2]
Adrien B Taylor, Julien M Hendrickx, and Fran¸cois Glineur. Exact worst-case performance of first-order methods for composite convex optimization. SIAM Journal on Optimization, 27(3):1283–1313, 2017. Link
[3]
Taylor, Adrien B. Convex interpolation and performance estimation of first-order methods for convex optimization. Diss. Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2017. Link
[4]
Drori, Yoel. The exact information-based complexity of smooth convex minimization. Journal of Complexity 39 (2017): 1-16. Link