Table of contents
All norms are 2-norm in this blog. A function f : R d → R f:\mathbf{R}^{d}\to\mathbf{R} f : R d → R is L L L -smooth convex if and only
( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 1 2 L ∥ ∇ f ( x ) − ∇ f ( y ) ∥ 2 . \left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{1}{2L}\|\nabla f(x)-\nabla f(y)\|^{2}. ( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 L 1 ∥∇ f ( x ) − ∇ f ( y ) ∥ 2 . On the other hand, a function is μ \mu μ -strongly convex if and only if
( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + μ 2 ∥ x − y ∥ 2 ( SCVX ) \left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\geq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{\mu}{2}\|x-y\|^{2}\qquad (\text{SCVX}) ( ∀ x , y ∈ R d ) f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 μ ∥ x − y ∥ 2 ( SCVX ) where f ′ ( ⋅ ) f^{\prime}(\cdot) f ′ ( ⋅ ) denotes a subgradient of f f f at ( ⋅ ) (\cdot) ( ⋅ ) .
On R d , \mathbf{R}^{d}, R d , the set of all L L L -smooth convex functions is denoted by F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) , the set of all μ \mu μ -strongly convex functions is denoted by F μ , ∞ ( R d ) \mathcal{F}_{\mu,\infty}(\mathbf{R}^{d}) F μ , ∞ ( R d ) , and the set of all μ \mu μ -strongly convex and L L L -smooth functions is denoted by F μ , L ( R d ) \mathcal{F}_{\mu,L}(\mathbf{R}^{d}) F μ , L ( R d ) . Finally the set of all lower-semicontinuous, proper, and convex functions is denoted by F 0 , ∞ ( R d ) \mathcal{F}_{0,\infty}(\mathbf{R}^{d}) F 0 , ∞ ( R d ) .
An alternative characterization of f ∈ F 0 , L ( R d ) f\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d ) , that we will use multiple times, is as follows.
For a function f : R d → R f:\mathbf{R}^{d}\to\mathbf{R} f : R d → R the following statements are equivalent:
(a) f ∈ F 0 , L ( R d ) f\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d ) .
(b)f f f is convex and it satisfies ( ∀ x , y ∈ R d ) f ( y ) ≤ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + L 2 ∥ x − y ∥ 2 . \left(\forall x,y\in\mathbf{R}^{d}\right)\quad f(y)\leq f(x)+\langle\nabla f(x)\mid y-x\rangle+\frac{L}{2}\|x-y\|^{2}. ( ∀ x , y ∈ R d ) f ( y ) ≤ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 L ∥ x − y ∥ 2 .
(c) f f f is convex and satisfies ( ∀ x , y ∈ R d ) ∥ ∇ f ( x ) − ∇ f ( y ) ∥ ≤ L ∥ x − y ∥ \left(\forall x,y\in\mathbf{R}^{d}\right)\quad\|\nabla f(x)-\nabla f(y)\|\leq L\|x-y\| ( ∀ x , y ∈ R d ) ∥∇ f ( x ) − ∇ f ( y ) ∥ ≤ L ∥ x − y ∥ .
(d) L 2 ∥ ⋅ ∥ 2 − f ∈ F 0 , ∞ ( R d ) \frac{L}{2}\|\cdot\|^{2}-f\in\mathcal{F}_{0,\infty}(\mathbf{R}^{d}) 2 L ∥ ⋅ ∥ 2 − f ∈ F 0 , ∞ ( R d ) .
In ( b ) (b) ( b ) and ( c ) (c) ( c ) of the theorem above, saying that f f f is convex is necessary, as the inequalities only in ( b ) , ( c ) (b),(c) ( b ) , ( c ) are not sufficient to establish L L L -smooth convexity.
We now record the notion of an interpolable function.
Suppose we are given the set of triplets
{ ( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R \{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R where
I I I is a finite index set. Let
F ( R d ) \mathcal{F}(\mathbf{R}^d) F ( R d ) be some class of functions. Then the set
{ ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i} ) \}_{i\in I} {( x i , g i , f i ) } i ∈ I is
F ( R d ) \mathcal{F}(\mathbf{R}^{d}) F ( R d ) -interpolable if and only if there exists a function
f ∈ F ( R d ) f\in\mathcal{F}(\mathbf{R}^{d}) f ∈ F ( R d ) such that for all
i ∈ I i\in I i ∈ I we have
f i = f ( x i ) f_{i}=f(x_{i}) f i = f ( x i ) and
g i ∈ ∂ f ( x i ) g_{i} \in \partial f(x_{i}) g i ∈ ∂ f ( x i ) .
To establish our main result, we will use the following equivalence result.
We have f ∈ F μ , L ( R d ) ⇔ L 2 ∥ ⋅ ∥ 2 − f ∈ F 0 , L − μ ( R d ) f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d})\Leftrightarrow\frac{L}{2}\|\cdot\|^{2}-f\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) f ∈ F μ , L ( R d ) ⇔ 2 L ∥ ⋅ ∥ 2 − f ∈ F 0 , L − μ ( R d ) .
Proof.
Click to show proof
First, we prove f ∈ F μ , L ( R d ) ⇒ L 2 ∥ ⋅ ∥ 2 − f ∈ F 0 , L − μ ( R d ) f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d})\Rightarrow\frac{L}{2}\|\cdot\|^{2}-f\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) f ∈ F μ , L ( R d ) ⇒ 2 L ∥ ⋅ ∥ 2 − f ∈ F 0 , L − μ ( R d ) . Define h ≔ L 2 ∥ ⋅ ∥ 2 − f h\coloneqq\frac{L}{2}\|\cdot\|^{2}-f h : = 2 L ∥ ⋅ ∥ 2 − f , so f = L 2 ∥ ⋅ ∥ 2 − h f=\frac{L}{2}\|\cdot\|^{2}-h f = 2 L ∥ ⋅ ∥ 2 − h and ∇ f ( x ) = L x − ∇ h ( x ) \nabla f(x)=Lx-\nabla h(x) ∇ f ( x ) = Lx − ∇ h ( x ) . Clearly, f ∈ F 0 , L ( R d ) f\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d ) also, so due to Theorem 1 ( a ) , ( d ) , (a),(d), ( a ) , ( d ) , we have h ∈ F 0 , ∞ ( R d ) h\in\mathcal{F}_{0,\infty}(\mathbf{R}^{d}) h ∈ F 0 , ∞ ( R d ) . Because f f f is μ \mu μ -strongly convex and differentiable, for any x , y ∈ R d x,y\in\mathbf{R}^{d} x , y ∈ R d we have from ( SCVX ) (\text{SCVX}) ( SCVX ) :
f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + μ 2 ∥ x − y ∥ 2 ⇔ L 2 ∥ y ∥ 2 − h ( y ) ≥ L 2 ∥ x ∥ 2 − h ( x ) + ⟨ L x − ∇ h ( x ) ∣ y − x ⟩ + μ 2 ∥ x − y ∥ 2 ⇔ − L 2 ∥ y ∥ 2 + h ( y ) ≤ − L 2 ∥ x ∥ 2 + h ( x ) − ⟨ L x − ∇ h ( x ) ∣ y − x ⟩ − μ 2 ∥ x − y ∥ 2 ⇔ h ( y ) ≤ L 2 ∥ y ∥ 2 − L 2 ∥ x ∥ 2 + h ( x ) − L ⟨ x ∣ y − x ⟩ + ⟨ ∇ h ( x ) ∣ y − x ⟩ − μ 2 ∥ x − y ∥ 2 = L 2 ∥ y ∥ 2 − L 2 ∥ x ∥ 2 + h ( x ) − L ⟨ x ∣ y ⟩ + L ∥ x ∥ 2 + ⟨ ∇ h ( x ) ∣ y − x ⟩ − μ 2 ∥ x − y ∥ 2 = L 2 ( ∥ y ∥ 2 + ∥ x ∥ 2 − 2 ⟨ x ∣ y ⟩ ) + h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ − μ 2 ∥ x − y ∥ 2 = h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ + L 2 ∥ x − y ∥ 2 − μ 2 ∥ x − y ∥ 2 = h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ + L − μ 2 ∥ x − y ∥ 2 . \begin{aligned} & f(y)\geq f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & \frac{L}{2}\|y\|^{2}-h(y)\geq\frac{L}{2}\|x\|^{2}-h(x)+\left\langle Lx-\nabla h(x)\mid y-x\right\rangle +\frac{\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & -\frac{L}{2}\|y\|^{2}+h(y)\leq-\frac{L}{2}\|x\|^{2}+h(x)-\left\langle Lx-\nabla h(x)\mid y-x\right\rangle -\frac{\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & h(y)\leq\frac{L}{2}\|y\|^{2}-\frac{L}{2}\|x\|^{2}+h(x)-L\left\langle x\mid y-x\right\rangle +\left\langle \nabla h(x)\mid y-x\right\rangle -\frac{\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=\frac{L}{2}\|y\|^{2}-\frac{L}{2}\|x\|^{2}+h(x)-L\left\langle x\mid y\right\rangle +L\|x\|^{2}+\left\langle \nabla h(x)\mid y-x\right\rangle -\frac{\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=\frac{L}{2}\left(\|y\|^{2}+\|x\|^{2}-2\left\langle x\mid y\right\rangle \right)+h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle -\frac{\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle +\frac{L}{2}\|x-y\|^{2}-\frac{\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle +\frac{L-\mu}{2}\|x-y\|^{2}.\end{aligned} ⇔ ⇔ ⇔ f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 μ ∥ x − y ∥ 2 2 L ∥ y ∥ 2 − h ( y ) ≥ 2 L ∥ x ∥ 2 − h ( x ) + ⟨ Lx − ∇ h ( x ) ∣ y − x ⟩ + 2 μ ∥ x − y ∥ 2 − 2 L ∥ y ∥ 2 + h ( y ) ≤ − 2 L ∥ x ∥ 2 + h ( x ) − ⟨ Lx − ∇ h ( x ) ∣ y − x ⟩ − 2 μ ∥ x − y ∥ 2 h ( y ) ≤ 2 L ∥ y ∥ 2 − 2 L ∥ x ∥ 2 + h ( x ) − L ⟨ x ∣ y − x ⟩ + ⟨ ∇ h ( x ) ∣ y − x ⟩ − 2 μ ∥ x − y ∥ 2 = 2 L ∥ y ∥ 2 − 2 L ∥ x ∥ 2 + h ( x ) − L ⟨ x ∣ y ⟩ + L ∥ x ∥ 2 + ⟨ ∇ h ( x ) ∣ y − x ⟩ − 2 μ ∥ x − y ∥ 2 = 2 L ( ∥ y ∥ 2 + ∥ x ∥ 2 − 2 ⟨ x ∣ y ⟩ ) + h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ − 2 μ ∥ x − y ∥ 2 = h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ + 2 L ∥ x − y ∥ 2 − 2 μ ∥ x − y ∥ 2 = h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ + 2 L − μ ∥ x − y ∥ 2 . So we have proven that h h h is convex and for any x , y ∈ R d , x,y\in\mathbf{R}^{d}, x , y ∈ R d ,
h ( y ) ≤ h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ + L − μ 2 ∥ x − y ∥ 2 h(y)\leq h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle +\frac{L-\mu}{2}\|x-y\|^{2} h ( y ) ≤ h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ + 2 L − μ ∥ x − y ∥ 2 which this is equivalent to saying that h ∈ F 0 , L − μ ( R d ) h\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) h ∈ F 0 , L − μ ( R d ) due to Theorem 1 ( a ) , ( b ) (a),(b) ( a ) , ( b ) .
Next, we prove h ≔ L 2 ∥ ⋅ ∥ 2 − f ∈ F 0 , L − μ ( R d ) ⇒ f ∈ F μ , L ( R d ) h\coloneqq\frac{L}{2}\|\cdot\|^{2}-f\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d})\Rightarrow f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d}) h : = 2 L ∥ ⋅ ∥ 2 − f ∈ F 0 , L − μ ( R d ) ⇒ f ∈ F μ , L ( R d ) . Due to Theorem 1 ( a ) , ( b ) (a),(b) ( a ) , ( b ) , we have: h h h convex and for any x , y ∈ R d , x,y\in\mathbf{R}^{d}, x , y ∈ R d ,
h ( y ) ≤ h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ + L − μ 2 ∥ x − y ∥ 2 ⇔ L 2 ∥ y ∥ 2 − f ( y ) ≤ L 2 ∥ x ∥ 2 − f ( x ) + ⟨ L x − ∇ f ( x ) ∣ y − x ⟩ + L − μ 2 ∥ x − y ∥ 2 ⇔ − L 2 ∥ y ∥ 2 + f ( y ) ≥ − L 2 ∥ x ∥ 2 + f ( x ) − ⟨ L x − ∇ f ( x ) ∣ y − x ⟩ − L − μ 2 ∥ x − y ∥ 2 ⇔ f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + L 2 ∥ y ∥ 2 − L 2 ∥ x ∥ 2 − L ⟨ x ∣ y ⟩ + L ∥ x ∥ 2 − L − μ 2 ∥ x − y ∥ 2 = f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + L 2 ( ∥ y ∥ 2 + ∥ x ∥ 2 − 2 ⟨ x ∣ y ⟩ ) undefined = ∥ x − y ∥ 2 − L − μ 2 ∥ x − y ∥ 2 = f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + μ 2 ∥ x − y ∥ 2 , \begin{aligned} & h(y)\leq h(x)+\left\langle \nabla h(x)\mid y-x\right\rangle +\frac{L-\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & \frac{L}{2}\|y\|^{2}-f(y)\leq\frac{L}{2}\|x\|^{2}-f(x)+\left\langle Lx-\nabla f(x)\mid y-x\right\rangle +\frac{L-\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & -\frac{L}{2}\|y\|^{2}+f(y)\geq-\frac{L}{2}\|x\|^{2}+f(x)-\left\langle Lx-\nabla f(x)\mid y-x\right\rangle -\frac{L-\mu}{2}\|x-y\|^{2}\\ \Leftrightarrow & f(y)\geq f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{L}{2}\|y\|^{2}-\frac{L}{2}\|x\|^{2}-L\left\langle x\mid y\right\rangle +L\|x\|^{2}-\frac{L-\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{L}{2}\underbrace{\left(\|y\|^{2}+\|x\|^{2}-2\left\langle x\mid y\right\rangle \right)}_{=\|x-y\|^{2}}-\frac{L-\mu}{2}\|x-y\|^{2}\\ & \quad\quad\quad=f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{\mu}{2}\|x-y\|^{2},\end{aligned} ⇔ ⇔ ⇔ h ( y ) ≤ h ( x ) + ⟨ ∇ h ( x ) ∣ y − x ⟩ + 2 L − μ ∥ x − y ∥ 2 2 L ∥ y ∥ 2 − f ( y ) ≤ 2 L ∥ x ∥ 2 − f ( x ) + ⟨ Lx − ∇ f ( x ) ∣ y − x ⟩ + 2 L − μ ∥ x − y ∥ 2 − 2 L ∥ y ∥ 2 + f ( y ) ≥ − 2 L ∥ x ∥ 2 + f ( x ) − ⟨ Lx − ∇ f ( x ) ∣ y − x ⟩ − 2 L − μ ∥ x − y ∥ 2 f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 L ∥ y ∥ 2 − 2 L ∥ x ∥ 2 − L ⟨ x ∣ y ⟩ + L ∥ x ∥ 2 − 2 L − μ ∥ x − y ∥ 2 = f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 L = ∥ x − y ∥ 2 ( ∥ y ∥ 2 + ∥ x ∥ 2 − 2 ⟨ x ∣ y ⟩ ) − 2 L − μ ∥ x − y ∥ 2 = f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 μ ∥ x − y ∥ 2 , i.e. , we have shown that for all x , y ∈ R d x,y\in\mathbf{R}^{d} x , y ∈ R d
f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + μ 2 ∥ x − y ∥ 2 , f(y)\geq f(x)+\left\langle \nabla f(x)\mid y-x\right\rangle +\frac{\mu}{2}\|x-y\|^{2}, f ( y ) ≥ f ( x ) + ⟨ ∇ f ( x ) ∣ y − x ⟩ + 2 μ ∥ x − y ∥ 2 , hence by defintion f f f is μ \mu μ -strongly convex. Finally, we have
∥ ∇ f ( x ) − ∇ f ( y ) ∥ = ∥ L x − ∇ h ( x ) − L y + ∇ h ( y ) ∥ = ∥ L ( x − y ) + ( ∇ h ( y ) − ∇ h ( x ) ) ∥ ≤ a ) L ∥ x − y ∥ + ∥ ∇ h ( y ) − ∇ h ( x ) ∥ ≤ b ) L ∥ x − y ∥ + ( L − μ ) ∥ x − y ∥ = L ∥ x − y ∥ , \begin{aligned} \|\nabla f(x)-\nabla f(y)\| & =\|Lx-\nabla h(x)-Ly+\nabla h(y)\|\\ & =\|L(x-y)+(\nabla h(y)-\nabla h(x))\|\\ & \overset{a)}{\leq}L\|x-y\|+\|\nabla h(y)-\nabla h(x)\|\\ & \overset{b)}{\leq}L\|x-y\|+(L-\mu)\|x-y\|\\ & =L\|x-y\|,\end{aligned} ∥∇ f ( x ) − ∇ f ( y ) ∥ = ∥ Lx − ∇ h ( x ) − L y + ∇ h ( y ) ∥ = ∥ L ( x − y ) + ( ∇ h ( y ) − ∇ h ( x )) ∥ ≤ a ) L ∥ x − y ∥ + ∥∇ h ( y ) − ∇ h ( x ) ∥ ≤ b ) L ∥ x − y ∥ + ( L − μ ) ∥ x − y ∥ = L ∥ x − y ∥ , where a ) a) a ) follows from triangle inequallity and b ) b) b ) follows from Theorem 1 ( a ) , ( c ) (a),(c) ( a ) , ( c ) . So, f f f is L L L -smooth besides being μ \mu μ -strongly convex, i.e. , f ∈ F μ , L ( R d ) f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d}) f ∈ F μ , L ( R d ) . This completes the proof.
■
Also, we are going to use the following interpolation result.
Suppose we are given the set of triplets set of triplets { ( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R \{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R where I I I is a finite index set. Then the following are equivalent.
(a) { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is F μ , L ( R d ) \mathcal{F}_{\mu,L}(\mathbf{R}^{d}) F μ , L ( R d ) -interpolable.
(b) { ( x undefined i , g undefined i , f undefined i ) } i ∈ I ≔ { ( x i , L x i − g i , L 2 ∥ x i ∥ 2 − f i ) } i ∈ I \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\coloneqq\{(x_{i},Lx_{i}-g_{i},\frac{L}{2}\|x_{i}\|^{2}-f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I : = {( x i , L x i − g i , 2 L ∥ x i ∥ 2 − f i ) } i ∈ I is F 0 , L − μ ( R d ) \mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) F 0 , L − μ ( R d ) -interpolable.
Proof.
Click to show proof
First, we prove ( a ) ⇒ ( b ) . (a)\Rightarrow(b). ( a ) ⇒ ( b ) . If ( a ) (a) ( a ) holds, then by definition there exists a function f ∈ F μ , L ( R d ) f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d}) f ∈ F μ , L ( R d ) that satisfies for all i ∈ I i\in I i ∈ I : f ( x i ) = f i f(x_{i})=f_{i} f ( x i ) = f i and ∇ f ( x i ) = g i \nabla f(x_{i})=g_{i} ∇ f ( x i ) = g i . If we define, h = ( L / 2 ) ∥ ⋅ ∥ 2 − f , h=(L/2)\|\cdot\|^{2}-f, h = ( L /2 ) ∥ ⋅ ∥ 2 − f , then h ∈ F 0 , L − μ ( R d ) h\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) h ∈ F 0 , L − μ ( R d ) due to Lemma 1 , and for all i ∈ I i\in I i ∈ I we have h ( x i ) = L 2 ∥ x i ∥ 2 − f ( x i ) = L 2 ∥ x i ∥ 2 − f i h(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-f(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-f_{i} h ( x i ) = 2 L ∥ x i ∥ 2 − f ( x i ) = 2 L ∥ x i ∥ 2 − f i and ∇ h ( x i ) = L x i − ∇ f ( x i ) = L x i − g i \nabla h(x_{i})=Lx_{i}-\nabla f(x_{i})=Lx_{i}-g_{i} ∇ h ( x i ) = L x i − ∇ f ( x i ) = L x i − g i . This proves ( b ) (b) ( b ) .
Next, we prove ( b ) ⇒ ( a ) (b)\Rightarrow(a) ( b ) ⇒ ( a ) . If ( b ) (b) ( b ) holds, there exists a function h ∈ F 0 , L − μ ( R d ) h\in\mathcal{F}_{0,L-\mu }(\mathbf{R}^{d}) h ∈ F 0 , L − μ ( R d ) that satisfies for all i ∈ I i\in I i ∈ I : h ( x i ) = L 2 ∥ x i ∥ 2 − f i h(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-f_{i} h ( x i ) = 2 L ∥ x i ∥ 2 − f i and ∇ h ( x i ) = L x i − g i \nabla h(x_{i})=Lx_{i}-g_{i} ∇ h ( x i ) = L x i − g i . If we define, f = ( L / 2 ) ∥ ⋅ ∥ 2 − h f=(L/2)\|\cdot\|^{2}-h f = ( L /2 ) ∥ ⋅ ∥ 2 − h , then f ∈ F μ , L ( R d ) f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d}) f ∈ F μ , L ( R d ) due to Lemma 1 , and for all i ∈ I i\in I i ∈ I we have f ( x i ) = L 2 ∥ x i ∥ 2 − h ( x i ) = L 2 ∥ x i ∥ 2 − ( L 2 ∥ x i ∥ 2 − f i ) = f i f(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-h(x_{i})=\frac{L}{2}\|x_{i}\|^{2}-(\frac{L}{2}\|x_{i}\|^{2}-f_{i})=f_{i} f ( x i ) = 2 L ∥ x i ∥ 2 − h ( x i ) = 2 L ∥ x i ∥ 2 − ( 2 L ∥ x i ∥ 2 − f i ) = f i and ∇ f ( x i ) = L x i − ∇ h ( x i ) = L x i − ( L x i − g i ) = g i . \nabla f(x_{i})=Lx_{i}-\nabla h(x_{i})=Lx_{i}-(Lx_{i}-g_{i})=g_{i}. ∇ f ( x i ) = L x i − ∇ h ( x i ) = L x i − ( L x i − g i ) = g i . This proves (a).
■
Finally, we present the following interpolation result due to Yoel Drori from[3]
. I showed a detailed proof of this result in the previous blog post: here .
If
{ ( x undefined i , g undefined i , f undefined i ) } i ∈ I ⊆ R d × R d × R \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R is
F 0 , L ( R d ) \mathcal{F}_{0,L}(\mathbf{R}^{d}) F 0 , L ( R d ) -interpolable with
L > 0 L>0 L > 0 , then one interpolated function
f undefined ∈ F 0 , L ( R d ) \widetilde{f}\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d ) that interpolates
{ ( x undefined i , g undefined i , f undefined i ) } i ∈ I \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is:
f undefined ( y ) = min α ∈ Δ [ L 2 ∥ y − ∑ i ∈ I α i ( x undefined i − 1 L g undefined i ) ∥ 2 + ∑ i ∈ I α i ( f undefined i − 1 2 L ∥ g undefined i ∥ 2 ) ] , \widetilde{f}(y)=\min_{\alpha\in\Delta}\left[\frac{L}{2}\|y-\sum_{i\in I}\alpha_{i}\left(\widetilde{x}_{i}-\frac{1}{L}\widetilde{g}_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\widetilde{f}_{i}-\frac{1}{2L}\|\widetilde{g}_{i}\|^{2}\right)\right], f ( y ) = α ∈ Δ min [ 2 L ∥ y − i ∈ I ∑ α i ( x i − L 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 L 1 ∥ g i ∥ 2 ) ] , where Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } . \Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}. Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } .
Now we are in a position to state our main result. The results appeared first in the paper by Yoel Drori and Adrien Taylor from[4, Theorem 1]
. The proof we present here is based on interpolation argument.
If { ( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R \{(x_{i},g_{i},f_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R is F μ , L ( R d ) \mathcal{F}_{\mu,L}(\mathbf{R}^{d}) F μ , L ( R d ) -interpolable with L > 0 L>0 L > 0 , then one interpolation function f ∈ F μ , L ( R d ) f\in\mathcal{F}_{\mu,L}(\mathbf{R}^{d}) f ∈ F μ , L ( R d ) that interpolates { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is:
f ( y ) : = max α ∈ Δ [ L 2 ∥ y ∥ 2 − L − μ 2 ∥ y − 1 L − μ ∑ i α i ( g i − μ x i ) ∥ 2 + ∑ i ∈ I α i ( f i + 1 2 ( L − μ ) ∥ g i − L x i ∥ 2 − L 2 ∥ x i ∥ 2 ) ] , \begin{align*} f(y) & :=\max_{\alpha\in\Delta}\Big[\frac{L}{2}\|y\|^{2}-\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}\\ & \quad +\sum_{i\in I}\alpha_{i}(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2})\Big], \end{align*} f ( y ) := α ∈ Δ max [ 2 L ∥ y ∥ 2 − 2 L − μ ∥ y − L − μ 1 i ∑ α i ( g i − μ x i ) ∥ 2 + i ∈ I ∑ α i ( f i + 2 ( L − μ ) 1 ∥ g i − L x i ∥ 2 − 2 L ∥ x i ∥ 2 ) ] , where Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } . \Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}. Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } .
Proof. The proof sketch is as follows comprising two steps.
First, we find an interpolated function f undefined ∈ F 0 , L − μ ( R d ) \widetilde{f}\in\mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) f ∈ F 0 , L − μ ( R d ) that interpolates { ( x undefined i , g undefined i , f undefined i ) } i ∈ I ≔ { ( x i , L x i − g i , L 2 ∥ x i ∥ 2 − f i ) } i ∈ I \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\coloneqq\{(x_{i},Lx_{i}-g_{i},\frac{L}{2}\|x_{i}\|^{2}-f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I : = {( x i , L x i − g i , 2 L ∥ x i ∥ 2 − f i ) } i ∈ I using Theorem 3 .
Then due to the proof of Theorem 2 , the function f = ( L / 2 ) ∥ ⋅ ∥ 2 − f undefined f=(L/2)\|\cdot\|^{2}-\widetilde{f} f = ( L /2 ) ∥ ⋅ ∥ 2 − f will be in F μ , L ( R d ) \mathcal{F}_{\mu,L}(\mathbf{R}^{d}) F μ , L ( R d ) and will interpolate { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I .
(1) From Theorem 3 recall that, if { ( x undefined i , g undefined i , f undefined i ) } i ∈ I ⊆ R d × R d × R \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\subseteq\mathbf{R}^{d}\times\mathbf{R}^{d}\times\mathbf{R} {( x i , g i , f i ) } i ∈ I ⊆ R d × R d × R is F 0 , L − μ ( R d ) \mathcal{F}_{0,L-\mu}(\mathbf{R}^{d}) F 0 , L − μ ( R d ) -interpolable, then one interpolation function f undefined ∈ F 0 , L ( R d ) \widetilde{f}\in\mathcal{F}_{0,L}(\mathbf{R}^{d}) f ∈ F 0 , L ( R d ) that interpolates { ( x undefined i , g undefined i , f undefined i ) } i ∈ I \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I is:
f undefined ( y ) = min α ∈ Δ [ L − μ 2 ∥ y − ∑ i ∈ I α i ( x undefined i − 1 L − μ g undefined i ) ∥ 2 + ∑ i ∈ I α i ( f undefined i − 1 2 ( L − μ ) ∥ g undefined i ∥ 2 ) ] , \widetilde{f}(y)=\min_{\alpha\in\Delta}\left[\frac{L-\mu}{2}\|y-\sum_{i\in I}\alpha_{i}\left(\widetilde{x}_{i}-\frac{1}{L-\mu}\widetilde{g}_{i}\right)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\widetilde{f}_{i}-\frac{1}{2(L-\mu)}\|\widetilde{g}_{i}\|^{2}\right)\right], f ( y ) = α ∈ Δ min [ 2 L − μ ∥ y − i ∈ I ∑ α i ( x i − L − μ 1 g i ) ∥ 2 + i ∈ I ∑ α i ( f i − 2 ( L − μ ) 1 ∥ g i ∥ 2 ) ] , where Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } . \Delta=\{\beta\in\mathbf{R}^{\vert I\vert}\mid\beta\geq0,\sum_{i\in I}\beta_{i}=1\}. Δ = { β ∈ R ∣ I ∣ ∣ β ≥ 0 , ∑ i ∈ I β i = 1 } . In our setup, we have { ( x undefined i , g undefined i , f undefined i ) } i ∈ I ≔ { ( x i , L x i − g i , L 2 ∥ x i ∥ 2 − f i ) } i ∈ I \{(\widetilde{x}_{i},\widetilde{g}_{i},\widetilde{f}_{i})\}_{i\in I}\coloneqq\{(x_{i},Lx_{i}-g_{i},\frac{L}{2}\|x_{i}\|^{2}-f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I : = {( x i , L x i − g i , 2 L ∥ x i ∥ 2 − f i ) } i ∈ I , so lets put that in the last equation:
f undefined ( y ) = min α ∈ Δ [ L − μ 2 ∥ y − ∑ i ∈ I α i ( x i − 1 L − μ ( L x i − g i ) undefined = x i ( 1 − L L − μ ) + 1 L − μ g i = − μ L − μ x i + 1 L − μ g i = 1 L − μ ( g i − μ x i ) ) ∥ 2 + ∑ i ∈ I α i ( L 2 ∥ x i ∥ 2 − f i − 1 2 ( L − μ ) ∥ L x i − g i ∥ 2 ) ] = min α ∈ Δ [ L − μ 2 ∥ y − 1 L − μ ∑ i ∈ I α i ( g i − μ x i ) ∥ 2 − ∑ i ∈ I α i ( f i + 1 2 ( L − μ ) ∥ g i − L x i ∥ 2 − L 2 ∥ x i ∥ 2 ) ] . \begin{aligned} \widetilde{f}(y) & =\min_{\alpha\in\Delta}\left[\frac{L-\mu}{2}\|y-\sum_{i\in I}\alpha_{i}\Big(\underbrace{x_{i}-\frac{1}{L-\mu}(Lx_{i}-g_{i})}_{=x_{i}(1-\frac{L}{L-\mu})+\frac{1}{L-\mu}g_{i}=-\frac{\mu}{L-\mu}x_{i}+\frac{1}{L-\mu}g_{i}=\frac{1}{L-\mu}(g_{i}-\mu x_{i})}\Big)\|^{2}+\sum_{i\in I}\alpha_{i}\left(\frac{L}{2}\|x_{i}\|^{2}-f_{i}-\frac{1}{2(L-\mu)}\|Lx_{i}-g_{i}\|^{2}\right)\right]\\ & =\min_{\alpha\in\Delta}\left[\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i\in I}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}-\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2}\right)\right].\end{aligned} f ( y ) = α ∈ Δ min ⎣ ⎡ 2 L − μ ∥ y − i ∈ I ∑ α i ( = x i ( 1 − L − μ L ) + L − μ 1 g i = − L − μ μ x i + L − μ 1 g i = L − μ 1 ( g i − μ x i ) x i − L − μ 1 ( L x i − g i ) ) ∥ 2 + i ∈ I ∑ α i ( 2 L ∥ x i ∥ 2 − f i − 2 ( L − μ ) 1 ∥ L x i − g i ∥ 2 ) ⎦ ⎤ = α ∈ Δ min [ 2 L − μ ∥ y − L − μ 1 i ∈ I ∑ α i ( g i − μ x i ) ∥ 2 − i ∈ I ∑ α i ( f i + 2 ( L − μ ) 1 ∥ g i − L x i ∥ 2 − 2 L ∥ x i ∥ 2 ) ] . (2) Hence, f = ( L / 2 ) ∥ ⋅ ∥ 2 − f undefined f=(L/2)\|\cdot\|^{2}-\widetilde{f} f = ( L /2 ) ∥ ⋅ ∥ 2 − f , which will be in F μ , L ( R d ) \mathcal{F}_{\mu,L}(\mathbf{R}^{d}) F μ , L ( R d ) and will interpolate { ( x i , g i , f i ) } i ∈ I \{(x_{i},g_{i},f_{i})\}_{i\in I} {( x i , g i , f i ) } i ∈ I has the following form:
f ( y ) = ( L / 2 ) ∥ y ∥ 2 − f undefined ( y ) = ( L / 2 ) ∥ y ∥ 2 − min α ∈ Δ [ L − μ 2 ∥ y − 1 L − μ ∑ i ∈ I α i ( g i − μ x i ) ∥ 2 − ∑ i ∈ I α i ( f i + 1 2 ( L − μ ) ∥ g i − L x i ∥ 2 − L 2 ∥ x i ∥ 2 ) ] = a ) ( L / 2 ) ∥ y ∥ 2 + max α ∈ Δ [ − L − μ 2 ∥ y − 1 L − μ ∑ i ∈ I α i ( g i − μ x i ) ∥ 2 + ∑ i ∈ I α i ( f i + 1 2 ( L − μ ) ∥ g i − L x i ∥ 2 − L 2 ∥ x i ∥ 2 ) ] = max α ∈ Δ [ ( L / 2 ) ∥ y ∥ 2 − L − μ 2 ∥ y − 1 L − μ ∑ i ∈ I α i ( g i − μ x i ) ∥ 2 + ∑ i ∈ I α i ( f i + 1 2 ( L − μ ) ∥ g i − L x i ∥ 2 − L 2 ∥ x i ∥ 2 ) ] , \begin{aligned} f(y) & =(L/2)\|y\|^{2}-\widetilde{f}(y)\\ & =(L/2)\|y\|^{2}-\min_{\alpha\in\Delta}\left[\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i\in I}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}-\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2}\right)\right]\\ & \overset{a)}{=}(L/2)\|y\|^{2}+\max_{\alpha\in\Delta}\left[-\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i\in I}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2}\right)\right]\\ & =\max_{\alpha\in\Delta}\left[(L/2)\|y\|^{2}-\frac{L-\mu}{2}\|y-\frac{1}{L-\mu}\sum_{i\in I}\alpha_{i}(g_{i}-\mu x_{i})\|^{2}+\sum_{i\in I}\alpha_{i}\left(f_{i}+\frac{1}{2(L-\mu)}\|g_{i}-Lx_{i}\|^{2}-\frac{L}{2}\|x_{i}\|^{2}\right)\right],\end{aligned} f ( y ) = ( L /2 ) ∥ y ∥ 2 − f ( y ) = ( L /2 ) ∥ y ∥ 2 − α ∈ Δ min [ 2 L − μ ∥ y − L − μ 1 i ∈ I ∑ α i ( g i − μ x i ) ∥ 2 − i ∈ I ∑ α i ( f i + 2 ( L − μ ) 1 ∥ g i − L x i ∥ 2 − 2 L ∥ x i ∥ 2 ) ] = a ) ( L /2 ) ∥ y ∥ 2 + α ∈ Δ max [ − 2 L − μ ∥ y − L − μ 1 i ∈ I ∑ α i ( g i − μ x i ) ∥ 2 + i ∈ I ∑ α i ( f i + 2 ( L − μ ) 1 ∥ g i − L x i ∥ 2 − 2 L ∥ x i ∥ 2 ) ] = α ∈ Δ max [ ( L /2 ) ∥ y ∥ 2 − 2 L − μ ∥ y − L − μ 1 i ∈ I ∑ α i ( g i − μ x i ) ∥ 2 + i ∈ I ∑ α i ( f i + 2 ( L − μ ) 1 ∥ g i − L x i ∥ 2 − 2 L ∥ x i ∥ 2 ) ] , where a ) a) a ) uses min ( ⋅ ) = − max ( − ⋅ ) \min(\cdot)=-\max(-\cdot) min ( ⋅ ) = − max ( − ⋅ ) . This completes the proof. ■
[1]
Taylor, Adrien B. Convex interpolation and performance estimation of first-order methods for convex optimization. Diss. Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2017.
[2]
R Tyrell Rockafellar. Convex Analysis. Princeton University Press, 1996.
[3]
Drori, Yoel. The exact information-based complexity of smooth convex minimization. Journal of Complexity 39 (2017): 1-16.
[4]
Drori, Yoel and Taylor, Adrien, 2021. On the oracle complexity of smooth strongly convex minimization. arXiv preprint arXiv:2101.09740.