We consider nonempty, closed set X \mathcal{X} X in a finite-dimensional vector space, which is prox-regular at a point x ˉ ∈ X \bar{x}\in\mathcal{X} x ˉ ∈ X , i.e. , Euclidean projection onto the set from x ˉ \bar{x} x ˉ is single-valued in some neighborhood of that point. Throughout the blog, the underlying space is a finite-dimensional vector space over the reals. To state the result and to prove it, we introduce certain notation and notions.
An open ball with center c c c and radius r r r is denoted by
B ( c ; r ) = { x ∣ ∥ x − c ∥ < r } , B(c;r)=\left\{ x\mid\|x-c\| < r\right\}, B ( c ; r ) = { x ∣ ∥ x − c ∥ < r } , where the norm is is the 2 norm throughout the blog.
The indicator function of X \mathcal{X} X is denoted by ι , \iota, ι , with
ι ( x ) = { 0 , x ∈ X ∞ , x ∉ X . \iota(x)=\begin{cases} 0, & x\in\mathcal{X}\\ \infty, & x\notin\mathcal{X}. \end{cases} ι ( x ) = { 0 , ∞ , x ∈ X x ∈ / X . A Euclidean projection from a point x x x onto this set X \mathcal{X} X is denoted by Π ( x ) \Pi(x) Π ( x ) , and one arbitrary projection in Π ( x ) \Pi(x) Π ( x ) is denoted by π x \pi_{x} π x . It is defined as:
π x ∈ Π ( x ) = argmin y ∈ X ∥ x − y ∥ 2 . ( Projx ) \pi_{x}\in\Pi(x)=\mathop{\textrm{argmin}_{y\in\mathcal{X}}\|x-y\|^{2}.}\quad(\textrm{Projx}) π x ∈ Π ( x ) = argmin y ∈ X ∥ x − y ∥ 2 . ( Projx ) The distance function at the point x x x is denoted by d ( x ) = ∥ x − π x ∥ d(x)=\|x-\pi_{x}\| d ( x ) = ∥ x − π x ∥ , where the norm is the 2 norm.
A function is locally Lipschitz, if at any point, there is a neighborhood around that point where the function is Lipschitz continuous.
For any closed, nonempty set its distance function is 1-Lipschitz continuous everywhere (Rockafellar and Wets (2009) ) [Example 9.6]. This implies that for any β \beta β , d 2 + β ∥ x ∥ 2 d^{2}+\beta\|x\|^{2} d 2 + β ∥ x ∥ 2 is locally Lipschitz on any point x x x . To see that, consider x ∈ B x\in\mathcal{B} x ∈ B where B \mathcal{B} B is some closed ball around x x x . Then, for any x , y ∈ B , x,y\in\mathcal{B}, x , y ∈ B ,
∣ d 2 ( x ) + β ∥ x ∥ 2 − d 2 ( y ) − β ∥ y ∥ 2 ∣ ≤ ∣ ( d ( x ) − d ( y ) ) ( d ( x ) + d ( y ) ) ∣ + ∣ β ∣ ∣ ∥ x ∥ 2 − ∥ y ∥ 2 undefined = ( ∥ x ∥ + ∥ y ∥ ) ( ∥ x ∥ − ∥ y ∥ ) ∣ ≤ ∣ d ( x ) + d ( y ) ∣ undefined ≤ M 1 ∣ d ( x ) − d ( y ) ∣ undefined ≤ ∥ x − y ∥ + ∣ β ∣ ∣ ∥ x ∥ + ∥ y ∥ ∣ undefined ≤ M 2 ∣ ∥ x ∥ − ∥ y ∥ ∣ undefined ≤ ∥ x − y ∥ ≤ ( M 1 + M 2 β ) ∥ x − y ∥ , ( DistSqdLcLip ) \begin{align*} & |d^{2}(x)+\beta\|x\|^{2}-d^{2}(y)-\beta\|y\|^{2}|\\ \leq & |\left(d(x)-d(y)\right)\left(d(x)+d(y)\right)|+|\beta||\underbrace{\|x\|^{2}-\|y\|^{2}}_{=(\|x\|+\|y\|)(\|x\|-\|y\|)}|\\ \leq & \underbrace{|d(x)+d(y)|}_{\leq M_{1}}\underbrace{|d(x)-d(y)|}_{\leq\|x-y\|}\\ & +|\beta|\underbrace{|\|x\|+\|y\||}_{\leq M_{2}}\underbrace{|\|x\|-\|y\||}_{\leq\|x-y\|}\\ \leq & (M_{1}+M_{2}\beta)\|x-y\|,\quad(\textrm{DistSqdLcLip}) \end{align*} ≤ ≤ ≤ ∣ d 2 ( x ) + β ∥ x ∥ 2 − d 2 ( y ) − β ∥ y ∥ 2 ∣ ∣ ( d ( x ) − d ( y ) ) ( d ( x ) + d ( y ) ) ∣ + ∣ β ∣∣ = ( ∥ x ∥ + ∥ y ∥ ) ( ∥ x ∥ − ∥ y ∥ ) ∥ x ∥ 2 − ∥ y ∥ 2 ∣ ≤ M 1 ∣ d ( x ) + d ( y ) ∣ ≤ ∥ x − y ∥ ∣ d ( x ) − d ( y ) ∣ + ∣ β ∣ ≤ M 2 ∣∥ x ∥ + ∥ y ∥∣ ≤ ∥ x − y ∥ ∣∥ x ∥ − ∥ y ∥∣ ( M 1 + M 2 β ) ∥ x − y ∥ , ( DistSqdLcLip ) where in the second line we have used (i) the existence of constant M 1 , M 2 > 0 M_{1},M_{2}>0 M 1 , M 2 > 0 due to the points x , y ∈ B x,y\in\mathcal{B} x , y ∈ B , and (ii) the reverse triangle inequality.
The proximal normal cone of X \mathcal{X} X at a point x ∈ X x\in\mathcal{X} x ∈ X is defined as follows (Rockafellar and Wets (2009) )
v ∈ N ( x ) ⇔ ∃ τ > 0 x ∈ Π ( x + τ v ) , ( PrxNrmlCn ) v\in N(x)\Leftrightarrow\exists_{\tau>0}\;x\in\Pi(x+\tau v),\quad(\textrm{PrxNrmlCn}) v ∈ N ( x ) ⇔ ∃ τ > 0 x ∈ Π ( x + τv ) , ( PrxNrmlCn ) which also implies that
∀ τ undefined ∈ ( 0 , τ ) Π ( x + τ undefined v ) = { x } . \forall_{\widetilde{\tau}\in(0,\tau)}\;\Pi(x+\widetilde{\tau}v)=\{x\}. ∀ τ ∈ ( 0 , τ ) Π ( x + τ v ) = { x } . For any function f f f (not necessarily convex), its Frechet subdifferential ∂ f \partial f ∂ f at a point x x x is defined as follows (Correa et al. (1992) )
v ∈ ∂ f ( x ) ⇔ lim inf y → 0 f ( x + y ) − f ( x ) − ⟨ v ∣ y ⟩ ∥ y ∥ ≥ 0. v\in\partial f(x)\Leftrightarrow\liminf_{y\to0}\frac{f(x+y)-f(x)-\left\langle v\mid y\right\rangle }{\|y\|}\geq0. v ∈ ∂ f ( x ) ⇔ y → 0 lim inf ∥ y ∥ f ( x + y ) − f ( x ) − ⟨ v ∣ y ⟩ ≥ 0.
On the other hand, The Clarke subdifferential of a locally Lipschitz function f f f is defined as follows:
u ∈ ∂ Clarke f ( x ) ⇔ ∀ d [ lim sup y → x , t ↓ 0 f ( y + t d ) − f ( y ) t ] ≥ ⟨ u ∣ d ⟩ . u\in\partial^{\textrm{Clarke}}f(x)\Leftrightarrow\forall_{d}\;\left[\limsup_{y\to x,t\downarrow0}\frac{f(y+td)-f(y)}{t}\right]\geq\left\langle u\mid d\right\rangle . u ∈ ∂ Clarke f ( x ) ⇔ ∀ d [ y → x , t ↓ 0 lim sup t f ( y + t d ) − f ( y ) ] ≥ ⟨ u ∣ d ⟩ . For a locally Lipschitz function the Clarke subdifferential is nonempty everywhere (Correa et al. (1992) ) [Property 2.2].
Define the infimal convolution function between two lower-semicontinuous functions f f f and g g g as follows (Correa et al. (1992) ) [Lemma 3.6]:
( f □ g ) ( x ) = inf y { f ( y ) + g ( x − y ) } . \begin{aligned} (f\square g)(x) & =\inf_{y}\left\{ f(y)+g(x-y)\right\} .\end{aligned} ( f □ g ) ( x ) = y inf { f ( y ) + g ( x − y ) } . Also, define
y x ⋆ ∈ argmin y { f ( y ) + g ( x − y ) } , y_{x}^{\star}\in\mathop{\textrm{argmin}}_{y}\left\{ f(y)+g(x-y)\right\}, y x ⋆ ∈ argmin y { f ( y ) + g ( x − y ) } , which can be empty. If y x ⋆ y_{x}^{\star} y x ⋆ exists, then
∂ ( f □ g ) ( x ) ⊆ ∂ f ( y x ⋆ ) ∩ ∂ g ( x − y x ⋆ ) . ( FrechSubRule ) \partial(f\square g)(x)\subseteq\partial f(y_{x}^{\star})\cap\partial g(x-y_{x}^{\star}).\quad(\textrm{FrechSubRule}) ∂ ( f □ g ) ( x ) ⊆ ∂ f ( y x ⋆ ) ∩ ∂ g ( x − y x ⋆ ) . ( FrechSubRule ) Note that,
d 2 ( x ) = min y ∈ X ∥ x − y ∥ 2 = min y { ι ( y ) + ∥ x − y ∥ 2 } = ( ι □ ∥ ⋅ ∥ 2 ) ( x ) , d^{2}(x)=\min_{y\in\mathcal{X}}\|x-y\|^{2}=\min_{y}\left\{ \iota(y)+\|x-y\|^{2}\right\} =(\iota\square\|\cdot\|^{2})(x), d 2 ( x ) = y ∈ X min ∥ x − y ∥ 2 = y min { ι ( y ) + ∥ x − y ∥ 2 } = ( ι □ ∥ ⋅ ∥ 2 ) ( x ) , where by definition:
π x = argmin y ∈ X ∥ x − y ∥ 2 = argmin y { ι ( y ) □ ∥ x − y ∥ 2 } , \pi_{x}=\mathop{\textrm{argmin}}_{y\in\mathcal{X}}\|x-y\|^{2}=\mathop{\textrm{argmin}}_{y}\left\{ \iota(y)\square\|x-y\|^{2}\right\}, π x = argmin y ∈ X ∥ x − y ∥ 2 = argmin y { ι ( y ) □ ∥ x − y ∥ 2 } , Hence, recalling that ∂ ∥ x ∥ 2 = 2 x \partial\|x\|^{2}=2x ∂ ∥ x ∥ 2 = 2 x , we can find a Frechet subgradient of d 2 d^{2} d 2 using (FrechSubRule):
∂ d 2 ( x ) = ∂ ( ι □ ∥ ⋅ ∥ 2 ) ( x ) ⊆ ∂ ι ( π x ) ∩ ∂ [ ∥ x − π x ∥ 2 ] undefined = 2 ( x − π x ) = ∂ ι ( π x ) ∩ { 2 ( x − π x ) } , \begin{aligned} \partial d^{2}(x) & =\partial(\iota\square\|\cdot\|^{2})(x)\\ \subseteq & \partial\iota(\pi_{x})\cap\underbrace{\partial\left[\|x-\pi_{x}\|^{2}\right]}_{=2(x-\pi_{x})}\\ & =\partial\iota(\pi_{x})\cap\{2(x-\pi_{x})\},\end{aligned} ∂ d 2 ( x ) ⊆ = ∂ ( ι □ ∥ ⋅ ∥ 2 ) ( x ) ∂ ι ( π x ) ∩ = 2 ( x − π x ) ∂ [ ∥ x − π x ∥ 2 ] = ∂ ι ( π x ) ∩ { 2 ( x − π x )} , hence if ∂ d 2 ( x ) ≠ ∅ \partial d^{2}(x)\neq\emptyset ∂ d 2 ( x ) = ∅ , then ∂ d 2 ( x ) = { 2 ( x − π x ) } \partial d^{2}(x)=\{2(x-\pi_{x})\} ∂ d 2 ( x ) = { 2 ( x − π x )} . In other words,
∀ x : ∂ d 2 ( x ) ≠ ∅ ∂ d 2 ( x ) = 2 ( x − π x ) . ( FrechSubDistSqd ) \forall_{x:\partial d^{2}(x)\neq\emptyset}\quad\partial d^{2}(x)=2(x-\pi_{x}).\quad(\textrm{FrechSubDistSqd}) ∀ x : ∂ d 2 ( x ) = ∅ ∂ d 2 ( x ) = 2 ( x − π x ) . ( FrechSubDistSqd ) If a locally Lipschitz function f f f has its Frechet subdifferential ∂ f \partial f ∂ f monotone on { ( x , u ) ∈ g r a ∂ f ∣ x ∈ A } \{(x,u) \in \mathbf{gra}\partial f \mid x\in A\} {( x , u ) ∈ gra ∂ f ∣ x ∈ A } , where A A A is a convex set, then f f f is convex on A A A (Correa et al. (1992) ) [Theorem 3.8, Remark after Property 2.7]. In other words, for a locally Lipschitz function f f f
∀ ( x , u ) , ( y , v ) ∈ g r a ∂ f : x , y ∈ A ⟨ u − v ∣ x − y ⟩ ≥ 0 ⇒ f : convex on A ( LocCvx ) , \forall_{(x,u),(y,v)\in\mathbf{gra}\partial f:x,y\in A}\;\left\langle u-v\mid x-y\right\rangle \geq0\Rightarrow f:\textrm{convex on }A\quad(\textrm{LocCvx}), ∀ ( x , u ) , ( y , v ) ∈ gra ∂ f : x , y ∈ A ⟨ u − v ∣ x − y ⟩ ≥ 0 ⇒ f : convex on A ( LocCvx ) , where g r a ∂ f = { ( x , u ) ∣ u ∈ ∂ f ( x ) } . \mathbf{gra}\partial f=\left\{ (x,u)\mid u\in\partial f(x)\right\}. gra ∂ f = { ( x , u ) ∣ u ∈ ∂ f ( x ) } .
Furthermore, if f f f is locally Lipschitz, then monotonicity of ∂ f \partial f ∂ f is equivalent to the monotonicity of ∂ Clarke f \partial^{\textrm{Clarke}}f ∂ Clarke f , so proving either is fine to establish convexity (Correa et al. (1992) ) [Remark after Property 2.7].
An operator A A A is β \beta β -cocoercive on some convex set S S S if
∀ ( x , u ) , ( y , v ) ∈ S ∩ g r a A ⟨ x − y ∣ u − v ⟩ ≥ β ∥ u − v ∥ 2 . ( CocrcvOpt ) \forall_{(x,u),(y,v)\in S\cap\mathbf{gra}A}\quad\left\langle x-y\mid u-v\right\rangle \geq\beta\|u-v\|^{2}.\quad(\textrm{CocrcvOpt}) ∀ ( x , u ) , ( y , v ) ∈ S ∩ gra A ⟨ x − y ∣ u − v ⟩ ≥ β ∥ u − v ∥ 2 . ( CocrcvOpt ) A β \beta β -cocoercive operator on S S S is also 1 β \frac{1}{\beta} β 1 -Lipschitz on S S S .
An operator A A A is ρ \rho ρ -hypomonotone on a convex set S S S if
∀ ( x , u ) , ( u , v ) ∈ S ∩ g r a A ⟨ u − v ∣ x − y ⟩ + ρ ∥ x − y ∥ 2 ≥ 0. \forall_{(x,u),(u,v)\in S\cap\mathbf{gra}A}\quad\left\langle u-v\mid x-y\right\rangle +\rho\|x-y\|^{2}\geq0. ∀ ( x , u ) , ( u , v ) ∈ S ∩ gra A ⟨ u − v ∣ x − y ⟩ + ρ ∥ x − y ∥ 2 ≥ 0. The following result is a restatement of Equation (3.1) of (Poliquin et al. (2000) ) . If X \mathcal{X} X is prox-regular at x ˉ ∈ X \bar{x}\in\mathcal{X} x ˉ ∈ X , then there exists some ρ > 0 \rho>0 ρ > 0 and R > 0 R>0 R > 0 such that the proximal normal cone is a ρ \rho ρ -hypomonotone operator on an R R R -neighborhood of ( x ˉ , 0 ) (\bar{x},0) ( x ˉ , 0 ) , defined by
V R ( x ˉ , 0 ) = { ( p , u ) ∣ ∥ p − x ˉ ∥ ≤ R , ∥ u − 0 ∥ ≤ R } , (VR) \mathcal{V}_{R}(\bar{x},0)=\left\{ (p,u)\mid\|p-\bar{x}\|\leq R,\|u-0\|\leq R\right\},\quad \textrm{(VR)} V R ( x ˉ , 0 ) = { ( p , u ) ∣ ∥ p − x ˉ ∥ ≤ R , ∥ u − 0∥ ≤ R } , (VR) i.e. ,
∀ ( p , u ) , ( q , v ) ∈ V R ( x ˉ , 0 ) ∩ g r a N ⟨ u − v ∣ p − q ⟩ + ρ ∥ p − q ∥ 2 ≥ 0. ( HypMntn ) \begin{aligned} \forall_{(p,u),(q,v)\in\mathcal{V}_{R}(\bar{x},0)\cap\mathbf{gra}N} & \quad\left\langle u-v\mid p-q\right\rangle +\rho\|p-q\|^{2}\geq 0. \quad(\textrm{HypMntn})\end{aligned} ∀ ( p , u ) , ( q , v ) ∈ V R ( x ˉ , 0 ) ∩ gra N ⟨ u − v ∣ p − q ⟩ + ρ ∥ p − q ∥ 2 ≥ 0. ( HypMntn ) Now we are in a position to state the main result regarding geometry of a prox-regular set and prove it.
If X \mathcal{X} X is set that is prox-regular at x ˉ ∈ X \bar{x}\in\mathcal{X} x ˉ ∈ X , then there exist some ρ > 0 , R > 0 \rho>0,R>0 ρ > 0 , R > 0 such that for any λ \lambda λ satisfying λ ∈ ( 0 , 2 ) \lambda\in(0,2) λ ∈ ( 0 , 2 ) and λ ≤ ρ \lambda\leq\rho λ ≤ ρ we have the following properties :
(i) for any x ∈ B ( x ˉ ; λ R 2 ρ ) x\in B(\bar{x};\frac{\lambda R}{2\rho}) x ∈ B ( x ˉ ; 2 ρ λ R ) , we have ( π x , 2 ρ λ ( x − π x ) ) ∈ V R ( x ˉ , 0 ) ∩ g r a N \left(\pi_{x},\frac{2\rho}{\lambda}(x-\pi_{x})\right)\in\mathcal{V}_{R}(\bar{x},0)\cap\mathbf{gra}N ( π x , λ 2 ρ ( x − π x ) ) ∈ V R ( x ˉ , 0 ) ∩ gra N , where V R ( x ˉ , 0 ) \mathcal{V}_R(\bar x,0) V R ( x ˉ , 0 ) is the same set defined in (VR).
(ii) the projection operator Π \Pi Π is 2 2 − λ \frac{2}{2-\lambda} 2 − λ 2 -Lipschitz continuous on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) ,
(iii) on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) , the function ϕ λ = d 2 + λ 2 − λ ∥ ⋅ ∥ 2 \phi_{\lambda}=d^{2}+\frac{\lambda}{2-\lambda}\|\cdot\|^{2} ϕ λ = d 2 + 2 − λ λ ∥ ⋅ ∥ 2 is convex and differentiable, with the derivative given by ∇ ϕ λ ( x ) = 2 ( x − π x ) + 2 2 2 − λ x . \nabla\phi_{\lambda}(x)=2(x-\pi_{x})+2\frac{2}{2-\lambda}x. ∇ ϕ λ ( x ) = 2 ( x − π x ) + 2 2 − λ 2 x .
First, we show that for for any x ∈ B ( x ˉ , λ R 2 ρ ) x\in B\left(\bar{x},\frac{\lambda R}{2\rho}\right) x ∈ B ( x ˉ , 2 ρ λ R ) , ∥ π x − x ˉ ∥ ≤ R \|\pi_{x}-\bar{x}\|\leq R ∥ π x − x ˉ ∥ ≤ R . For x ∈ B ( x ˉ , λ R 2 ρ ) , x\in B\left(\bar{x},\frac{\lambda R}{2\rho}\right), x ∈ B ( x ˉ , 2 ρ λ R ) , we have
∥ π x − x ˉ ∥ = ∥ π x − x + x − x ˉ ∥ ≤ a ) ∥ π x − x ∥ undefined = d ( x ) ≤ ∥ x ˉ − x ∥ < ( λ R / 2 ρ ) + ∥ x − x ˉ ∥ undefined < ( λ R / 2 ρ ) < λ R ρ ≤ R , \begin{aligned} & \|\pi_{x}-\bar{x}\|\\ = & \|\pi_{x}-x+x-\bar{x}\|\\ \overset{a)}{\leq} & \underbrace{\|\pi_{x}-x\|}_{=d(x)\leq\|\bar{x}-x\|<(\lambda R/2\rho)}+\underbrace{\|x-\bar{x}\|}_{<(\lambda R/2\rho)}\\ < & \frac{\lambda R}{\rho}\\ \leq & R,\end{aligned} = ≤ a ) < ≤ ∥ π x − x ˉ ∥ ∥ π x − x + x − x ˉ ∥ = d ( x ) ≤ ∥ x ˉ − x ∥ < ( λ R /2 ρ ) ∥ π x − x ∥ + < ( λ R /2 ρ ) ∥ x − x ˉ ∥ ρ λ R R , where in the last line we have used λ ≤ ρ \lambda\leq\rho λ ≤ ρ .
Set τ ≔ λ 2 ρ > 0 \tau\coloneqq\frac{\lambda}{2\rho}>0 τ : = 2 ρ λ > 0 , then
Π ( π x + τ { 2 ρ λ ( x − π x ) } ) = Π ( x ) ∋ π x , \Pi\left(\pi_{x}+\tau\left\{ \frac{2\rho}{\lambda}(x-\pi_{x})\right\} \right)=\Pi\left(x\right)\ni\pi_{x}, Π ( π x + τ { λ 2 ρ ( x − π x ) } ) = Π ( x ) ∋ π x , where the last inclusion follows from definition of projection. Thus 2 ρ λ ( x − π x ) ∈ N ( π x ) \frac{2\rho}{\lambda}(x-\pi_{x})\in N(\pi_{x}) λ 2 ρ ( x − π x ) ∈ N ( π x ) by (PrxNrmlCn). Also, from d ( x ) ≤ ∥ x ˉ − x ∥ < ( λ R / 2 ρ ) d(x)\leq\|\bar{x}-x\|<(\lambda R/2\rho) d ( x ) ≤ ∥ x ˉ − x ∥ < ( λ R /2 ρ ) , we have
∥ 2 ρ λ ( x − π x ) ∥ < R , \begin{aligned} \|\frac{2\rho}{\lambda}(x-\pi_{x})\|< & R,\end{aligned} ∥ λ 2 ρ ( x − π x ) ∥ < R ,
thus completing proof to (i).
From (i), we have for any x , y ∈ B ( x ˉ , λ R 2 ρ ) x,y\in B\left(\bar{x},\frac{\lambda R}{2\rho}\right) x , y ∈ B ( x ˉ , 2 ρ λ R ) , we have ( π x , 2 ρ λ ( x − π x ) ) , ( π y , 2 ρ λ ( y − π y ) ) ∈ V R ( x ˉ , 0 ) ∩ g r a N \left(\pi_{x},\frac{2\rho}{\lambda}(x-\pi_{x})\right),\left(\pi_{y},\frac{2\rho}{\lambda}(y-\pi_{y})\right)\in\mathcal{V}_{R}(\bar{x},0)\cap\mathbf{gra}N ( π x , λ 2 ρ ( x − π x ) ) , ( π y , λ 2 ρ ( y − π y ) ) ∈ V R ( x ˉ , 0 ) ∩ gra N , and using (HypMntn), we have
0 ≤ ⟨ 2 ρ λ ( x − π x ) − 2 ρ λ ( y − π y ) ∣ π x − π y ⟩ + ρ ∥ π x − π y ∥ 2 = ⟨ 2 ρ λ ( x − y ) − 2 ρ λ ( π x − π y ) ∣ π x − π y ⟩ + ρ ∥ π x − π y ∥ 2 = 2 ρ λ ⟨ x − y ∣ π x − π y ⟩ − 2 ρ λ ∥ π x − π y ∥ 2 + ρ ∥ π x − π y ∥ 2 undefined = − 2 ρ λ ( 1 − λ 2 ) ∥ π x − π y ∥ 2 = 2 ρ λ ⟨ x − y ∣ π x − π y ⟩ − 2 ρ λ ( 1 − λ 2 ) ∥ π x − π y ∥ 2 ⇔ ( 1 − λ 2 ) ∥ π x − π y ∥ 2 ≤ ⟨ x − y ∣ π x − π y ⟩ . \begin{aligned} 0 & \leq\left\langle \frac{2\rho}{\lambda}(x-\pi_{x})-\frac{2\rho}{\lambda}(y-\pi_{y})\mid\pi_{x}-\pi_{y}\right\rangle +\rho\|\pi_{x}-\pi_{y}\|^{2}\\ & =\left\langle \frac{2\rho}{\lambda}(x-y)-\frac{2\rho}{\lambda}(\pi_{x}-\pi_{y})\mid\pi_{x}-\pi_{y}\right\rangle +\rho\|\pi_{x}-\pi_{y}\|^{2}\\ & =\frac{2\rho}{\lambda}\left\langle x-y\mid\pi_{x}-\pi_{y}\right\rangle \underbrace{-\frac{2\rho}{\lambda}\|\pi_{x}-\pi_{y}\|^{2}+\rho\|\pi_{x}-\pi_{y}\|^{2}}_{=-\frac{2\rho}{\lambda}\left(1-\frac{\lambda}{2}\right)\|\pi_{x}-\pi_{y}\|^{2}}\\ & =\frac{2\rho}{\lambda}\left\langle x-y\mid\pi_{x}-\pi_{y}\right\rangle -\frac{2\rho}{\lambda}\left(1-\frac{\lambda}{2}\right)\|\pi_{x}-\pi_{y}\|^{2}\\ \Leftrightarrow & \left(1-\frac{\lambda}{2}\right)\|\pi_{x}-\pi_{y}\|^{2}\leq\left\langle x-y\mid\pi_{x}-\pi_{y}\right\rangle .\end{aligned} 0 ⇔ ≤ ⟨ λ 2 ρ ( x − π x ) − λ 2 ρ ( y − π y ) ∣ π x − π y ⟩ + ρ ∥ π x − π y ∥ 2 = ⟨ λ 2 ρ ( x − y ) − λ 2 ρ ( π x − π y ) ∣ π x − π y ⟩ + ρ ∥ π x − π y ∥ 2 = λ 2 ρ ⟨ x − y ∣ π x − π y ⟩ = − λ 2 ρ ( 1 − 2 λ ) ∥ π x − π y ∥ 2 − λ 2 ρ ∥ π x − π y ∥ 2 + ρ ∥ π x − π y ∥ 2 = λ 2 ρ ⟨ x − y ∣ π x − π y ⟩ − λ 2 ρ ( 1 − 2 λ ) ∥ π x − π y ∥ 2 ( 1 − 2 λ ) ∥ π x − π y ∥ 2 ≤ ⟨ x − y ∣ π x − π y ⟩ . So we have shown that
∀ ( x , π x ) , ( y , π y ) ∈ g r a Π ∩ B ( x ˉ , λ R 2 ρ ) ⟨ x − y ∣ π x − π y ⟩ ≥ ( 1 − λ 2 ) ∥ π x − π y ∥ 2 , \forall_{(x,\pi_{x}),(y,\pi_{y})\in\mathbf{gra}\Pi\cap B\left(\bar{x},\frac{\lambda R}{2\rho}\right)}\quad\left\langle x-y\mid\pi_{x}-\pi_{y}\right\rangle \geq\left(1-\frac{\lambda}{2}\right)\|\pi_{x}-\pi_{y}\|^{2}, ∀ ( x , π x ) , ( y , π y ) ∈ gra Π ∩ B ( x ˉ , 2 ρ λ R ) ⟨ x − y ∣ π x − π y ⟩ ≥ ( 1 − 2 λ ) ∥ π x − π y ∥ 2 , hence the projection operator Π \Pi Π is [ ( 2 − λ ) / 2 ] \left[(2-\lambda)/2\right] [ ( 2 − λ ) /2 ] -cocoercive on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) from (CocrcvOpt). Because a β \beta β -cocoercive operator on S S S is also 1 β \frac{1}{\beta} β 1 -Lipschitz on S S S , we have the projection operator Π \Pi Π being [ 2 / ( 2 − λ ) ] \left[2/(2-\lambda)\right] [ 2/ ( 2 − λ ) ] -Lipschitz continuous on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) , i.e. ,
∀ ( x , π x ) , ( y , π y ) ∈ g r a Π ∩ B ( x ˉ , λ R 2 ρ ) ∥ π x − π y ∥ ≤ 2 2 − λ ∥ x − y ∥ . ( LipCont ) \forall_{(x,\pi_{x}),(y,\pi_{y})\in\mathbf{gra}\Pi\cap B\left(\bar{x},\frac{\lambda R}{2\rho}\right)}\quad\|\pi_{x}-\pi_{y}\|\leq\frac{2}{2-\lambda}\|x-y\|.\quad(\textrm{LipCont}) ∀ ( x , π x ) , ( y , π y ) ∈ gra Π ∩ B ( x ˉ , 2 ρ λ R ) ∥ π x − π y ∥ ≤ 2 − λ 2 ∥ x − y ∥. ( LipCont ) This proves (ii).
Take a point x ∈ B ( x ˉ , λ R 2 ρ ) . x\in B\left(\bar{x},\frac{\lambda R}{2\rho}\right). x ∈ B ( x ˉ , 2 ρ λ R ) . Due to (ii), the projection operator Π ( x ) \Pi(x) Π ( x ) is single-valued on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) .
From (DistSqdLcLip) ϕ λ \phi_{\lambda} ϕ λ is locally Lipschitz, so we will employ (LocCvx) to prove its convexity on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) . First, we will show that on the set
S = { ( x , u ) ∣ ( x , u ) ∈ g r a ∂ ϕ λ , x ∈ B ( x ˉ , λ R 2 ρ ) } S=\left\{ (x,u)\mid(x,u)\in\mathbf{gra}\partial\phi_{\lambda},x\in B\left(\bar{x},\frac{\lambda R}{2\rho}\right)\right\} S = { ( x , u ) ∣ ( x , u ) ∈ gra ∂ ϕ λ , x ∈ B ( x ˉ , 2 ρ λ R ) } the operator ∂ ϕ λ \partial\phi_{\lambda} ∂ ϕ λ is monotone, which will help in proving that ϕ λ \phi_{\lambda} ϕ λ is convex and differentiable on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) .
Recall from (FrechSubDistSqd) that
∀ x : ∂ d 2 ( x ) ≠ ∅ ∂ d 2 ( x ) = 2 ( x − π x ) . \forall_{x:\partial d^{2}(x)\neq\emptyset}\quad\partial d^{2}(x)=2(x-\pi_{x}). ∀ x : ∂ d 2 ( x ) = ∅ ∂ d 2 ( x ) = 2 ( x − π x ) . Consider two points ( x , u ) , ( y , v ) ∈ S (x,u),(y,v)\in S ( x , u ) , ( y , v ) ∈ S . Without loss of generality, we can assume that ∂ d 2 ( x ) , ∂ d 2 ( y ) \partial d^{2}(x),\partial d^{2}(y) ∂ d 2 ( x ) , ∂ d 2 ( y ) are nonempty, because for the empty case (iii) is vacuously true. Then from (FrechSubDistSqd) we have ∂ d 2 ( x ) = 2 ( x − π x ) \partial d^{2}(x)=2(x-\pi_{x}) ∂ d 2 ( x ) = 2 ( x − π x ) and ∂ d 2 ( y ) = 2 ( y − π y ) \partial d^{2}(y)=2(y-\pi_{y}) ∂ d 2 ( y ) = 2 ( y − π y ) . On these points,
∂ ϕ λ ( x ) = ∂ d 2 ( x ) + 2 λ 2 − λ ( x ) , and ∂ ϕ λ ( y ) = ∂ d 2 ( y ) + 2 λ 2 − λ ( y ) . \partial\phi_{\lambda}(x)=\partial d^{2}(x)+2\frac{\lambda}{2-\lambda}(x),\textrm{ and }\partial\phi_{\lambda}(y)=\partial d^{2}(y)+2\frac{\lambda}{2-\lambda}(y). ∂ ϕ λ ( x ) = ∂ d 2 ( x ) + 2 2 − λ λ ( x ) , and ∂ ϕ λ ( y ) = ∂ d 2 ( y ) + 2 2 − λ λ ( y ) . We want to show that for any such ( x , u ) , ( y , v ) ∈ S (x,u),(y,v)\in S ( x , u ) , ( y , v ) ∈ S , we have
0 ≤ ⟨ ∂ ϕ λ ( x ) − ∂ ϕ λ ( y ) ∣ x − y ⟩ = ⟨ ∂ d 2 ( x ) + 2 λ 2 − λ ( x ) − ∂ d 2 ( y ) − 2 λ 2 − λ ( y ) ∣ x − y ⟩ = ⟨ ∂ d 2 ( x ) − ∂ d 2 ( y ) ∣ x − y ⟩ + 2 λ 2 − λ ∥ x − y ∥ 2 . ( GoalA ) \begin{aligned} 0 & \leq\left\langle \partial\phi_{\lambda}(x)-\partial\phi_{\lambda}(y)\mid x-y\right\rangle \\ & =\left\langle \partial d^{2}(x)+2\frac{\lambda}{2-\lambda}(x)-\partial d^{2}(y)-2\frac{\lambda}{2-\lambda}(y)\mid x-y\right\rangle \\ & =\left\langle \partial d^{2}(x)-\partial d^{2}(y)\mid x-y\right\rangle +2\frac{\lambda}{2-\lambda}\|x-y\|^{2}.\quad(\textrm{GoalA})\end{aligned} 0 ≤ ⟨ ∂ ϕ λ ( x ) − ∂ ϕ λ ( y ) ∣ x − y ⟩ = ⟨ ∂ d 2 ( x ) + 2 2 − λ λ ( x ) − ∂ d 2 ( y ) − 2 2 − λ λ ( y ) ∣ x − y ⟩ = ⟨ ∂ d 2 ( x ) − ∂ d 2 ( y ) ∣ x − y ⟩ + 2 2 − λ λ ∥ x − y ∥ 2 . ( GoalA ) To prove (GoalA), first we note that
⟨ ∂ d 2 ( x ) − ∂ d 2 ( y ) ∣ x − y ⟩ = ⟨ 2 ( x − π x ) − 2 ( y − π y ) ∣ x − y ⟩ = 2 ⟨ ( x − y ) − ( π x − π y ) ∣ x − y ⟩ = 2 ∥ x − y ∥ 2 − 2 ⟨ π x − π y ∣ x − y ⟩ . ( EqPart1 ) \begin{aligned} \left\langle \partial d^{2}(x)-\partial d^{2}(y)\mid x-y\right\rangle & =\left\langle 2(x-\pi_{x})-2(y-\pi_{y})\mid x-y\right\rangle \\ & =2\left\langle (x-y)-(\pi_{x}-\pi_{y})\mid x-y\right\rangle \\ & =2\|x-y\|^{2}-2\left\langle \pi_{x}-\pi_{y}\mid x-y\right\rangle .\quad(\textrm{EqPart1})\end{aligned} ⟨ ∂ d 2 ( x ) − ∂ d 2 ( y ) ∣ x − y ⟩ = ⟨ 2 ( x − π x ) − 2 ( y − π y ) ∣ x − y ⟩ = 2 ⟨ ( x − y ) − ( π x − π y ) ∣ x − y ⟩ = 2∥ x − y ∥ 2 − 2 ⟨ π x − π y ∣ x − y ⟩ . ( EqPart1 ) Now, using Cauchy–Schwarz inequality, we have
⟨ π x − π y ∣ x − y ⟩ ≤ ∥ π x − π y ∥ undefined ≤ 2 2 − λ ∥ x − y ∥ ∥ x − y ∥ ≤ 2 2 − λ ∥ x − y ∥ 2 ⇒ − 2 ⟨ π x − π y ∣ x − y ⟩ ≥ − 4 2 − λ ∥ x − y ∥ 2 , \begin{aligned} \left\langle \pi_{x}-\pi_{y}\mid x-y\right\rangle & \leq\underbrace{\|\pi_{x}-\pi_{y}\|}_{\leq\frac{2}{2-\lambda}\|x-y\|}\|x-y\|\\ & \leq\frac{2}{2-\lambda}\|x-y\|^{2}\\ \Rightarrow-2\left\langle \pi_{x}-\pi_{y}\mid x-y\right\rangle & \geq-\frac{4}{2-\lambda}\|x-y\|^{2},\end{aligned} ⟨ π x − π y ∣ x − y ⟩ ⇒ − 2 ⟨ π x − π y ∣ x − y ⟩ ≤ ≤ 2 − λ 2 ∥ x − y ∥ ∥ π x − π y ∥ ∥ x − y ∥ ≤ 2 − λ 2 ∥ x − y ∥ 2 ≥ − 2 − λ 4 ∥ x − y ∥ 2 , and putting this in (EqPart1), we have
⟨ ∂ d 2 ( x ) − ∂ d 2 ( y ) ∣ x − y ⟩ ≥ 2 ∥ x − y ∥ 2 − 4 2 − λ ∥ x − y ∥ 2 ≥ ( 2 − 4 2 − λ ) ∥ x − y ∥ 2 = − 2 λ 2 − λ ∥ x − y ∥ 2 ⇒ ⟨ ∂ d 2 ( x ) − ∂ d 2 ( y ) ∣ x − y ⟩ + 2 λ 2 − λ ∥ x − y ∥ 2 ≥ 0 , \begin{aligned} \left\langle \partial d^{2}(x)-\partial d^{2}(y)\mid x-y\right\rangle & \geq2\|x-y\|^{2}-\frac{4}{2-\lambda}\|x-y\|^{2}\\ & \geq\left(2-\frac{4}{2-\lambda}\right)\|x-y\|^{2}\\ & =\frac{-2\lambda}{2-\lambda}\|x-y\|^{2}\\ \Rightarrow\left\langle \partial d^{2}(x)-\partial d^{2}(y)\mid x-y\right\rangle +2\frac{\lambda}{2-\lambda}\|x-y\|^{2} & \geq0,\end{aligned} ⟨ ∂ d 2 ( x ) − ∂ d 2 ( y ) ∣ x − y ⟩ ⇒ ⟨ ∂ d 2 ( x ) − ∂ d 2 ( y ) ∣ x − y ⟩ + 2 2 − λ λ ∥ x − y ∥ 2 ≥ 2∥ x − y ∥ 2 − 2 − λ 4 ∥ x − y ∥ 2 ≥ ( 2 − 2 − λ 4 ) ∥ x − y ∥ 2 = 2 − λ − 2 λ ∥ x − y ∥ 2 ≥ 0 , thus reaching (GoalA). So, we have shown that on S S S , ∂ ϕ λ \partial\phi_{\lambda} ∂ ϕ λ is monotone on S S S . As ϕ λ \phi_{\lambda} ϕ λ is locally Lipschitz, it means that ∂ Clarke f \partial^{\textrm{Clarke}}f ∂ Clarke f is monotone on S S S , and due to (LocCvx), we have ϕ λ \phi_{\lambda} ϕ λ convex on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) . This further implies that, for any x x x in B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) , ∂ ϕ λ ( x ) = ∂ Clarke ϕ λ ( x ) , \partial\phi_{\lambda}(x)=\partial^{\textrm{Clarke}}\phi_{\lambda}(x), ∂ ϕ λ ( x ) = ∂ Clarke ϕ λ ( x ) , and due to the locally Lipschitz nature of ϕ λ , \phi_{\lambda}, ϕ λ , it has nonempty Clarke subdifferential everywhere (Correa et al. (1992) ) [Property 2.2]. So, all points in B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) is Frechet subdifferentiable, i.e., for any x x x in B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) , we have ∂ ϕ λ ( x ) = ∂ Clarke ϕ λ ( x ) ≠ ∅ \partial\phi_{\lambda}(x)=\partial^{\textrm{Clarke}}\phi_{\lambda}(x)\neq\emptyset ∂ ϕ λ ( x ) = ∂ Clarke ϕ λ ( x ) = ∅ , and as we have shown before, on those it is in fact differentiable with the gradient
∂ ϕ λ ( x ) = 2 ( x − π x ) + 2 2 2 − λ x . \partial\phi_{\lambda}(x)=2(x-\pi_{x})+2\frac{2}{2-\lambda}x. ∂ ϕ λ ( x ) = 2 ( x − π x ) + 2 2 − λ 2 x .
Thus, on B ( x ˉ , λ R 2 ρ ) B\left(\bar{x},\frac{\lambda R}{2\rho}\right) B ( x ˉ , 2 ρ λ R ) , the function ϕ λ \phi_{\lambda} ϕ λ is convex and differentiable.
Rockafellar, R. Tyrrell, and Roger J-B. Wets. Variational analysis . Vol. 317. Springer Science & Business Media, 2009. [pdf ]
Correa, Rafael, Alejandro Jofre, and Lionel Thibault. "Characterization of lower semicontinuous convex functions." Proceedings of the American Mathematical Society (1992): 67-72. [pdf ]
Poliquin, René, R. Rockafellar, and Lionel Thibault. "Local differentiability of distance functions." Transactions of the American mathematical Society 352.11 (2000): 5231-5249. [pdf ]