Computing the proximal operator of a constrained function in Julia

Shuvomoy Das Gupta

September 24, 2020


(Last update: January 13, 2021) In this blog post, we show how to compute the proximal operator of a constrained function. The entire post is written in markdown using the package Weave.jl. The corresponding .jmd notebook, which can be opened either as a markdown file in any text editor or as code in Juno, can be downloaded here.

We consider two examples: the first arises in the low-rank factor analysis problem, and the second comes from a lifted SDP relaxation of a standard form binary integer optimization problem.

Example 1: Low-rank factor analysis

As an example we consider the function:

$$f(X,D)=\left\Vert \Sigma-X-D\right\Vert _{F}^{2}+I_{\mathcal{P}}(X,D),$$

where $I_{\mathcal{P}}$ denotes the indicator function of the convex set

$$\mathcal{P}=\{(X,D)\in\mathbf{S}^{n}\times\mathbf{S}^{n}\mid X\succeq0,\; D=\mathbf{diag}(d),\; d\geq0,\; d\in\mathbf{R}^{n},\; \Sigma-D\succeq0\}.$$

Computing the proximal operator of $f$

The proximal operator $\mathbf{prox}_{\gamma f}$ of this function $f$ at $(X,D)$ is the optimal solution to the following convex optimization problem:

$$\begin{array}{ll} \textrm{minimize} & \left\Vert \Sigma-\widetilde{X}-\widetilde{D}\right\Vert _{F}^{2}+\frac{1}{2\gamma}\|\widetilde{X}-X\|_{F}^{2}+\frac{1}{2\gamma}\|\widetilde{D}-D\|_{F}^{2}\\ \textrm{subject to} & \widetilde{X}\succeq0\\ & \widetilde{D}=\mathbf{diag}(\widetilde{d})\\ & \widetilde{d}\geq0\\ & \Sigma-\widetilde{D}\succeq0, \end{array}$$

where $\widetilde{X}\in\mathbf{S}_{+}^{n}$ and $\widetilde{d}\in\mathbf{R}_{+}^{n}$ (i.e., $\widetilde{D}=\mathbf{diag}(\widetilde{d})$) are the optimization variables.

Now we solve this optimization problem using Julia. We will use the packages Convex.jl and JuMP.jl for modeling, with SCS.jl as the underlying solver; all of them are open-source Julia packages.

Load the packages

First, we load the packages. If the packages are not installed, we can install them by running the following commands in the Julia REPL.

using Pkg
Pkg.add("Convex")
Pkg.add("COSMO")
Pkg.add("JuMP")
Pkg.add("SCS")
## Load the packages
using Convex
using LinearAlgebra
using COSMO
using JuMP
using SCS

Solver function

Let us write the solver function that solves the optimization problem described above. The first implementation uses Convex.jl, and the second uses JuMP.

First implementation using Convex.jl
# put everything in a function: the first implementation uses Convex.jl
function prox_PRS_fam_cvxjl(Σ, M, γ, X, d) # (Σ::A, M::R, γ::R, X::A, d::V) where {R <: Real, A <: AbstractMatrix{R}, V <: AbstractVector{R}}; for now M is not used, but it may be used in a future version

  # This function takes the input data Σ, γ, X, d and evaluates
  # the proximal operator of the function f at the point (X,D)

  # Data extraction
  # ---------------
  n = length(d) # dimension of the problem
  D = LinearAlgebra.diagm(d) # create the diagonal matrix D from the vector d

  # Create the variables
  #  --------------------
  X_tl = Convex.Semidefinite(n) # Here Semidefinite(n) encodes that
  # X_tl ≡ ̃X is a positive semidefinite matrix
  d_tl = Convex.Variable(n) # d_tl ≡ ̃d
  D_tl = diagm(d_tl) # Create the diagonal matrix ̃D from ̃d

  # Create terms of the objective function, which we write down
  #  in three parts
  #  ----------------------------------------------------------
  t1 = square(norm2(Σ - X_tl - D_tl)) # norm2 computes Frobenius norm in Convex.jl
  t2 = square(norm2(X-X_tl))
  t3 = square(norm2(D-D_tl))

  # Create objective
  # ----------------
  objective = t1 + (1/(2*γ))*(t2 + t3) # the objective to be minimized

  # create the problem instance
  # ---------------------------
  convex_problem = Convex.minimize(objective, [d_tl >= 0, Σ - D_tl  in :SDP])

  # set the solver
  # --------------
  convex_solver = () -> SCS.Optimizer(verbose=false)

  # solve the problem
  # -----------------
  Convex.solve!(convex_problem, convex_solver)

  # get the optimal solution
  # ------------------------
  X_sol = X_tl.value
  d_sol = d_tl.value

  # return the output
  return X_sol, vec(d_sol)

end
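
Although COSMO.jl is loaded above, the function hard-codes SCS as the conic solver. If we wanted to try COSMO instead, only the solver closure changes. Below is a minimal sketch on a toy problem; the keyword settings passed to COSMO.Optimizer are an assumption based on COSMO's MathOptInterface wrapper.

## Sketch (hypothetical variant): swapping in COSMO as the conic solver
y = Convex.Variable()
toy_problem = Convex.minimize(Convex.square(y - 1)) # trivial toy problem
Convex.solve!(toy_problem, () -> COSMO.Optimizer(verbose = false))
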
Second implementation using JuMP
## put everything in a function (implementation using JuMP)
function prox_PRS_fam_JuMP(Σ, M, γ, X, d; X_tl_sv = nothing, d_tl_sv = nothing, warm_start = false)

	# This function takes the input data Σ, γ, X, d and evaluates the proximal operator of the function f at the point (X,D)

	n = length(d)

	prox_model = JuMP.Model(optimizer_with_attributes(SCS.Optimizer, "verbose" => false))

	# prox_model = JuMP.Model(optimizer_with_attributes(Mosek.Optimizer))


	@variables( prox_model,
	begin
		d_tl[1:n] >= 0
		X_tl[1:n, 1:n], PSD
	end
	)

	if warm_start
		println("warm start enabled")
		# set the warm-start values for the variables
		set_start_value.(X_tl, X_tl_sv)
		set_start_value.(d_tl, d_tl_sv)
		println("norm difference is = ", norm(start_value.(X_tl) - X_tl_sv))
	end



	t_1 = vec(Σ - X_tl - diagm(d_tl))
	t_2 = vec(X_tl - X)
	t_3 = vec(diagm(d_tl) - diagm(d))
	obj = t_1'*t_1 + ((1/(2*γ))*(t_2'*t_2 + t_3'*t_3))

	@objective(prox_model, Min, obj)

	@constraints(prox_model, begin
		Symmetric(Σ - diagm(d_tl)) in PSDCone()
	end)

	set_silent(prox_model)

	JuMP.optimize!(prox_model)

	# obj_val = JuMP.objective_value(prox_model)
	X_sol = JuMP.value.(X_tl)
	d_sol = JuMP.value.(d_tl)

	return X_sol, d_sol

end

Create data to test

We now create data to test the functions we just wrote.

## All the parameters
n = 10          # dimension of the problem
Σ1 = randn(n,n)
Σ = Σ1'*Σ1      # Σ is symmetric positive semidefinite by construction
X = randn(n,n)  # the point where we evaluate the proximal operator
d = randn(n)
M = 1           # not used for now
γ = 1           # proximal parameter
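
Because Σ = Σ1'Σ1 is positive semidefinite by construction, the constraint Σ - D̃ ⪰ 0 is feasible (for instance at d̃ = 0), so the prox problem is well posed. A quick illustrative check:

## Sanity check (illustrative): Σ is PSD by construction
@assert minimum(eigvals(Symmetric(Σ))) >= -1e-10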

Test the functions

We now test the functions prox_PRS_fam_cvxjl and prox_PRS_fam_JuMP to see if they work as expected!

## Time to run the code
@time X1, d1 = prox_PRS_fam_cvxjl(Σ, M, γ, X, d)

@time X2, d2 = prox_PRS_fam_JuMP(Σ, M, γ, X, d)

## Test for warmstarting

X_tl_sv = X2
d_tl_sv = d2

@time X2, d2 = prox_PRS_fam_JuMP(Σ, M, γ, X, d; X_tl_sv = X2, d_tl_sv = d2, warm_start = true)
Output:
  2.263578 seconds (11.69 k allocations: 1.033 MiB)
  1.613454 seconds (18.60 k allocations: 2.731 MiB)
  warm start enabled
  norm difference is = 0.0
  1.643022 seconds (19.30 k allocations: 2.764 MiB)
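
As a quick consistency check (not part of the original output), we can verify that the two implementations return essentially the same solution, with the difference on the order of the solver tolerance:

## Compare the two implementations (illustrative)
@show norm(X1 - X2)
@show norm(d1 - d2)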

Example 2. Lifted SDP relaxation of a standard form 0-1 integer optimization problem

The standard form binary integer optimization problem is a universal model for combinatorial optimization, with the following form:

$$\begin{array}{ll} \textrm{minimize} & c^{\top}x\\ \textrm{subject to} & Ax=b\\ & x\geq0\\ & x\in\{0,1\}^{n}, \end{array}$$

where $x$ is the decision variable, and $A\in\mathbf{R}^{m\times n}, b\in\mathbf{R}^{m}, c\in\mathbf{R}^{n}$ are problem data. The matrix $A$ is taken to be fat ($m<n$) and full row rank. This problem has the following higher-dimensional equivalent formulation, based on the lifting procedure originally proposed by Lovász and Schrijver:

$$\begin{array}{ll} \textrm{minimize} & \sum_{j=1}^{n}c_{j}X_{jj}\\ \textrm{subject to} & \sum_{j=1}^{n}A_{j}X_{ij}=bX_{ii},\qquad i=1,\ldots,n,\\ & \sum_{j=1}^{n}A_{j}X_{jj}=b,\\ & 0\leq X_{ij}\leq X_{ii},\qquad i,j=1,\ldots,n,\; i\neq j,\\ & 0\leq X_{ij}\leq1,\qquad i,j=1,\ldots,n,\\ & X_{ii}+X_{jj}-X_{ij}\leq1,\qquad i,j=1,\ldots,n,\; i\neq j,\\ & X\succeq0,\\ & \mathbf{rank}\,X=1, \end{array}$$

where $X\in\mathbf{S}^{n}$ is the decision variable and is related to the original decision variable $x$ by the encoding $X=xx^{\top}$. In this formulation $A_{j}$ represents the $j$-th column of the matrix $A$.
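
As a quick sanity check of this encoding (a minimal illustrative sketch, not part of the original derivation): for any binary $x$ satisfying $Ax=b$, the matrix $X=xx^{\top}$ satisfies the equality constraints of the lifted formulation, because $X_{ii}=x_{i}^{2}=x_{i}$ for binary $x$.

## Illustrative check: for binary x with A*x == b, the lifted matrix
## X = x*x' satisfies the equality constraints of the lifted problem
using Random: bitrand
m, n = 5, 10
x = bitrand(n)
A = randn(m, n)
b = A*x # ensures A*x == b
X = x*x' # lifted variable, X[i,j] = x[i]*x[j]
@assert sum(A[:,j]*X[j,j] for j in 1:n) ≈ b # ∑ⱼ Aⱼ Xⱼⱼ = b
for i in 1:n
    @assert sum(A[:,j]*X[i,j] for j in 1:n) ≈ b*X[i,i] # ∑ⱼ Aⱼ Xᵢⱼ = b Xᵢᵢ
end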

Let us drop the nonconvex rank constraint, and consider the function

$$f(X):=\sum_{j=1}^{n}c_{j}X_{jj}+I_{\mathcal{C}}(X),$$

where $I_{\mathcal{C}}$ denotes the indicator function of the convex set

$$\begin{array}{ll} \mathcal{C}:= & \{X\in\mathbf{S}^{n}\mid\sum_{j=1}^{n}A_{j}X_{ij}=bX_{ii},\qquad i=1,\ldots,n,\\ & \qquad\qquad\sum_{j=1}^{n}A_{j}X_{jj}=b,\\ & \qquad\qquad 0\leq X_{ij}\leq X_{ii},\qquad i,j=1,\ldots,n,\; i\neq j,\\ & \qquad\qquad 0\leq X_{ij}\leq1,\qquad i,j=1,\ldots,n,\\ & \qquad\qquad X_{ii}+X_{jj}-X_{ij}\leq1,\qquad i,j=1,\ldots,n,\; i\neq j,\\ & \qquad\qquad X\succeq0\}. \end{array}$$

Computing the proximal operator of $f$

The proximal operator $\mathbf{prox}_{\gamma f}$ of this function $f$ at $X$ is the optimal solution to the following convex optimization problem:

$$\begin{array}{ll} \textrm{minimize} & \sum_{j=1}^{n}c_{j}\widetilde{X}_{jj}+\frac{1}{2\gamma}\|\widetilde{X}-X\|_{F}^{2}\\ \textrm{subject to} & \sum_{j=1}^{n}A_{j}\widetilde{X}_{ij}=b\widetilde{X}_{ii},\qquad i=1,\ldots,n,\\ & \sum_{j=1}^{n}A_{j}\widetilde{X}_{jj}=b,\\ & 0\leq\widetilde{X}_{ij}\leq\widetilde{X}_{ii},\qquad i,j=1,\ldots,n,\; i\neq j,\\ & 0\leq\widetilde{X}_{ij}\leq1,\qquad i,j=1,\ldots,n,\\ & \widetilde{X}_{ii}+\widetilde{X}_{jj}-\widetilde{X}_{ij}\leq1,\qquad i,j=1,\ldots,n,\; i\neq j,\\ & \widetilde{X}\succeq0, \end{array}$$

where $\widetilde{X}\in\mathbf{S}_{+}^{n}$ is the optimization variable.

Load the packages

First, we load the packages.

## Load the packages
using Convex
using LinearAlgebra
using COSMO
using JuMP
using SCS
# Problem Data
m = 5
n = 10
using Random: bitrand
c = randn(n)
x0 = bitrand(n)
A = randn(m,n)
b = A*x0 # this is to ensure that the problem has a solution
x_test = randn(n)
X = x_test*x_test' # this is the point where we want to evaluate the proximal operator
γ = 1 # proximal parameter
# Evaluating the proximal operator
prox_model = JuMP.Model(optimizer_with_attributes(SCS.Optimizer, "verbose" => false)) # SCS is loaded above; a commercial solver such as Mosek would also work

@variables(prox_model,
begin
	X_tl[1:n, 1:n], PSD
end
)

t_1 = sum(c[j]*X_tl[j,j] for j in 1:n)
t_2 = vec(X_tl-X)
obj = t_1 + ((1/(2*γ))*(t_2'*t_2))

@objective(prox_model, Min, obj)

@constraints(prox_model, begin
	con_sdp_equality_1[i = 1:n], sum(X_tl[i,j].*A[:,j] for j in 1:n) .== b*X_tl[i,i] # ∑ⱼ Aⱼ X̃ᵢⱼ = b X̃ᵢᵢ
	con_sdp_equality_2, sum(X_tl[j,j].*A[:,j] for j in 1:n) .== b # ∑ⱼ Aⱼ X̃ⱼⱼ = b
	con_bound[i = 1:n, j = 1:n], 0 <= X_tl[i,j] <= 1
	cons_sdp_bound_1[i = 1:n, j = 1:n; i != j], X_tl[i,i] + X_tl[j,j] - X_tl[i,j] <= 1
	con_sdp_bound_2[i = 1:n, j = 1:n; i != j], X_tl[i,j] <= X_tl[i,i]
end)

set_silent(prox_model)

JuMP.optimize!(prox_model)

obj_val = JuMP.objective_value(prox_model)

X_sol = JuMP.value.(X_tl)
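
As a quick sanity check (illustrative), we can verify that the computed solution satisfies some of the constraints up to solver tolerance; the loose tolerance below is an assumption suited to a first-order solver such as SCS:

## Quick feasibility checks on X_sol (illustrative, up to solver tolerance)
tol = 1e-4
@assert minimum(eigvals(Symmetric(X_sol))) >= -tol # X̃ ⪰ 0
@assert all(-tol .<= X_sol .<= 1 + tol) # 0 ≤ X̃ᵢⱼ ≤ 1
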
Converting the .jmd file to a .jl file

To convert the .jmd file to a .jl file, we run the following code:

using Weave
cd("C:\\Users\\shuvo\\Google Drive\\GitHub\\blog\\codes") # directory that contains the .jmd file
tangle("2020-09-08-proximal_operator_over_matrix.jmd", informat = "markdown") # convert the .jmd file into a .jl file that will contain the code
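
Similarly, Weave.jl can render the same .jmd file into an HTML page via the weave function (a sketch; the available output formats may vary across Weave.jl versions):

weave("2020-09-08-proximal_operator_over_matrix.jmd", doctype = "md2html") # render the .jmd file to an HTML page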