
Convex optimization review

just to connect some dots

Posted by Xuan on August 30, 2018

TODO: 1) Convergence analysis 2) interior point primal dual method


Convexity

Separating hyperplane theorem

If two convex sets $A$ and $B$ are disjoint, then there exists a hyperplane separating the two, i.e., there exist $a \neq 0$ and $b$ such that $a^T x \le b$ for all $x \in A$ and $a^T x \ge b$ for all $x \in B$.

Supporting hyperplane theorem

For every point $x_0$ on the boundary of a convex set $C$, denoted $\mathbf{bd}\, C = \mathbf{cl}\, C \setminus \mathbf{int}\, C$, there exists a supporting hyperplane, i.e., there exists an $a \neq 0$ such that $a^T x \le a^T x_0$ for all $x \in C$.


Duality

The Lagrange dual function

$$g(\lambda,\nu) = \inf_{x \in \mathcal{D}} \left( f_0(x) + \lambda^T f(x) + \nu^T h(x) \right)$$

  • It is a pointwise infimum of a family of affine functions of (λ,ν), hence it is concave.

  • For every $(\lambda,\nu)$ pair that is dual feasible, i.e. $\lambda \succeq 0$ and $(\lambda,\nu) \in \mathbf{dom}\, g$ (which usually translates to $g(\lambda,\nu) > -\infty$, since $g$ maps into $\mathbb{R} \cup \{-\infty\}$), $g(\lambda,\nu)$ is a lower bound for the primal optimum $p^\star$.

    Proof: for every primal feasible point $\tilde{x}$, $g(\lambda,\nu) \le f_0(\tilde{x}) + \lambda^T f(\tilde{x}) + \nu^T h(\tilde{x}) \le f_0(\tilde{x})$; taking the infimum over feasible $\tilde{x}$ gives $g(\lambda,\nu) \le p^\star$.

  • $g(\lambda,\nu)$, restricted to dual feasible points, can thus be viewed as giving a set of lower bounds on $p^\star$.

  • Saddle point interpretation: $p^\star = \inf_x \sup_{\lambda \succeq 0,\, \nu} L(x,\lambda,\nu) \ge \sup_{\lambda \succeq 0,\, \nu} \inf_x L(x,\lambda,\nu) = d^\star$.
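As a concrete instance (a standard textbook example added here for illustration, not part of the original note), take the linear program $\min\, c^T x$ s.t. $Ax = b$, $x \succeq 0$. Its Lagrangian is

$$L(x,\lambda,\nu) = c^T x - \lambda^T x + \nu^T(Ax - b) = -b^T\nu + (c - \lambda + A^T\nu)^T x,$$

so the dual function is $g(\lambda,\nu) = -b^T\nu$ if $c - \lambda + A^T\nu = 0$ and $-\infty$ otherwise; eliminating $\lambda \succeq 0$ gives the familiar dual $\max\, -b^T\nu$ s.t. $A^T\nu + c \succeq 0$.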

Slater’s condition and strong duality

If there is a strictly feasible point, i.e. a feasible point at which all the inequality constraints hold strictly, then strong duality holds.

  • The condition can be relaxed for affine inequality constraints: they only need to hold (possibly with equality), not strictly.
  • If a problem is self-dual, meaning the dual of the dual problem is the primal, e.g. an SDP, then Slater's condition holding for either problem of the primal-dual pair guarantees strong duality.

Sketch of proof:

Consider $\mathcal{A} = \{(f(x)^T, h(x)^T, f_0(x)) \,|\, x \in \mathcal{D}\} + \left(\mathbb{R}_+^m \times \{0\} \times \mathbb{R}_+\right)$

and $\mathcal{B} = \{(u, v, t) \,|\, u = 0, v = 0, t < p^\star\}$.

Then apply the separating hyperplane theorem. The strictly feasible point is used to ensure that the separating hyperplane is non-vertical and thus does not pass through $\mathcal{B}$.

Duality gap

For a primal-dual feasible pair $x$ and $(\lambda,\nu)$, the quantity $f_0(x) - g(\lambda,\nu)$ is referred to as the duality gap; it gives a certificate of how close the primal value is to its optimum.

Complementarity

Assume strong duality holds and let $x^\star$ and $(\lambda^\star,\nu^\star)$ be primal and dual optimal; then we must have

$$f_0(x^\star) = g(\lambda^\star,\nu^\star) \le f_0(x^\star) + {\lambda^\star}^T f(x^\star) + {\nu^\star}^T h(x^\star) \le f_0(x^\star),$$

which implies that ${\lambda^\star}^T f(x^\star) = 0$, so $\lambda^\star$ and $f(x^\star)$ are complementary: $\lambda_i^\star f_i(x^\star) = 0$ for every $i$.

KKT conditions:

Assumes zero duality gap.

KKT:

  • primal feasibility

    1) $f(x) \preceq 0$;  2) $h(x) = 0$

  • dual feasibility

    1) $\lambda \succeq 0$;  2) $\nabla_x L(x,\lambda,\nu) = 0$

  • complementarity (zero duality gap)

    $\lambda_i f_i(x) = 0, \quad \forall i$

i) If the primal problem is convex, then the KKT conditions are necessary and sufficient for points to be primal and dual optimal.

ii) If the primal problem is nonconvex, the KKT conditions are necessary.

Remark for dual feasibility:

$\nabla_x L(x,\lambda,\nu) = 0$ means that $x$ minimizes $L(\cdot,\lambda,\nu)$, so $g(\lambda,\nu) = L(x,\lambda,\nu) > -\infty$ and hence $(\lambda,\nu)$ is dual feasible.
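A tiny worked instance (my own illustration): minimize $x^2$ subject to $1 - x \le 0$. The Lagrangian is $L(x,\lambda) = x^2 + \lambda(1-x)$, so the KKT conditions are $2x - \lambda = 0$, $1 - x \le 0$, $\lambda \ge 0$, $\lambda(1-x) = 0$. Taking $\lambda = 0$ would force $x = 0$, which is infeasible, so $x^\star = 1$ and $\lambda^\star = 2$; indeed $g(2) = \inf_x\, x^2 + 2(1-x) = 1 = f_0(x^\star)$, so the duality gap is zero.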


Conjugacy

Conjugate function

The conjugate of a function $f: \mathbb{R}^n \to \mathbb{R}$ is the function $f^*: \mathbb{R}^n \to \mathbb{R}$ defined as

$$f^*(y) = \sup_{x \in \mathbf{dom}\, f} \left( y^T x - f(x) \right) = -\inf_x \left( f(x) - y^T x \right),$$

with its domain restricted to those $y \in \mathbb{R}^n$ for which the supremum is finite.

  • $f^*$ is a pointwise supremum of a family of affine functions of $y$, hence convex.
  • Fenchel-Young inequality: $f(x) + f^*(y) \ge x^T y$.
  • If $f$ is convex and closed, then $f^{**} = f$.
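A standard worked example (added here for concreteness, not part of the original note): for $f(x) = \tfrac12 x^T Q x$ with $Q \succ 0$, the supremum $\sup_x \left( y^T x - \tfrac12 x^T Q x \right)$ is attained at $x = Q^{-1}y$, so

$$f^*(y) = \tfrac12\, y^T Q^{-1} y,$$

and Fenchel-Young reads $\tfrac12 x^T Q x + \tfrac12 y^T Q^{-1} y \ge x^T y$.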

Rewrite Lagrange dual function

For a generic problem with linear constraints

$$\min f_0(x) \quad \text{s.t.} \quad Ax \preceq b, \; Cx = d,$$

we can use the conjugate to rewrite its Lagrange dual function as

$$g(\lambda,\nu) = \inf_x \left( f_0(x) + \lambda^T(Ax-b) + \nu^T(Cx-d) \right) = -\lambda^T b - \nu^T d - f_0^*(-A^T\lambda - C^T\nu).$$


Unconstrained algorithms: descent methods

The general descent algorithm

  • Determine a direction

    Look for a $\Delta x$ with $\nabla f(x^{(k)})^T \Delta x < 0$.

  • Line search

    Find a step size $t$ and update $x^{(k+1)} \leftarrow x^{(k)} + t\,\Delta x$.

  • Check stopping criterion

Line search methods

  1. Exact line search

    $t \leftarrow \operatorname*{argmin}_{s \ge 0} f(x + s\,\Delta x)$

  2. Backtracking line search

    while $f(x + t\,\Delta x) > f(x) + \alpha t\, \nabla f(x)^T \Delta x$, do $t \leftarrow \beta t$

Direction choice methods

  1. Gradient descent

    $\Delta x = -\nabla f(x)$

  2. Newton’s method

    $\Delta x = -\nabla^2 f(x)^{-1} \nabla f(x)$
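A minimal sketch of the descent loop with backtracking line search, using either the gradient or the Newton direction; the function names, constants, and example problem below are my own illustration, not from the original note.

```python
import numpy as np

def backtracking(f, grad, x, dx, alpha=0.3, beta=0.8):
    """Backtracking line search: shrink t until the Armijo condition holds."""
    t = 1.0
    while f(x + t * dx) > f(x) + alpha * t * grad(x) @ dx:
        t *= beta
    return t

def descent(f, grad, x0, hess=None, tol=1e-8, max_iter=200):
    """General descent: gradient direction, or Newton direction if a Hessian is given."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stopping criterion
            break
        dx = -g if hess is None else -np.linalg.solve(hess(x), g)
        t = backtracking(f, grad, x, dx)
        x = x + t * dx
    return x

# Example: minimize a strictly convex quadratic (1/2) x^T Q x - b^T x.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ Q @ x - b @ x
grad = lambda x: Q @ x - b
hess = lambda x: Q
print(descent(f, grad, np.zeros(2)))              # gradient descent
print(descent(f, grad, np.zeros(2), hess=hess))   # Newton's method
```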


Equality constrained algorithms

We are primarily concerned with linear equality constraints, since an affine constraint function is essentially the only way for an equality constraint to define a convex feasible set (both $h$ and $-h$ would have to be convex).

For $\min f(x) \;:\; Ax = b$, the KKT conditions become:

  1. dual feasibility: $\nabla f(x) + A^T\nu = 0$
  2. primal feasibility: $Ax = b$

Elimination of equality constraints

For a problem

$$\min f(x) \quad \text{s.t.} \quad Ax = b,$$

we can find a particular solution $x_0$ of the constraint and a matrix $F$ such that $\mathrm{Range}(F) = \mathrm{Null}(A)$, and thus convert the problem into the unconstrained problem

$$\min_z f(Fz + x_0).$$
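A minimal numerical sketch of this elimination, assuming NumPy: the particular solution comes from least squares and the null-space basis from an SVD. The helper name and the toy problem are my own, not from the original note.

```python
import numpy as np

def eliminate_equality(A, b):
    """Return a particular solution x0 of Ax=b and a basis F of Null(A),
    so that {x : Ax = b} = {F z + x0 : z arbitrary}."""
    x0, *_ = np.linalg.lstsq(A, b, rcond=None)   # particular solution
    # Null-space basis via SVD: right singular vectors past the numerical rank.
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > 1e-12 * s.max()))
    F = Vt[rank:].T
    return x0, F

# Example: minimize ||x - c||^2 subject to Ax = b, reduced to an unconstrained
# quadratic in z and solved in closed form.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0, 3.0])
x0, F = eliminate_equality(A, b)
# minimize_z ||F z + x0 - c||^2  =>  z solves (F^T F) z = F^T (c - x0)
z = np.linalg.solve(F.T @ F, F.T @ (c - x0))
x = F @ z + x0
print(x, A @ x)   # x is feasible and optimal for the reduced problem
```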

Dual method: ADMM

Alternating Direction Method of Multipliers

  • Motivation

    • Dual ascent

      For a linear equality constrained problem $\min f(x) : Ax = b$ (for which strong duality holds), solve the dual problem (which is unconstrained) using gradient ascent.

      The gradient of the dual objective at $y^k$ is $\nabla g(y^k) = Ax^{k+1} - b$, where $x^{k+1} = \operatorname*{argmin}_x L(x, y^k)$.

      The algorithm:

      • $x^{k+1} \leftarrow \operatorname*{argmin}_x L(x, y^k)$
      • $y^{k+1} \leftarrow y^k + \alpha\,(Ax^{k+1} - b)$
    • Dual decomposition

      If the primal objective $f$ is separable in the components of $x$, then the $x^{k+1} = \operatorname*{argmin}_x L(x, y^k)$ step decomposes into $x_i^{k+1} = \operatorname*{argmin}_{x_i} L_i(x_i, y^k)$, and all the $x_i^{k+1}$ can be evaluated in parallel.

    • Augmented Lagrangian and the method of multipliers

      Still for the same linear equality constrained problem, one injects into the Lagrangian a penalty term for the residual of the equality constraint

      $$L_\rho(x,y) = L(x,y) + (\rho/2)\,\|Ax - b\|_2^2$$

      The algorithm is the same as dual ascent, but with the step size set to $\alpha = \rho$.

  • ADMM

    Setting: the primal objective is separable into blocks. Augment the Lagrangian and use dual decomposition, but update the blocks sequentially, each update using the latest values of the other blocks (a sketch follows below).
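As a concrete instance, here is a minimal ADMM sketch for the standard splitting minimize $(1/2)\|Ax-b\|_2^2 + \gamma\|z\|_1$ subject to $x = z$, written with the scaled dual variable; the problem choice, names, and constants are my own illustration, not from the original note.

```python
import numpy as np

def soft_threshold(v, k):
    """Elementwise soft-thresholding, the proximal operator of k*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(A, b, gamma, rho=1.0, n_iter=200):
    """ADMM for minimize (1/2)||Ax-b||^2 + gamma*||z||_1 s.t. x = z,
    with scaled dual u (augmented-Lagrangian penalty (rho/2)||x - z + u||^2)."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)
    # Factor once: the x-update solves (A^T A + rho I) x = A^T b + rho (z - u).
    M = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(M, Atb + rho * (z - u))   # x-update (quadratic block)
        z = soft_threshold(x + u, gamma / rho)        # z-update (l1 block)
        u = u + x - z                                 # scaled dual update
    return z

# Tiny usage example with synthetic data.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
x_true = np.zeros(10); x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(20)
print(np.round(admm_lasso(A, b, gamma=0.1), 3))
```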

Primal method: extension of Newton’s method

Feasible starting point method

Using a second-order approximation of the primal objective at a point $x$ yields

$$\min_v \; f(x) + \nabla f(x)^T v + \tfrac{1}{2}\, v^T \nabla^2 f(x)\, v \quad \text{s.t.} \quad A(x + v) = b.$$

The method starts with a feasible $x^{(0)}$, and the $\Delta x$ update preserves feasibility.

Note that when $Ax = b$, the constraint $A(x+v) = b$ is equivalent to $Av = 0$. The Newton step $\Delta x$ is then taken as the solution of the second-order approximation problem, which is also the solution of its KKT equations

$$\begin{pmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \nu \end{pmatrix} = \begin{pmatrix} -\nabla f(x) \\ 0 \end{pmatrix}.$$
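To spell out where this system comes from (a standard derivation, added here for completeness): the Lagrangian of the approximation subproblem is $f(x) + \nabla f(x)^T v + \tfrac12 v^T \nabla^2 f(x)\, v + \nu^T A v$; setting its gradient with respect to $v$ to zero and enforcing $Av = 0$ gives

$$\nabla^2 f(x)\, v + A^T \nu = -\nabla f(x), \qquad A v = 0,$$

which is exactly the block system above with $v = \Delta x$.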

Infeasible starting point method

With an infeasible starting point, each update moves the iterate toward feasibility while aiming to satisfy the optimality conditions.

$$\begin{pmatrix} \nabla^2 f(x) & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} \Delta x \\ \nu \end{pmatrix} = -\begin{pmatrix} \nabla f(x) \\ Ax - b \end{pmatrix} \tag{1}$$

Remark:

The infeasible-start Newton method may take steps that do not decrease the objective value. However, once it reaches feasibility, it coincides with the feasible-start Newton method, and from then on every step is a descent step.

The algorithm:

Repeat until $Ax = b$ and $\|r(x,\nu)\|_2 \le \epsilon$:

  • Compute the primal and dual Newton steps $\Delta x$ and $\Delta\nu$. The dual step $\Delta\nu$ is the $\nu$ that solves (1), minus the previous $\nu$.
  • Backtrack on $\|r\|_2$, where $r(x,\nu) = \left(\nabla f(x) + A^T\nu, \; Ax - b\right)$ is the primal-dual residual:
    • $t \leftarrow 1$
    • while $\|r(x + t\Delta x,\, \nu + t\Delta\nu)\|_2 > (1 - \alpha t)\,\|r(x,\nu)\|_2$, do $t \leftarrow \beta t$
  • $x \leftarrow x + t\Delta x$, $\nu \leftarrow \nu + t\Delta\nu$
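A minimal sketch of the infeasible-start Newton method, assuming callables for the gradient and Hessian of $f$; the function names and the toy example are my own illustration, not from the original note.

```python
import numpy as np

def infeasible_newton(grad, hess, A, b, x0, nu0=None,
                      alpha=0.1, beta=0.5, tol=1e-10, max_iter=100):
    """Infeasible-start Newton method for minimize f(x) s.t. Ax = b."""
    x = np.asarray(x0, float)
    nu = np.zeros(A.shape[0]) if nu0 is None else np.asarray(nu0, float)

    def residual(x, nu):
        # Primal-dual residual r(x, nu) = (grad f(x) + A^T nu, Ax - b).
        return np.concatenate([grad(x) + A.T @ nu, A @ x - b])

    n, p = x.size, A.shape[0]
    for _ in range(max_iter):
        r = residual(x, nu)
        if np.linalg.norm(r) <= tol:
            break
        # Solve the KKT system (1); the second block of the solution is the new nu.
        K = np.block([[hess(x), A.T], [A, np.zeros((p, p))]])
        sol = np.linalg.solve(K, -np.concatenate([grad(x), A @ x - b]))
        dx, dnu = sol[:n], sol[n:] - nu
        # Backtrack on the norm of the primal-dual residual.
        t = 1.0
        while np.linalg.norm(residual(x + t * dx, nu + t * dnu)) > (1 - alpha * t) * np.linalg.norm(r):
            t *= beta
        x, nu = x + t * dx, nu + t * dnu
    return x, nu

# Example: minimize sum(exp(x)) subject to 1^T x = 1, from an infeasible start.
A = np.ones((1, 3)); b = np.array([1.0])
grad = lambda x: np.exp(x)
hess = lambda x: np.diag(np.exp(x))
x, nu = infeasible_newton(grad, hess, A, b, x0=np.zeros(3))
print(np.round(x, 4), A @ x)   # x_i = 1/3, constraint satisfied
```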

Inequality constrained methods

Barrier method

  • Log barrier

    For a problem with inequality constraints:

    $$\min f_0(x) \quad \text{s.t.} \quad Ax = b, \; f(x) \preceq 0,$$

    relax it to:

    $$\min f_0(x) + \textstyle\sum_i -(1/t)\log(-f_i(x)) \quad \text{s.t.} \quad Ax = b,$$

    or equivalently:

    $$\min\; t f_0(x) + \phi(x) \quad \text{s.t.} \quad Ax = b, \qquad \text{where } \phi(x) = -\textstyle\sum_i \log(-f_i(x)).$$

    The scaled log barrier $\sum_i -(1/t)\log(-f_i(x))$ is an approximation to the ideal barrier $\sum_i I_-(f_i(x))$ (which is $0$ when $f_i(x) \le 0$ and $+\infty$ otherwise); here $t$ is a parameter. The log barrier is convex, so it preserves the convexity of the objective. As $t \to \infty$, the log barrier approaches the ideal barrier.

  • Central path and m/t-suboptimality

    The central path associated with the problem is defined as the set of points $x^\star(t)$, $t > 0$, at which strict feasibility and the KKT conditions of the barrier-relaxed problem parametrized by $t$ hold.

    Central path conditions can also be interpreted as relaxed KKT conditions on the original problem with relaxed complementarity/duality gap:

    $$\lambda_i f_i(x) = -1/t, \quad \forall i.$$

    A point $x^\star(t)$ on the central path is $m/t$-suboptimal for the original problem (with $m$ the number of inequality constraints), i.e.

    $$f_0(x^\star(t)) - p^\star \le m/t.$$

  • Barrier method

    Choose a strictly feasible starting point $x$, and set $t \leftarrow t^{(0)} > 0$, $\mu > 1$, tolerance $\epsilon > 0$.

    • Centering: $x \leftarrow x^\star(t)$, computed by solving $\min\; t f_0(x) + \phi(x) : Ax = b$ with the latest $x$ as the starting point
    • Update: quit if $m/t < \epsilon$, else $t \leftarrow \mu t$

    Remark:

    • An infeasible starting point method can be used to solve the equality constrained problem at the centering step.

    • To choose a strictly feasible starting point, we add a preliminary phase, called phase I, in which we solve an optimization problem such as:

      $\min\; s \;:\; f(x) \preceq s\mathbf{1},\; Ax = b$ (where $s$ is a scalar), or

      $\min\; \mathbf{1}^T s \;:\; f(x) \preceq s,\; Ax = b,\; s \succeq 0$ (where $s$ is a vector).

      From its solution we can either extract a strictly feasible point for the original problem or conclude that the constraints are infeasible.

      Note that the phase I problem is formulated so that a strictly feasible point is easy to obtain, so it can be solved by the plain barrier method. Alternatively, we can use the infeasible-start Newton method to solve the barrier problem associated with the phase I problem.
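As an illustration of the outer loop, here is a minimal sketch of the barrier method for an inequality-only LP $\min\, c^T x : Gx \preceq h$ (no equality constraints, and phase I omitted by assuming a strictly feasible start is given); all names, constants, and the toy problem are my own, not from the original note.

```python
import numpy as np

def centering_newton(c, G, h, x, t, tol=1e-9, max_iter=50, alpha=0.1, beta=0.5):
    """Newton's method for the centering problem: minimize t*c^T x - sum(log(h - Gx)).
    Assumes x is strictly feasible (Gx < h) and G has full column rank."""
    for _ in range(max_iter):
        d = h - G @ x                        # slacks, must stay positive
        grad = t * c + G.T @ (1.0 / d)
        H = G.T @ np.diag(1.0 / d**2) @ G
        dx = -np.linalg.solve(H, grad)
        if -(grad @ dx) / 2 <= tol:          # Newton decrement stopping rule
            break
        # Backtracking line search, keeping the iterate strictly feasible.
        s = 1.0
        while np.any(G @ (x + s * dx) >= h):
            s *= beta
        f0 = t * c @ x - np.sum(np.log(d))
        while t * c @ (x + s * dx) - np.sum(np.log(h - G @ (x + s * dx))) > f0 + alpha * s * grad @ dx:
            s *= beta
        x = x + s * dx
    return x

def barrier_method(c, G, h, x0, t0=1.0, mu=10.0, eps=1e-6):
    """Barrier method: center, then increase t by a factor mu until m/t < eps."""
    x, t = np.asarray(x0, float), t0
    m = G.shape[0]
    while m / t >= eps:
        x = centering_newton(c, G, h, x, t)  # centering step
        t *= mu                              # increase t
    return x

# Example: minimize x1 + x2 subject to x >= 0 and x1 + 2*x2 <= 4 (optimum at the origin).
c = np.array([1.0, 1.0])
G = np.vstack([-np.eye(2), [[1.0, 2.0]]])
h = np.array([0.0, 0.0, 4.0])
print(np.round(barrier_method(c, G, h, x0=np.array([0.5, 0.5])), 6))
```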

Primal dual interior point method

TODO


Solving symmetric linear system

We are considering a linear system

$Ax = b$, where $A$ is symmetric positive semidefinite. Finding a solution $x$ is equivalent to finding a minimizer of

$$\phi(x) = \tfrac12\, x^T A x - b^T x,$$

since $\nabla\phi(x) = Ax - b$.

Conjugate gradient method - ref

  • Q-Conjugate

    $\{d_0, \dots, d_{n-1}\}$ are said to be Q-conjugate (Q-orthogonal) if $d_i^T Q d_j = 0$ for any $i \neq j$.

  • Conjugate direction theorem

    Let a given set of vectors $\{d_0, \dots, d_{n-1}\}$ be Q-conjugate (denoted $\perp_Q$), and let $x_0$ be an arbitrary starting point; then after $n$ steps of the following process, $x_n = x^\star$:

    • $x_{k+1} = x_k + \alpha_k d_k$
    • $g_k = Qx_k - b$
    • $\alpha_k = -\dfrac{g_k^T d_k}{d_k^T Q d_k}$

    Remark:

    This essentially expands $x^\star - x_0$ in the Q-conjugate basis $\{d_0, \dots, d_{n-1}\}$, one direction at a time. $\alpha_k$ is the "coordinate" along $d_k$ under the Q-conjugate basis in the sense that

    $\dfrac{(x^\star - x_0)^T Q d_k}{d_k^T Q d_k} = \dfrac{(x^\star - x_k)^T Q d_k}{d_k^T Q d_k} = -\dfrac{g_k^T d_k}{d_k^T Q d_k} = \alpha_k$, since

    $(x^\star - x_k) - (x^\star - x_0) = -\sum_{i=0}^{k-1} \alpha_i d_i$, which is Q-conjugate to $d_k$, and $(x^\star - x_k)^T Q d_k = (b - Qx_k)^T d_k = -g_k^T d_k$.

  • Conjugate gradient algorithm

    Since we are not given a Q-conjugate basis beforehand, we also compute the directions along the iterations.

    The update steps are the same as in the conjugate direction theorem:

    • $x_{k+1} = x_k + \alpha_k d_k$
    • $g_k = Qx_k - b$
    • $\alpha_k = -\dfrac{g_k^T d_k}{d_k^T Q d_k}$

    but with an additional calculation of the directions $d_k$:

    • $d_0 = b - Qx_0 \; (= -g_0)$
    • $d_{k+1} = -g_{k+1} + \beta_k d_k$
    • $\beta_k = \dfrac{g_{k+1}^T Q d_k}{d_k^T Q d_k}$

    Remark:

    To verify correctness, observe that the conjugate direction theorem holds for any Q-conjugate basis, so one only needs to show that the $\{d_k\}$ constructed by the conjugate gradient algorithm are indeed Q-conjugate.

    In the induction step of the proof, it is sufficient to show that both of the following hold:

    1. $d_{k+1} \perp_Q d_k$
    2. $g_{k+1} \perp_Q \{d_i : i < k\}$

    The first is straightforward from the choice of $\beta_k$; the second follows in the same way as the remark in the conjugate direction theorem section.
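A minimal sketch of the conjugate gradient iteration following the updates above ($g_k = Qx_k - b$, $d_0 = -g_0$); the function name, tolerances, and the small test are my own illustration, not from the original note.

```python
import numpy as np

def conjugate_gradient(Q, b, x0=None, tol=1e-10, max_iter=None):
    """Conjugate gradient for Qx = b with Q symmetric positive definite."""
    n = b.size
    x = np.zeros(n) if x0 is None else np.asarray(x0, float)
    g = Q @ x - b          # gradient of (1/2) x^T Q x - b^T x
    d = -g                 # initial direction
    for _ in range(max_iter or n):
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)     # exact line search along d
        x = x + alpha * d
        g = Q @ x - b
        if np.linalg.norm(g) <= tol:
            break
        beta = (g @ Qd) / (d @ Qd)      # enforce Q-conjugacy with the previous direction
        d = -g + beta * d
    return x

# Tiny check against a direct solve.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
Q = M.T @ M + np.eye(5)                 # symmetric positive definite
b = rng.standard_normal(5)
print(np.allclose(conjugate_gradient(Q, b), np.linalg.solve(Q, b)))
```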