Inequality Constrained Finite Dimension Optimization
Problem Definition
Our problem is of the form
Defn. Active Constraint
Let \(x_0\) satisfy the above constraints. We call the inequality constraint \(g_i(x) \leq 0\) active at \(x_0\) if \(g_i(x_0) = 0\). Otherwise, it is inactive at \(x_0\).
Defn. Regular point
Suppose there is an index \(l' \leq l\) such that \(g_1(x_0) =, \dots, g_{l'}(x_0) = 0\) are active, and \(g_{l'+1}(x_0) \leq 0, \dots, g_l(x_0) \leq 0\) are inactive. We say that \(x_0\) is a regular point of these constraints if the vectors \(\nabla h_1(x_0), \dots, \nabla h_k(x_0), \nabla g_1(x_0), \dots, \nabla g_{l'}(x_0)\) are linearly independent.
Thrm. First Order Necessary Conditions (Kuhn-Tucker Conditions)
Claim. Let \(\Omega \subseteq \mathbb R^n\) be open and consider \(C^1\) functions \(f, h_1, \dots, h_k, g_1, \dots, g_l\) on \(\Omega\). Suppose \(x_0\) is a local minimizer of \(f\) subject to the constraints, and that \(x_0\) is regular as defined above. Then 1. There exist \(\lambda_1, \dots, \lambda_k \in \mathbb R\) and \(\mu_1, \dots, \mu_l \in \mathbb R^{\geq 0}\) such that \(\nabla f(x_0) + \sum \lambda_i \nabla h_i(x_0) + \sum \mu_j \nabla g_j(x_0) = 0\) 2. (Complementary slackness conditions) For all \(j\), \(\mu_j g_j(x_0) = 0\), or equivalently, \(\sum \mu_j g_j(x_0) = 0\).
Suppose the active constraints at \(x_0\) are the first \(l'\) constraints. Since each \(\mu_j \geq 0\), condition (ii) is equivalent to saying that if \(j \geq l'+1\), then \(\mu_j = 0\).
proof. If \(x_0\) is a local minimizer of \(f\) subject to the constraints, then it is certainly a local minimizer of \(f\) subject to only the active constraints. That is, \(x_0\) is also a local minimizer of \(f\) subject to the equality constraints
We know how to work with this! Let \(M\) be the surface defined by these equality constraints. By the Lagrange multipliers theorem,
for some \(\lambda_i \in \mathbb R, \mu_j \in \mathbb R\). (Note that we have not yet shown that the \(\mu_j\)'s are non-negative.)
Note that \(g_1(x_0) = \cdots = g_{l'}(x_0) = 0\). Therefore \(\mu_{l'+1} = \cdots = \mu_l = 0\), so it follows that
which implies that \(\mu_j g_j(x_0) = 0\) for all \(j\). We have proven condition (ii).
We must now verify the non-negativity of the \(\mu_j\)'s. Suppose for the sake of contradiction that some \(\mu_j < 0\); WLOG assume \(j=1\). Let
Since \(x_0\) is a regular point of \(M\), \(x_0\) is a regular point of \(\tilde{M}\). Therefore
The vector \(\nabla g_1(x_0)\) does not lie in this span, so there is a \(v \in T_{x_0}\tilde{M}\) such that \(\nabla g_1(x_0) \cdot v < 0\). That is, \(g_1\) is strictly decreasing in the direction of \(v\), or in more precise language, \(g_1(x_0 + sv) < g_1(x_0)\) for all sufficiently small \(s\), as we have
Therefore \(v\) is a feasible direction for \(g_1(x) \leq 0\) at \(x_0\), and also, \(v\) is tangential to the other constraints. Since \(x_0\) is a regular point of \(\tilde{M}\), we may find a curve \(x(s)\) on \(\tilde{M}\) such that \(x(0) = x_0\) and \(x'(0) = v\). Also, \(s = 0\) is a local minimizer of \(f \circ x\), so
On the other hand,
Taking the dot product of the above equation by \(v\) kills the two sums above and gives
implying \(\nabla f(x_0)\cdot v < 0\), a contradiction. So every \(\mu_j \geq 0\).
Thrm. Second Order Necessary Conditions
Claim. Suppose \(f, h_1, \dots, h_k, g_1, \dots, g_k \in C^2(\Omega)\), where \(\Omega \subseteq \mathbb R^n\). Suppose \(x_0\) is a regular point of the constraints. If \(x_0\) is a local minimizer of \(f\) subject to the constraints, then 1. There are \(\lambda_1, \dots, \lambda_k \in \mathbb R\) and \(\mu_1, \dots, \mu_l \geq 0\) such that \(\nabla f(x_0) + \sum_i \lambda_i \nabla h_i(x_0) + \sum_j \mu_j \nabla g_j(x_0) = 0\) and \(\mu_j g_j(x_0) = 0\) for each \(j\).
- The matrix \(L(x_0) = \nabla^2 f(x_0) + \sum_i \lambda_i \nabla^2 h_i(x_0) + \sum_j \mu_j \nabla^2 g_j(x_0)\) is positive semi-definite on the tangent space \(T_{x_0}\tilde{M}\) to the active constraints at \(x_0\). (Explicitly, \(L(x_0)\) is positive semi-definite on the space \(T_{x_0}\tilde{M} = \{v \in \mathbb R^n : \nabla h_i(x_0) \cdot v = 0. \forall i, \nabla g_j(x_0) \cdot v = 0 \text{ where }1 \leq j \leq l\}\) where the active \(g\) constraints are indexed precisely by \(1, \dots, l'\)
proof. \(x_0\) is a local minimizer of \(f\) subject to the constraints, so it is also a local minimizer of \(f\) subject to only the active constraints. Since the Lagrange multiplies of the inactive constraints are zero, our theory of equality-constrained minimization finishes the problem.
Example 1
Consider, for example, the problem
The feasible set is the closed unit ball \(\overline{B_1(0)}\) with an open semicircle removed from the top right. Geometrically, it is clear that the minimizer should be the point \((1,0)\). It is not hard to check that every feasible point is regular. Let's check that \((x_0, y_0) = (1,0)\) satisfies the first order conditions. We look at
This becomes
or
So \(\mu_1 = 1/2\). Also, \(g_1(1,0) = 1^2 + 0^2 - 1 = 0\) and \(g_(1,0) = 0\) as well, so the complementary slackness conditions are satisfied. Therefore \((1,0)\) satisfies the Kuhn-Tucker conditions, and so it is a candidate local minimizer. What about the second order conditions?
Clearly the second order necessary conditions are satisfied, but let's check the tangent space anyway. We have \(\nabla g_1(1,0) = (2, 0)\) and \(\nabla g_2(1,0) = (1,1)\); they are linearly independent, so the tangent space is a point. Therefore the second order necessary conditions are satisfied.
Thrm. Second Order Sufficient Conditions
Claim. Suppose \(\Omega \subseteq \mathbb R^n\) is open and \(f, h_1, \dots, h_k, g_1, \dots, g_l \in C^2(\Omega)\). Consider the minimization problem
Suppose \(x_0\) is a feasible point of the constraints. If the following three conditions are satisfied: 1. There exist \(\lambda_1, \dots, \lambda_k \in \mathbb R\) and \(\mu_1, \dots, \mu_l \geq 0\) such that \(\nabla f(x_0) + \sum_i \lambda_i \nabla h_i(x_0) + \sum_j \mu_j \nabla g_j(x_0) = 0\)
-
\(\mu_j g_j(x_0) = 0\) for each \(j\).
-
The matrix \(L(x_0) = \nabla^2 f(x_0) + \sum_i \lambda_i \nabla^2 h_i(x_0) + \sum_j \mu_j \nabla^2 g_j(x_0)\) is positive definite on the tangent space to the "strongly active constraints" at \(x_0\). That is, it is positive definite on the space \(\tilde{\tilde{T_{x_0}}} = \{ v \in \mathbb R^n : \nabla h_i(x_0) = 0 \forall i\land \forall k, 1 \leq k \leq l''\Rightarrow \nabla g_j(x_0) = 0\}\) where \(\{1, \dots, l''\}\) is the set of all indices of active constraints whose Lagrange multipliers are positive. \end{enumerate} then \(x_0\) is a strict local minimizer of \(f\).
proof. Suppose \(x_0\) is not a strict local minimizer of \(f\). We claim that there then exists a unit vector \(v \in \mathbb R^n\) such that 1. \(\nabla f(x_0) \cdot v \leq 0\). 2. \(\nabla h_i(x_0) \cdot v = 0\) for each \(i = 1, \dots, k\). 3. \(\nabla g_j(x_0) \cdot v \leq 0\) for all the active constraints (hereafter labelled by \(j = 1,\dots, l'\))
Since \(x_0\) is not a strict local minimizer, there exists a sequence \(x_k\) of feasible points unequal to \(x_0\) converging to \(x_0\) such that \(f(x_k) \leq f(x_0)\). Then \(f(x_k) - f(x_0) \leq 0\) for each \(k\). Let \(v_k = \frac{x_k-x_0}{\|x_k-x_0\|}\), and let \(s_k = \|x_k - x_0\|\). Then \(x_k = x_0 + s_kv_k\), with which we may rewrite the inequality as \(f(s_kv_k + x_0) - f(x_0) \leq 0\). Since each \(v_k \in S^1\), we may assume that the sequence \(v_k\) is convergent and that it converges to some \(v \in S^1\). We claim that this vector \(v\) has the three desired properties.
By Taylor's theorem we have
(The last equation is \(\leq 0\) because \(g_j(x_0) = 0\).) Divide everything by \(s_k\) and take the limit as \(k \to \infty\). Then
which proves the earlier claim.
We now claim that equality actually holds in (c). Suppose for the sake of contradiction that there is some \(1 \leq k \leq l'\) such that \(\nabla g_j(x_0) \cdot v < 0\) for some \(j\) for which \(g_j\) is strongly active at \(x_0\). By the first condition of the theorem,
and so the right hand side is strictly greater than zero, because we only considered strongly active constraints. This is a contradiction, so we conclude that \(\nabla g_j(x_0) = 0\) for all \(j\) such that \(g_j\) is strongly active at \(x_0\). Therefore \(v \in \tilde{\tilde{T_{x_0}}}\).
Again, by Taylor's theorem
Multiply the second line by \(\lambda_i\) and the third by \(\mu_j\) and add everything up to get
Divide everything by \(s_k^2\) to get
Taking the limit \(k \to \infty\) gives
which violates condition 3 of the theorem. We have a contradiction, so we conclude that \(x_0\) must be a strict local minimizer.
Example 1
Given \((a,b)\) with \(a,b > 0\) and \(a^2+b^2 > 1\). Consider the minimization problem:
Our intuition says that the minimizer should be \(\left( \frac{a}{\sqrt{a^2+b^2}}, \frac{b}{\sqrt{a^2+b^2}} \right)\). We have \(\nabla g(x,y) = (2x, 2y)\), so clearly all feasible points are regular. The Kuhn-Tucker conditions are
and \(\mu g(x,y) = 0\). That is,
Suppose \(\mu = 0\). Then \(x = a\) and \(y = b\); since we assumed \(a^2+b^2 > 1\), we would have that \((x,y)\) is not feasible. Therefore \(\mu \neq 0\), and so \(x^2+y^2 = 1\) by the third equation. Squaring the first two equations and adding them gives
implying that \(\mu = -1 + \sqrt{a^2+b^2}\) - we took the positive root because \(\mu > 0\). This is actually positive, since \(a^2+b^2 > 1\). Those first equations again give us
as expected. The Lagrangian is
which is everywhere positive definite. Therefore the second-order sufficient conditions are satisfied. For practice, however, let's compute the tangent space to the "strongly active constraints". The only constraint is \(g\); since \(g\) is active and its Lagrange multiplier \(\mu\) is positive, the constraint \(g\) is strongly active at \((x_0, y_0)\). Therefore the tangent space we are interested in is the tangent space to \(S^1\) at \((x_0, y_0)\): that space is \(\{v \in \mathbb R^2 : av_1 + bv_2 = 0\}\).
Example 2
Consider the problem
We have \(\nabla g(x,y) = (2(x+1), 2y)\), which makes it clear that every feasible point is regular. The Kuhn-Tucker conditions are
with \(\mu((x+1)^2 + y^2 - 1) = 0\) and \(\mu \geq 0\).
Consider \((x_0, y_0) = (0,0)\). The Kuhn-Tucker conditions imply \(\mu = 0\). In particular, \(g\) is active at \((0,0)\), but not strongly active there. The tangent space to the active constraint at \((0,0)\) is the y-axis. The Lagrangian at \((0,0)\) is
which is clearly positive definite on this tangent space. However, we cannot conclude anything, since the constraint \(g\) is not strongly active. In fact, it is clear that \((0,0)\) is not a local minimizer: for \(x<0\) sufficiently close to \(0\), \(f(x,0)\) is negative, yet it is \(0\) at \((0,0)\).