Unconstrained Finite Dimensional Optimization
Problem Definition
Our main problem is to minimize a function \(f : \Omega \to \mathbb R\) over its domain: [ \min_{x \in \Omega} f(x), ]
where \(\Omega\) is one of the following three types:
- \(\Omega = \mathbb R^n\).
- \(\Omega\) open.
- \(\Omega\) the closure of an open set.
We can consider minimization problems without any loss of generality, since any maximization problem can be converted to a minimization problem by taking the negative of the function in question: that is, [ \max_{x \in \Omega} f(x) = -\min_{x \in \Omega} \left(-f(x)\right). ]
Defn. Feasible Direction
Given \(\Omega \subseteq \mathbb R^n\) and a point \(x_0 \in \Omega\), we say that the vector \(v \in \mathbb R^n\) is a feasible direction at \(x_0\) if there is an \(\overline{s} > 0\) such that \(x_0 + sv \in \Omega\) for all \(s \in [0, \overline{s}]\).
Thrm. First order necessary condition for a local minimum (FONC)
Claim. If \(f : \mathbb R^n \supseteq \Omega \to \mathbb R\) is \(C^1\) and \(x_0\) is a local minimizer of \(f\) in the interior of \(\Omega\), then \(\nabla f(x_0) = 0\).
proof. If \(x_0\) is an interior point of \(\Omega\), then every direction at \(x_0\) is feasible. By the claim below, for any \(v\) we have \(\nabla f(x_0) \cdot v \geq 0\) and \(\nabla f(x_0) \cdot (-v) \geq 0\), hence \(\nabla f(x_0) \cdot v = 0\) for all \(v\), which forces \(\nabla f(x_0) = 0\).
Claim. Let \(f : \mathbb R^n \supseteq \Omega \to \mathbb R\) be \(C^1\). If \(x_0 \in \Omega\) is a local minimizer of \(f\), then \(\nabla f(x_0) \cdot v \geq 0\) for all feasible directions \(v\) at \(x_0\).
proof. Reduce to a single-variable problem by defining \(g(s) = f(x_0 + sv)\), where \(s \geq 0\). Then \(0\) is a local minimizer of \(g\). Taylor's theorem gives us [ g(s) - g(0) = s g'(0) + o(s) = s \nabla f(x_0) \cdot v + o(s). ] If \(\nabla f(x_0) \cdot v < 0\), then for sufficiently small \(s\) the right side is negative. This implies that \(g(s) < g(0)\) for those \(s\), a contradiction. Therefore \(\nabla f(x_0) \cdot v \geq 0\).
Example 1
Consider the optimization problem [ \min_{(x,y) \in \mathbb R^2} f(x,y) = x^2 - xy + y^2 - 3y. ]
By the corollary to the FONC, we want to find the points \((x_0, y_0)\) where \(\nabla f(x_0, y_0) = 0\). We have [ \nabla f(x,y) = (2x - y,\; -x + 2y - 3), ]
so we want to solve [ 2x_0 - y_0 = 0, \qquad -x_0 + 2y_0 - 3 = 0, ]
which has solution \((x_0, y_0) = (1,2)\). Therefore \((1,2)\) is the only *candidate* for a local minimizer. That is, if the function \(f\) has a local minimizer in \(\mathbb R^2\), then it must be \((1,2)\).
It turns out that \((1,2)\) is a global minimizer for \(f\) on \(\Omega = \mathbb R^2\). By some work, we have [ f(x,y) = \left(x - \frac{y}{2}\right)^2 + \frac{3}{4}(y-2)^2 - 3. ] In this form, it is obvious that a global minimizer occurs at the point where the squared terms are zero, if such a point exists. That point is \((1,2)\).
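The candidate and the completed square can be sanity-checked numerically. A minimal Python sketch, using the function and gradient computed above:

```python
# Sanity check for Example 1: f(x, y) = x^2 - x*y + y^2 - 3*y.

def f(x, y):
    return x**2 - x*y + y**2 - 3*y

def grad_f(x, y):
    return (2*x - y, -x + 2*y - 3)

# FONC candidate: the gradient vanishes at (1, 2).
assert grad_f(1, 2) == (0, 0)

# Completed-square identity holds at a few sample points.
for x, y in [(0.0, 0.0), (1.0, 2.0), (-3.0, 5.0), (2.5, -1.5)]:
    lhs = f(x, y)
    rhs = (x - y/2)**2 + 0.75*(y - 2)**2 - 3
    assert abs(lhs - rhs) < 1e-12

# The completed square shows f >= -3 everywhere, with equality at (1, 2).
assert f(1, 2) == -3
```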
Example 2
Consider the problem [ \min f(x,y) = x^2 + xy - x + y \quad \text{subject to } x \geq 0, \; y \geq 0. ]
We have [ \nabla f(x,y) = (2x + y - 1,\; x + 1). ]
To apply the FONC, we'll divide the feasible set \(\Omega\) into four different regions.
Suppose that \((x_0, y_0)\) is a local minimizer of \(f\) on \(\Omega\).
- \((x_0, y_0)\) is an interior point:
By the corollary to the FONC, we must have \(\nabla f(x_0, y_0) = 0\). Then \(x_0 = -1\), which is not in the interior of \(\Omega\). This case fails.
- \((x_0, y_0)\) on the positive x-axis:
Then we are considering \((x_0, 0)\) with \(x_0 > 0\). The feasible directions at \((x_0, 0)\) are those vectors \(v \in \mathbb R^2\) with \(v_2 \geq 0\). The FONC tells us that \(\nabla f(x_0,0) \cdot v \geq 0\) for all feasible directions \(v\). We then have \((2x_0 - 1)v_1 + (x_0 + 1)v_2 \geq 0\) for all \(v_1\) and all \(v_2 \geq 0\). In particular, this holds for \(v_2 = 0\), so \((2x_0 - 1)v_1 \geq 0\) for all \(v_1\), implying \(x_0 = 1/2\). Therefore \((1/2, 0)\) is a candidate for a local minimizer of \(f\) on \(\Omega\); it is the only candidate on the positive \(x\)-axis.
- \((x_0, y_0)\) on the positive \(y\)-axis:
Then we are considering \((0, y_0)\) with \(y_0 > 0\). The feasible directions here are those \(v \in \mathbb R^2\) with \(v_1 \geq 0\). The FONC then requires \((y_0 - 1)v_1 + v_2 \geq 0\) for all \(v_1 \geq 0\) and all \(v_2\). Taking \(v_1 = 0\) and \(v_2 = -1\) gives \(-1 \geq 0\), a contradiction, so \(f\) has no local minimizers along the positive \(y\)-axis.
- \((x_0, y_0)\) is the origin:
Then we are considering \((0,0)\). The feasible directions here are those \(v \in \mathbb R^2\) with \(v_1, v_2 \geq 0\). The FONC then requires \(-v_1 + v_2 \geq 0\) for all \(v_1, v_2 \geq 0\); taking \(v = (1, 0)\) gives \(-1 \geq 0\), a contradiction. Therefore the origin is not a local minimizer of \(f\).
We conclude that the only candidate for a local minimizer of \(f\) on \(\Omega\) is \((1/2, 0)\). It turns out that this is actually a global minimizer of \(f\) on \(\Omega\); we will see this later.
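A quick numerical sketch in Python supports this. Since \(\partial f / \partial y = x + 1 > 0\) on \(\Omega\), the minimum must lie on the \(x\)-axis, and a coarse grid search over the quadrant never beats the candidate value \(f(1/2, 0) = -1/4\). (This assumes the objective \(f(x,y) = x^2 + xy - x + y\) used above.)

```python
# Grid-search sanity check for Example 2 on the quadrant x, y >= 0.

def f(x, y):
    return x**2 + x*y - x + y

candidate = (0.5, 0.0)
f_min = f(*candidate)          # = -1/4

# f is increasing in y (df/dy = x + 1 > 0 on the feasible set), so a
# coarse grid over [0, 5]^2 should never beat the candidate value.
best = min(f(i * 0.05, j * 0.05) for i in range(101) for j in range(101))
assert f_min == -0.25
assert best >= f_min - 1e-12
```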
Thrm. Second order necessary condition for a local minimum (SONC)
Claim. Let \(f : \mathbb R^n \supseteq \Omega \to \mathbb R\) be \(C^2\). If \(x_0 \in \Omega\) is a local minimizer of \(f\), then for any feasible direction \(v\) at \(x_0\) the following conditions hold:
- \(\nabla f(x_0) \cdot v \geq 0\).
- If \(\nabla f(x_0) \cdot v = 0\), then \(v^T \nabla^2 f(x_0) v \geq 0\).
proof. The first condition is the FONC. For the second, fix a feasible direction \(v\) at \(x_0\) with \(\nabla f(x_0) \cdot v = 0\). Then \(f(x_0) \leq f(x_0 + sv)\) for sufficiently small \(s \geq 0\). By Taylor's theorem, [ f(x_0 + sv) - f(x_0) = s \nabla f(x_0) \cdot v + \frac{s^2}{2} v^T \nabla^2 f(x_0) v + o(s^2), ]
so since \(\nabla f(x_0) \cdot v = 0\), [ f(x_0 + sv) - f(x_0) = \frac{s^2}{2} v^T \nabla^2 f(x_0) v + o(s^2). ]
If \(v^T \nabla^2 f(x_0) v < 0\), then for sufficiently small \(s\) the right side is negative, implying that \(f(x_0 + sv) < f(x_0)\) for such \(s\), which contradicts local minimality of \(f\) at \(x_0\). Therefore \(v^T \nabla^2 f(x_0) v \geq 0\).
Corollary. If \(f : \mathbb R^n \supseteq \Omega \to \mathbb R\) is \(C^2\) and \(x_0\) is a local minimizer of \(f\) in the interior of \(\Omega\), then \(\nabla f(x_0) = 0\) and \(\nabla^2 f(x_0)\) is positive semidefinite.
Defn. Principal Minor
A principal minor of a square matrix \(A\) is the determinant of a submatrix of \(A\) obtained by removing any \(k\) rows and the corresponding \(k\) columns, \(k \geq 0\). A leading principal minor of \(A\) is the determinant of a submatrix obtained by removing the last \(k\) rows and \(k\) columns of \(A\), \(k \geq 0\).
Thrm. Sylvester's criterion
- For positive definite self-adjoint matrices: If \(A\) is a self-adjoint matrix, then \(A \succ 0\) if and only if all of the leading principal minors of \(A\) are positive.
- For positive semidefinite self-adjoint matrices: If \(A\) is a self-adjoint matrix, then \(A \succeq 0\) if and only if all of the principal minors of \(A\) are non-negative.
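Sylvester's criterion is easy to implement for small matrices. A pure-Python sketch (the helper names are my own, not from the text):

```python
# Sylvester's criterion for positive definiteness: all leading
# principal minors of a symmetric matrix must be positive.

def det(M):
    """Determinant by cofactor expansion (fine for small matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def is_positive_definite(A):
    """Check that every leading principal minor of A is positive."""
    n = len(A)
    return all(det([row[:k] for row in A[:k]]) > 0 for k in range(1, n + 1))

# Hessian from Example 1: [[2, -1], [-1, 2]] has leading minors 2 and 3.
assert is_positive_definite([[2, -1], [-1, 2]])
# The indefinite Hessian of f(x, y) = x*y fails the test.
assert not is_positive_definite([[0, 1], [1, 0]])
```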
Example 1
Consider the problem from Example 1: minimize \(f(x,y) = x^2 - xy + y^2 - 3y\) over \(\Omega = \mathbb R^2\).
Recall that \((1,2)\) was the only candidate for a local minimizer of \(f\) on \(\Omega\). We now check that the SONC holds. Since \((1,2)\) is an interior point of \(\Omega\), we must have \(\nabla^2 f(1,2) \succeq 0\). We have [ \nabla^2 f(1,2) = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}. ]
All of the leading principal minors of \(\nabla^2 f(1,2)\) are positive, so \((1,2)\) satisfies the SONC by Sylvester's criterion.
Example 2
Consider the problem from Example 2: minimize \(f(x,y) = x^2 + xy - x + y\) over \(\Omega = \{(x,y) : x, y \geq 0\}\).
Recall that \((1/2, 0)\) was the only candidate for a local minimizer of \(f\). We have [ \nabla^2 f(1/2, 0) = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}. ]
To satisfy the SONC, we must have \(v^T \nabla^2 f(1/2, 0) v \geq 0\) for all feasible directions \(v\) at \((1/2, 0)\) such that \(\nabla f(1/2, 0) \cdot v = 0\). We have \(\nabla f(1/2, 0) = (0, 3/2)\), so the relevant \(v\) are exactly those of the form \(v = (v_1, 0)\); each such \(v\) is a feasible direction at \((1/2, 0)\) with \(\nabla f(1/2, 0) \cdot v = 0\). Then [ v^T \nabla^2 f(1/2, 0) v = 2v_1^2 \geq 0. ]
So the SONC is satisfied.
Completing the Square
Let \(A\) be a symmetric positive definite \(n \times n\) matrix and let \(b \in \mathbb R^n\). Our problem is [ \min_{x \in \mathbb R^n} f(x), \qquad f(x) = \frac{1}{2} x^T A x - b^T x. ]
The FONC tells us that if \(x_0\) is a local minimizer of \(f\), then since \(x_0\) is an interior point, \(\nabla f(x_0) = 0\). We thus have \(Ax_0 = b\), so since \(A\) is invertible (positive eigenvalues), \(x_0 = A^{-1}b\). Therefore \(x_0 = A^{-1}b\) is the unique candidate for a local minimizer of \(f\) on \(\Omega\).
The SONC is satisfied as well: \(\nabla^2 f(x) = A \succ 0\) for every \(x\), so in particular \(\nabla^2 f(x_0) \succeq 0\). The candidate \(x_0 = A^{-1}b\) therefore survives both necessary conditions.
In fact, the candidate \(x_0\) is a global minimizer. Why? We will "complete the square". We can write [ f(x) = \frac{1}{2}\left(x - A^{-1}b\right)^T A \left(x - A^{-1}b\right) - \frac{1}{2} b^T A^{-1} b; ]
this relies on the symmetry of \(A\). (Long rearranging of terms.) Since \(A \succ 0\), the first term is non-negative and vanishes exactly when \(x = A^{-1}b\), so in this form it is obvious that \(x_0\) is the unique global minimizer of \(f\) over \(\Omega\).
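The conclusion can be checked numerically for a concrete \(2 \times 2\) instance. A Python sketch (the particular \(A\) and \(b\) below are illustrative choices, not from the text):

```python
# Check that x0 = A^{-1} b minimizes f(x) = (1/2) x^T A x - b^T x
# for a concrete symmetric positive definite A.

def f(A, b, x):
    quad = 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))
    return quad - sum(b[i] * x[i] for i in range(2))

A = [[2.0, 1.0], [1.0, 3.0]]   # symmetric, leading minors 2 and 5 > 0
b = [1.0, 2.0]

# Solve A x0 = b with the explicit 2x2 inverse formula.
detA = A[0][0] * A[1][1] - A[0][1] * A[1][0]
x0 = [(A[1][1] * b[0] - A[0][1] * b[1]) / detA,
      (A[0][0] * b[1] - A[1][0] * b[0]) / detA]

# Perturbations in every direction do not decrease f, as the
# completed square predicts (f(x0 + d) - f(x0) = (1/2) d^T A d >= 0).
for dx in (-0.1, 0.0, 0.1):
    for dy in (-0.1, 0.0, 0.1):
        assert f(A, b, [x0[0] + dx, x0[1] + dy]) >= f(A, b, x0) - 1e-12
```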
Thrm. Second order sufficient conditions for interior local minimizers
Lemma If \(A\) is symmetric and positive-definite, then there is an \(a > 0\) such that \(v^T A v \geq a \|v\|^2\) for all \(v\).
proof. There is an orthogonal matrix \(Q\) with \(Q^T A Q = \mathrm{diag}(\lambda_1, \dots, \lambda_n)\). If \(v = Qw\), then \(\|w\| = \|v\|\) and [ v^T A v = w^T Q^T A Q w = \sum_{i=1}^n \lambda_i w_i^2 \geq \left(\min_i \lambda_i\right) \|v\|^2. ]
Since \(A\) is positive-definite, every eigenvalue is positive, so we may take \(a = \min_i \lambda_i\).
Claim. Let \(f\) be \(C^2\) on \(\Omega \subseteq \mathbb R^n\), and let \(x_0\) be an interior point of \(\Omega\) such that \(\nabla f(x_0) = 0\) and \(\nabla^2 f(x_0) \succ 0\). Then \(x_0\) is a strict local minimizer of \(f\).
proof. By the lemma, the condition \(\nabla^2 f(x_0) \succ 0\) implies there is an \(a > 0\) such that \(v^T \nabla^2 f(x_0) v \geq a \|v\|^2\) for all \(v\). Since \(\nabla f(x_0) = 0\), Taylor's theorem gives [ f(x_0 + v) - f(x_0) = \frac{1}{2} v^T \nabla^2 f(x_0) v + o(\|v\|^2) \geq \frac{a}{2} \|v\|^2 + o(\|v\|^2). ]
For sufficiently small \(v \neq 0\) the right hand side is positive, so \(f(x_0 + v) > f(x_0)\) for all such \(v\). Therefore \(x_0\) is a strict local minimizer of \(f\) on \(\Omega\).
Example 1
Consider \(f(x,y) = xy\). The gradient is \(\nabla f(x,y) = (y,x)\) and the Hessian is [ \nabla^2 f(x,y) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. ]
Suppose we want to minimize \(f\) on all of \(\Omega = \mathbb R^2\). By the FONC, the only candidate for a local minimizer is \((0,0)\). The Hessian's eigenvalues are \(\pm 1\), so it is not positive definite. We conclude by the SONC that the origin is not a local minimizer of \(f\).
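The saddle behavior can be confirmed numerically: the quadratic form of the Hessian takes both signs, and \(f\) decreases from the origin along \((1, -1)\). A small Python sketch:

```python
# The saddle at the origin for f(x, y) = x*y, checked numerically.

def hess_quad(v):
    # v^T H v for H = [[0, 1], [1, 0]] reduces to 2*v1*v2.
    return 2 * v[0] * v[1]

# Eigenvector directions (1, 1) and (1, -1) give values of opposite
# sign, matching eigenvalues +1 and -1.
assert hess_quad((1, 1)) == 2
assert hess_quad((1, -1)) == -2

# So f decreases along (1, -1): the origin is not a local minimizer on R^2.
def f(x, y):
    return x * y

assert f(0.1, -0.1) < f(0, 0)
```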
Example 2
Consider the same function \(f(x,y) = xy\) on \(\Omega = \{(x,y) \in \mathbb R^2 : x, y \geq 0\}\). We claim that every point of the boundary of \(\Omega\) is a local minimizer of \(f\).
Consider \((x,0)\) with \(x > 0\). The feasible directions here are those \(v\) with \(v_2 \geq 0\). The FONC requires \(\nabla f(x,0) \cdot v \geq 0\) for all feasible \(v\). This dot product is \(x v_2 \geq 0\), so \((x,0)\) satisfies the FONC, and every point on the positive \(x\)-axis is a candidate for a local minimizer. As for the SONC, \(\nabla f(x,0) \cdot v = x v_2 = 0\) if and only if \(v_2 = 0\), and then \(v^T \nabla^2 f(x,0) v = 2 v_1 v_2 = 0\). Of course, this tells us nothing; we need a sufficient condition that works for boundary points. That's for next lecture.
Or, you could just say that \(f = 0\) on the boundary of \(\Omega\) and is positive on the interior, so every point of the boundary of \(\Omega\) is a local minimizer (not strict) of \(f\).