Examples: Unconstrained Finite Dimensional Optimization
Example 1
Question
Let \(a\in\mathbb R\) and \(f_a:\mathbb R^2\rightarrow \mathbb R, f_a(x, y)= x^2 + 2y^2 + axy - y\).
Part (a)
Find the points satisfying the first-order conditions (FOC).
The partial derivatives give \(\frac{\partial f_a}{\partial x} = 2x + ay, \quad \frac{\partial f_a}{\partial y} = 4y + ax - 1\).
Set both derivatives to 0 to meet the FOC.
If \(a = 0\), then \(x = 0, y = 1/4\) satisfies FOC.
If \(a\neq 0\), consider the system \(\begin{bmatrix}a&4\\2&a\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\0\end{bmatrix}\). Row reduction gives \(\begin{bmatrix}a&4\\0&\frac{a^2}2- 4\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}1\\-1\end{bmatrix}\). If \(a = \pm 2\sqrt{2}\), the system is inconsistent and no point satisfies the FOC; otherwise, we have the unique solution \(y = \frac{2}{8-a^2}\), \(x = \frac1a\left(1-\frac{8}{8-a^2}\right) = \frac{-a}{8-a^2}\).
To summarize,
If \(a = 0\), then \(x=0, y=1/4\) satisfies the FOC.
If \(a = \pm 2\sqrt 2\), then no point satisfies the FOC.
If \(a\neq 0\) and \(a\neq \pm2\sqrt2\), then \(x = \frac{-a}{8-a^2}, y = \frac{2}{8-a^2}\) satisfies the FOC.
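As a quick numerical sanity check (my own addition, not part of the original solution; the helper `grad_f` and the sample value `a = 1` are illustrative choices), the FOC point derived above should have a vanishing gradient:

```python
# Sanity check: the FOC point should have zero gradient.
def grad_f(a, x, y):
    # partial derivatives of f_a(x, y) = x^2 + 2y^2 + a*x*y - y
    return (2 * x + a * y, 4 * y + a * x - 1)

a = 1.0
x_star = -a / (8 - a**2)  # -1/7
y_star = 2 / (8 - a**2)   # 2/7
gx, gy = grad_f(a, x_star, y_star)
print(abs(gx) < 1e-12 and abs(gy) < 1e-12)  # True
```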
Part (b)
Find the points satisfying the second-order conditions (SOC).
The Hessian matrix is \(F_a = \nabla^2 f_a = \begin{bmatrix}2&a\\a&4\end{bmatrix}\).
Note that \(F_a\) is positive definite iff \(\det(F_a) = 2\times 4-a^2 > 0\) (its top-left entry \(2\) is already positive), so for any \(a\in(-2\sqrt 2, 2\sqrt 2)\), the FOC point satisfies the SOC.
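The determinant condition for \(\begin{bmatrix}2&a\\a&4\end{bmatrix}\) can be checked numerically; this sketch (the helper name `hessian_is_pd` and the sample values are my own) classifies a few values of \(a\):

```python
# For a 2x2 matrix [[2, a], [a, 4]], positive definiteness is equivalent to
# positive leading principal minors: 2 > 0 and det = 8 - a^2 > 0.
def hessian_is_pd(a):
    return 2 > 0 and 8 - a**2 > 0

print(hessian_is_pd(0))    # True: 0 is inside (-2*sqrt(2), 2*sqrt(2))
print(hessian_is_pd(2.5))  # True: 2.5 < 2*sqrt(2) ~ 2.828
print(hessian_is_pd(3))    # False: 8 - 9 < 0
```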
Part (c)
Prove the local minimum is actually global minimum.
proof 1. Prove by completing the square
Note that \(f_a(x, y) = \frac12\begin{bmatrix}x\\y\end{bmatrix}\cdot \begin{bmatrix}2&a\\a&4\end{bmatrix}\begin{bmatrix}x\\y\end{bmatrix} - \begin{bmatrix}0\\1\end{bmatrix}\cdot\begin{bmatrix}x\\y\end{bmatrix}\)
Therefore, using the completing-the-square method, let \(\begin{bmatrix}x^*\\y^*\end{bmatrix} = \frac{1}{8-a^2}\begin{bmatrix}4&-a\\-a&2\end{bmatrix}\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}\frac{-a}{8-a^2}\\\frac{2}{8-a^2}\end{bmatrix}\); then \(f_a(x, y) = \frac12\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}\cdot \begin{bmatrix}2&a\\a&4\end{bmatrix}\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix} + f_a(x^*, y^*)\).
Note that when \(a\in (-2\sqrt2, 2\sqrt 2)\), the matrix \(\begin{bmatrix}2&a\\a&4\end{bmatrix}\) is positive definite, i.e. \(\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}\cdot \begin{bmatrix}2&a\\a&4\end{bmatrix}\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix} > 0\) whenever \(\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix} \neq 0\). Therefore, the minimum is attained only at \(\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}x^*\\y^*\end{bmatrix} = \begin{bmatrix}\frac{-a}{8-a^2}\\\frac{2}{8-a^2}\end{bmatrix}\), which is therefore the global minimum.
proof 2. Prove by convexity
Let \(\vec x_1 = (x_1, y_1)\in \mathbb R^2, \vec x_2 = (x_2, y_2) \in \mathbb R^2\), let \(c\in [0, 1]\).
For \(c = 1\), obviously \(f_a(1\cdot\vec x_1 + 0\cdot\vec x_2) = 1\cdot f_a(\vec x_1) + 0\cdot f_a(\vec x_2)\); similarly for \(c=0\).
For \(c\in(0,1)\), denote \(Q = \begin{bmatrix}2&a\\a&4\end{bmatrix}\)
Therefore, \(cf_a(\vec x_1) + (1-c)f_a(\vec x_2) - f_a(c\vec x_1 + (1-c)\vec x_2) = \frac12c(1-c)(\vec x_1-\vec x_2)^TQ(\vec x_1-\vec x_2)\).
Note that when \(a\in(-2\sqrt 2, 2\sqrt 2)\), \(Q\) is positive semidefinite, hence \(\frac12c(1-c)(\vec x_1-\vec x_2)^TQ(\vec x_1-\vec x_2) \geq 0\). Therefore, \(cf_a(\vec x_1) + (1-c)f_a(\vec x_2) \geq f_a(c\vec x_1+(1-c)\vec x_2)\). By the definition of a convex function, \(f_a\) is convex, and the local minimum is the global minimum.
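The convexity inequality used in proof 2 can be spot-checked numerically; this sketch (the sample value `a = 1` and random test points are my own choices) tests it at random pairs:

```python
import random

# Spot check: f_a(c*p1 + (1-c)*p2) <= c*f_a(p1) + (1-c)*f_a(p2)
# for a sample a = 1 inside (-2*sqrt(2), 2*sqrt(2)).
def f(a, x, y):
    return x**2 + 2*y**2 + a*x*y - y

random.seed(0)
a = 1.0
ok = True
for _ in range(1000):
    x1, y1, x2, y2 = (random.uniform(-5, 5) for _ in range(4))
    c = random.random()
    lhs = f(a, c*x1 + (1-c)*x2, c*y1 + (1-c)*y2)
    rhs = c*f(a, x1, y1) + (1-c)*f(a, x2, y2)
    ok = ok and lhs <= rhs + 1e-9  # tolerance for rounding
print(ok)  # True
```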
Example 2
Question
Find the local minimum point(s) for \(f(x, y,z ) = (x-\frac y2)^2 + \frac34(y-2)^2 + z^2 -3\) on \(S = \{(x,y,z)\in\mathbb R^3 \mid x\leq 0, y \geq 0\}\), and prove the local minimum is actually a global minimum.
First of all, let \(g(x, y) = f(x, y, 0) = (x-\frac y2)^2 + \frac34(y-2)^2-3\). Observe that \(z^2 \geq 0\) for all \(z\in\mathbb R\), with equality at \(z = 0\); hence minimizing \(f\) over \(S\) is equivalent to minimizing \(g\) over \(\{(x, y) \mid x\leq 0, y\geq 0\}\) and taking \(z=0\).
First, the partial derivatives are \(\frac{\partial g}{\partial x} = 2x - y, \quad \frac{\partial g}{\partial y} = -x + 2y - 3\).
- \(x < 0, y > 0\)
Solving the system of equations \(\nabla g = \vec0\) gives \((x, y) = (1, 2)\), which
does not lie in the set, so there is no local min in the interior of \(S\).
- \(x = 0, y > 0\)
Let the feasible direction be \(v = (v_1, v_2)\) with \(v_1 \leq 0\), and require \(\nabla g(0, y)\cdot v = -yv_1 + (2y-3)v_2 \geq 0\) for every feasible \(v\).
Note that \(-yv_1 \geq 0\) for any \(v_1 \leq 0\), while \(v_2\) is unrestricted, so the condition is equivalent to \(2y-3 = 0\Rightarrow y = \frac32\), and the candidate is \((0, \frac32)\).
- \(x < 0, y = 0\) Let the feasible direction be \(v = (v_1, v_2)\) with \(v_2 \geq 0\), and require \(\nabla g(x, 0)\cdot v = 2xv_1 - (x + 3)v_2 \geq 0\) for every feasible \(v\).
Note that for the direction \(v = (1/2, 1)\), \(\nabla g\cdot v = 2x\cdot\frac12 - (x + 3) = -3 < 0\) for any \(x\), hence there is no local min in this case.
- \(x = 0, y = 0\) Let the feasible direction be \(v = (v_1, v_2)\) with \(v_1 \leq 0, v_2 \geq 0\), and require \(\nabla g(0, 0)\cdot v = -3v_2 \geq 0\).
This fails for any \(v_2 > 0\), hence no local minimum here. The only candidate is therefore \((0, \frac32, 0)\); since the Hessian \(\nabla^2 g = \begin{bmatrix}2&-1\\-1&2\end{bmatrix}\) is positive definite, \(g\) is convex on the convex feasible set, so \((0, \frac32, 0)\) is the global minimum.
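The candidate \((0, \frac32)\) can be spot-checked against random feasible points (a sanity check of my own; `g` below restates the reduced objective):

```python
import random

# Spot check: the candidate (0, 3/2) should not be beaten by random
# feasible points with x <= 0, y >= 0.
def g(x, y):
    return (x - y/2)**2 + 0.75*(y - 2)**2 - 3

random.seed(1)
g_star = g(0, 1.5)  # value at the candidate: -2.25
ok = all(g_star <= g(random.uniform(-5, 0), random.uniform(0, 5)) + 1e-12
         for _ in range(1000))
print(ok)  # True
```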
Example 3
Question
Show that \(xx^T\), where \(x\in\mathbb R^n\), is positive semidefinite.
Let \(x\in\mathbb R^n\) be given and let \(a\in\mathbb R^n\) be arbitrary. Then \(a^T(xx^T)a = (a^Tx)(x^Ta) = (x\cdot a)^2 \geq 0\).
Therefore, by the definition of positive semidefinite, \(xx^T\) is positive semidefinite.
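The identity \(a^T(xx^T)a = (x\cdot a)^2\) behind this proof can be checked numerically (random vectors; the dimension `n = 5` is an arbitrary choice of mine):

```python
import random

# Check a^T (x x^T) a = (x . a)^2 >= 0 entrywise on random vectors.
random.seed(2)
n = 5
x = [random.uniform(-1, 1) for _ in range(n)]
a = [random.uniform(-1, 1) for _ in range(n)]
quad = sum(a[i] * x[i] * x[j] * a[j] for i in range(n) for j in range(n))
dot = sum(x[i] * a[i] for i in range(n))
print(abs(quad - dot**2) < 1e-12 and quad >= -1e-12)  # True
```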
Example 4
Question
Let \(f(x) = b\cdot Ax\), \(A\) is \(n\times m\) matrix, \(x\in\mathbb R^m, b\in\mathbb R^n\), show that \(\nabla f(x)=A^Tb\).
First, note that \(f(x) = b\cdot Ax = \sum_{j=1}^n b_j (Ax)_j = \sum_{j=1}^n b_j \sum_{i=1}^m A_{ji}x_i\).
Therefore, for each component \(x_i\), the partial derivative is \(\frac{\partial f}{\partial x_i} = \sum_{j=1}^n b_j A_{ji}\),
and so \(\nabla f = \begin{bmatrix}\sum_{j=1}^n b_j A_{j1}\\\vdots\\\sum_{j=1}^n b_j A_{jm}\end{bmatrix} = A^Tb\).
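A finite-difference spot check of \(\nabla(b\cdot Ax) = A^Tb\) (the sample `A`, `b`, `x` and the step size `h` are my own illustrative choices):

```python
# Compare a central finite difference of f(x) = b . (A x) with A^T b.
n, m = 3, 2
A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # n x m
b = [1.0, -1.0, 2.0]

def f(x):
    Ax = [sum(A[j][i] * x[i] for i in range(m)) for j in range(n)]
    return sum(b[j] * Ax[j] for j in range(n))

x = [0.5, -0.25]
h = 1e-6
grad_fd = []
for i in range(m):
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    grad_fd.append((f(xp) - f(xm)) / (2 * h))  # central difference
ATb = [sum(A[j][i] * b[j] for j in range(n)) for i in range(m)]
print(all(abs(grad_fd[i] - ATb[i]) < 1e-5 for i in range(m)))  # True
```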
Part (b)
Let \(f(x) = x\cdot Ax\), where \(A\) is an \(n\times n\) matrix; show that \(\nabla f(x) = (A+A^T)x\). Writing \(f(x) = \sum_{i=1}^n\sum_{j=1}^n x_iA_{ij}x_j\), we get \(\frac{\partial f}{\partial x_k} = \sum_{j=1}^n A_{kj}x_j + \sum_{i=1}^n A_{ik}x_i = (Ax)_k + (A^Tx)_k\), hence \(\nabla f(x) = (A+A^T)x\).
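Part (b) can be spot-checked the same way, comparing a central finite difference of \(x\cdot Ax\) with \((A+A^T)x\) (the sample `A`, `x` and step size are my own):

```python
# Compare a central finite difference of f(x) = x . (A x) with (A + A^T) x.
n = 3
A = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [4.0, 0.0, 1.0]]

def f(x):
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

x = [0.3, -0.7, 1.1]
h = 1e-6
fd = []
for k in range(n):
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    fd.append((f(xp) - f(xm)) / (2 * h))
expected = [sum((A[k][j] + A[j][k]) * x[j] for j in range(n)) for k in range(n)]
print(all(abs(fd[k] - expected[k]) < 1e-5 for k in range(n)))  # True
```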
Example 5
Question
Let \(f:\mathbb R^{2n} \rightarrow \mathbb R, f(x, y) = \frac12|Ax-By|^2\), where \(A,B\) are \(m\times n\) matrices, \(x, y\in\mathbb R^n\)
Part (a)
Find \(\nabla f, \nabla^2f\)
Note that \(f(x, y) = \frac12 (Ax-By)^T(Ax-By)\), let \(g(x, y) = Ax-By, h(a) = a^Ta\) so that \(f(x, y) = \frac12h(g(x, y))\). Therefore, using chain rule,
Note that \(h(a) = a^T a = a^TI a\) where \(I\) is the identity matrix, so that by Question 4 Part (b), \(Dh(a) = (I+I^T)a = 2a\).
Then, note that \(\frac{\partial g}{\partial x} = A, \frac{\partial g}{\partial y} = -B\), hence \(Dg(x, y) = \begin{bmatrix}A&-B\end{bmatrix}\), i.e. the matrices \(A, -B\) stacked horizontally.
Therefore, \(Df(x, y) = \frac12 Dh(g(x, y))^T Dg(x, y) = (Ax-By)^T\begin{bmatrix}A&-B\end{bmatrix}\).
Then, taking the transpose, \(\nabla f(x, y) = \begin{bmatrix}A&-B\end{bmatrix}^T(Ax-By) = \begin{bmatrix}A^T(Ax-By)\\-B^T(Ax-By)\end{bmatrix}\).
Differentiating once more and rewriting in matrix form, \(\nabla^2 f = \begin{bmatrix}A&-B\end{bmatrix}^T\begin{bmatrix}A&-B\end{bmatrix} = \begin{bmatrix}A^TA&-A^TB\\-B^TA&B^TB\end{bmatrix}\).
Part (b)
If \((x_0, y_0)\) satisfies \(Ax_0 = By_0\), then \((x_0, y_0)\) is a local minimum.
Note that \(\nabla f(x_0, y_0) = \begin{bmatrix}A&-B\end{bmatrix}^T(Ax_0-By_0) = \begin{bmatrix}A&-B\end{bmatrix}^T\vec 0 = \vec 0\), which satisfies the FOC.
Also, note that the Hessian matrix can be rewritten as \(\nabla^2 f = \begin{bmatrix}A&-B\end{bmatrix}^T\begin{bmatrix}A&-B\end{bmatrix}\), and for any \(v\in\mathbb R^{2n}\), \(v^T\nabla^2 f\, v = \left|\begin{bmatrix}A&-B\end{bmatrix}v\right|^2 \geq 0\),
so it is a positive semidefinite matrix, which satisfies the SOC.
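The SOC argument, \(v^T\nabla^2 f\,v = |Av_1 - Bv_2|^2 \geq 0\), can be checked on random data (a sketch of my own, with arbitrary sizes `m = 3, n = 2`):

```python
import random

# Spot check: for v = (v1, v2), v^T (nabla^2 f) v = |A v1 - B v2|^2 >= 0.
random.seed(3)
m, n = 3, 2
A = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
B = [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]

def mv(M, v):  # matrix-vector product
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

ok = True
for _ in range(200):
    v1 = [random.uniform(-1, 1) for _ in range(n)]
    v2 = [random.uniform(-1, 1) for _ in range(n)]
    w = [p - q for p, q in zip(mv(A, v1), mv(B, v2))]  # A v1 - B v2
    ok = ok and sum(c * c for c in w) >= 0
print(ok)  # True
```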
Example 6
Question
Let \(g\) be a convex function on \(\mathbb R^n\), \(f\) be a linear, nondecreasing function on a single variable.
Part (a)
Prove \(F:=f\circ g\) is convex.
Let \(x_1, x_2\in\mathbb R^n\) and \(c\in[0,1]\). By convexity of \(g\), \(g(cx_1 + (1-c)x_2) \leq cg(x_1) + (1-c)g(x_2)\).
Since \(f\) is non-decreasing, \(F(cx_1 + (1-c)x_2) = f(g(cx_1 + (1-c)x_2)) \leq f(cg(x_1) + (1-c)g(x_2))\).
Since \(f\) is linear, \(f(cg(x_1) + (1-c)g(x_2)) = cf(g(x_1)) + (1-c)f(g(x_2)) = cF(x_1) + (1-c)F(x_2)\).
By the definition of convexity, the claim is proven.
Part (b)
\(\nabla^2F(x)\) is positive semidefinite.
Writing the derivatives in matrix multiplication notation, \(\nabla F(x) = f'(g(x))\nabla g(x)\), hence \(\nabla^2F(x) = f''(g(x))\,\nabla g(x)\nabla g(x)^T + f'(g(x))\,\nabla^2 g(x)\).
Because \(f\) is linear, \(f''(g(x)) = 0\).
Because \(f\) is non-decreasing, \(f'(g(x)) \geq 0\).
Because \(g\) is convex, \(\nabla^2 g(x)\) is positive semidefinite.
Also, note that \(\nabla g(x)\nabla g(x)^T\) is positive semidefinite (Example 3). A positive semidefinite matrix scaled by a nonnegative number is still positive semidefinite, so \(\nabla^2F(x) = f'(g(x))\nabla^2 g(x)\) is positive semidefinite.
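A numerical spot check of Part (a)'s claim with a concrete pair (my own choices: the linear non-decreasing \(f(t) = 2t + 1\) and the convex \(g(x) = \|x\|^2\)):

```python
import random

# f is linear and non-decreasing, g is convex, so F = f o g should be convex.
random.seed(5)
f = lambda t: 2 * t + 1
g = lambda x: x[0]**2 + x[1]**2
F = lambda x: f(g(x))

ok = True
for _ in range(1000):
    x1 = [random.uniform(-3, 3) for _ in range(2)]
    x2 = [random.uniform(-3, 3) for _ in range(2)]
    c = random.random()
    mid = [c * p + (1 - c) * q for p, q in zip(x1, x2)]
    ok = ok and F(mid) <= c * F(x1) + (1 - c) * F(x2) + 1e-9
print(ok)  # True
```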
Example 7
Question
\(f:\mathbb R^2\rightarrow \mathbb R\) is non-negative convex function, \(F:\mathcal A\rightarrow \mathbb R, F(\mu) = \int_0^1f(\mu(x), \mu'(x))dx\) where \(\mathcal A = \{\mu\in C^1: [0,1]\rightarrow\mathbb R\}\). Prove \(F\) is convex on \(\mathcal A\).
Let \(a\in (0, 1), u_1, u_2\in \mathcal A\).
By linearity of the derivative, \((au_1 + (1-a)u_2)'(x) = au_1'(x) + (1-a)u_2'(x)\).
Note that for any \(x\in [0, 1]\), by convexity of \(f\), \(f\big(au_1(x) + (1-a)u_2(x),\, au_1'(x) + (1-a)u_2'(x)\big) \leq af(u_1(x), u_1'(x)) + (1-a)f(u_2(x), u_2'(x))\).
Because \(f\) is non-negative, the integrals are well-defined; integrating the pointwise inequality over \([0, 1]\) gives \(F(au_1 + (1-a)u_2) \leq aF(u_1) + (1-a)F(u_2)\), so \(F\) is convex.
Example 8
Question
If \(f:\Omega\rightarrow \mathbb R\) is convex on \(\Omega=(a,b)\), then \(f\) is also continuous.
lemma If \(f:\Omega\rightarrow\mathbb R\) is convex, then \(\forall x_1, x, x_2 \in \Omega, x_1\leq x\leq x_2. \frac{f(x) - f(x_1)}{x-x_1} \leq \frac{f(x_2) - f(x_1)}{x_2-x_1} \leq \frac{f(x_2) - f(x)}{x_2-x}\).
proof. Let \(x_1, x, x_2\in \Omega\) with \(x_1 < x < x_2\), and let \(\lambda = \frac{x_2 - x}{x_2 - x_1} \in [0, 1]\), so that \(x = \lambda x_1 + (1-\lambda)x_2\). Since \(f\) is convex, \(f(x) \leq \lambda f(x_1) + (1-\lambda)f(x_2)\).
Then, subtracting \(f(x_1)\) and dividing by \(x - x_1 = (1-\lambda)(x_2 - x_1) > 0\) gives \(\frac{f(x) - f(x_1)}{x - x_1} \leq \frac{f(x_2) - f(x_1)}{x_2 - x_1}\).
A similar derivation, subtracting from \(f(x_2)\) and dividing by \(x_2 - x = \lambda(x_2 - x_1) > 0\), gives \(\frac{f(x_2) - f(x_1)}{x_2 - x_1} \leq \frac{f(x_2) - f(x)}{x_2 - x}\).
Claim If \(f\) is convex, then \(\forall x_0\in (a, b), \lim_{x\rightarrow x_0} f(x) = f(x_0)\) (\(f\) is continuous using the limit definition).
proof. Let \(c, d \in (a, b), a<c<x_0 < d<b\).
Take functions \(l_1(x) = \frac{f(x_0) - f(c)}{x_0 - c}(x-x_0) + f(x_0), l_2(x) = \frac{f(d) - f(x_0)}{d - x_0}(x-x_0) + f(x_0)\), where \(l_1\) is the line through \((c, f(c))\) and \((x_0, f(x_0))\), and \(l_2\) is the line through \((x_0, f(x_0))\) and \((d, f(d))\).
Then, for any \(x\in (x_0, d)\), since \(f\) is convex, applying the lemma to \(c < x_0 < x\) and to \(x_0 < x < d\) gives \(\frac{f(x_0) - f(c)}{x_0 - c} \leq \frac{f(x) - f(x_0)}{x - x_0} \leq \frac{f(d) - f(x_0)}{d - x_0}\),
so that \(l_1(x) \leq f(x) \leq l_2(x)\).
Since \(\forall x\in (x_0, d), l_1(x)\leq f(x) \leq l_2(x)\) and \(\lim_{x\rightarrow x_0+}l_1(x) = l_1(x_0) = f(x_0) = l_2(x_0) = \lim_{x\rightarrow x_0+}l_2(x)\), by the squeeze theorem, \(\lim_{x\rightarrow x_0+}f(x) = f(x_0)\).
With similar arguments, we can show that \(\forall x\in (c, x_0), l_2(x) \leq f(x) \leq l_1(x)\), and the squeeze theorem gives \(\lim_{x\rightarrow x_0-}f(x) = f(x_0)\).
Finally, the two one-sided limits conclude that \(\lim_{x\rightarrow x_0}f(x) = f(x_0)\).
Therefore, we have shown that for every \(x_0\in (a, b)\), \(\lim_{x\rightarrow x_0}f(x) = f(x_0)\), which means \(f\) is continuous on \((a,b)\).
Example 9
Question
If \(f:\Omega\rightarrow \mathbb R\) is continuous and convex and attains a maximum at an interior point of \(\Omega\), then \(f\) is a constant function.
proof. Let \(x_0 \in \Omega_{int}\) where \(f(x_0)\) is the maximum.
Assume \(f\) is not constant. Take \(x_1 \in \Omega\) s.t. \(f(x_1) < f(x_0)\).
Since \(x_0\) is an interior point, take some \(t\in(0, 1)\) s.t. \(x_2 = x_0 - t(x_1 - x_0), x_2 \in B(x_0, \epsilon)\subset \Omega\) for some \(\epsilon > 0\).
Then, \(x_2, x_0, x_1\) forms a line and \(x_0 = \frac{t}{1+t}x_1 + \frac{1}{1+t}x_2\).
By our assumption, \(f(x_1) < f(x_0)\) and \(f(x_2) \leq f(x_0)\). Hence, with \(c = \frac{t}{1+t} \in (0, 1)\), \(cf(x_1) + (1-c)f(x_2) < cf(x_0) + (1-c)f(x_0) = f(x_0) = f(cx_1 + (1-c)x_2)\).
This contradicts the convexity of \(f\), which requires \(f(cx_1 + (1-c)x_2) \leq cf(x_1) + (1-c)f(x_2)\); by contradiction, \(f\) must be a constant function.
Example 10
Question
Let \(f: \Omega\rightarrow \mathbb R, f(x) = a\cdot x + b\), where \(\Omega\) is a compact and convex subset of \(\mathbb R^n\).
Part (a)
If \(a\neq 0\), then any minimizer of \(f\) must be on \(\partial \Omega\).
proof. Suppose there exists some minimizer \(x_0 \in \Omega_{int}\); then take some \(t > 0\) s.t. \(x = x_0 - ta \in B(x_0, \epsilon) \subset \Omega\). Then \(f(x) = a\cdot(x_0 - ta) + b = f(x_0) - t\|a\|^2\).
Since \(a\neq 0\), \(t\|a\|^2 > 0\), so \(f(x) < f(x_0)\) and \(x_0\) is not a minimizer. By contradiction, any minimizer must be on \(\partial \Omega\).
Part (b)
Suppose \(g(x) = \|x\|^2 + f(x)\), under what condition of \(a\) can you guarantee that the minimizers do not occur in the interior of the set \(\Omega\)?
Note that \(\nabla g(x) = 2x + a\). A point \(x_*\) fails to be a minimizer if there exists a feasible direction \(d \in \mathbb R^n\) s.t. \(\nabla g(x_*)\cdot d < 0\).
At an interior point \(x_*\), all directions are feasible, so the FOC for a minimizer is \(\nabla g(x_*) = 2x_* + a = 0\); hence if \(2x_* + a \neq 0\), then \(x_*\) is not a minimizer.
Therefore, to guarantee that no interior point is a minimizer, we want \(a\) to satisfy \(\forall x\in\Omega_{int}, 2x + a \neq 0\), i.e. \(-\frac a2 \notin \Omega_{int}\).
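To illustrate the condition numerically (the sample values of `a` and `b` are my own): the stationary point \(x_* = -\frac a2\) of \(g(x) = \|x\|^2 + a\cdot x + b\) is its global minimizer over \(\mathbb R^n\), so any set whose interior contains it has an interior minimizer.

```python
import random

# x* = -a/2 solves 2x + a = 0 and globally minimizes g(x) = |x|^2 + a.x + b.
random.seed(4)
n = 2
a = [1.0, -2.0]
b = 0.5

def g(x):
    return sum(c * c for c in x) + sum(a[i] * x[i] for i in range(n)) + b

x_star = [-c / 2 for c in a]  # the stationary point
ok = all(g(x_star) <= g([random.uniform(-5, 5) for _ in range(n)]) + 1e-12
         for _ in range(1000))
print(ok)  # True
```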
Example 11
Question
If \(f:\mathbb R^n\rightarrow\mathbb R\) is convex, then \(g: \mathbb R^n \times \mathbb R^n \rightarrow \mathbb R,\; g(x, z) := f(x) + \|x+z\|^2\) is also convex.
proof. Let \(x_1, z_1, x_2, z_2\in \mathbb R^n\) and \(c\in [0, 1]\). Consider
\(g(cx_1 + (1-c)x_2, cz_1 + (1-c)z_2) = f(cx_1 + (1-c)x_2) + \|c(x_1+z_1) + (1-c)(x_2+z_2)\|^2 \leq cf(x_1) + (1-c)f(x_2) + c\|x_1+z_1\|^2 + (1-c)\|x_2+z_2\|^2 = cg(x_1, z_1) + (1-c)g(x_2, z_2)\),
using the convexity of \(f\) and of \(\|\cdot\|^2\).
By the definition of convexity, \(g\) is convex.
Example 12
Question
For \(f: \mathbb R^n \rightarrow \mathbb R\) be \(C^1\) function, define \(M = \{(x, f(x))\in \mathbb R^{n+1}\}\), given \(p = (x_0, f(x_0)) \in M\), find the tangent space \(T_pM\).
Define \(g(x, z) = f(x) - z\); note that \(\nabla g(x, z) = [\nabla f(x), -1]\in\mathbb R^{n+1}\).
Then, note that \(g(p) = 0\), and the equation of the tangent plane through \(p\) is \(\nabla g(p)\cdot\big((x, z) - (x_0, f(x_0))\big) = 0\), i.e. \(\nabla f(x_0)\cdot(x - x_0) - (z - f(x_0)) = 0\).
Therefore, the tangent space is \(T_pM = \{(v, w)\in\mathbb R^n\times\mathbb R \mid w = \nabla f(x_0)\cdot v\}\).