Examples: Conjugate Directions and Conjugate Gradients
Example 1
Question
Let \(c\in\mathbb R^n - \{0\}\), \(f(x) = \frac12x^TQx-b^Tx\), \(Q = I +cc^T\). Using the conjugate gradient method, what is the smallest \(k\) that guarantees \(x_k\) is the minimizer of \(f\)?
Claim. \(k=2\)
proof. First, consider an eigenvalue \(\lambda\) of \(Q\) with corresponding eigenvector \(x\). By definition, we have
\[\begin{align*} \lambda x &= Qx\\ \lambda x &= (I + cc^T)x\\ \lambda x &= x + cc^Tx\\ (\lambda - 1)x &= (c^Tx)c &\text{Note that }c^Tx\in\mathbb R \end{align*}\]
If \(\lambda - 1 = 0\), we must have \(c^Tx = 0\), so \(\lambda = 1\) is an eigenvalue (its eigenvectors are the vectors orthogonal to \(c\)). If \(\lambda - 1 \neq 0\), then \(x\) and \(c\) are linearly dependent; hence the eigenvector is \(c\) and we have
\[\begin{align*} \lambda c &= (I + cc^T)c\\ \lambda c &= c + cc^Tc\\ \lambda &= 1 + c^Tc \end{align*}\]
Therefore, there are only 2 distinct eigenvalues for \(Q = I + cc^T\).
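The eigenstructure derived above can be checked numerically. The sketch below (pure Python, with an arbitrary example vector \(c\)) verifies \(Qc = (1+c^Tc)c\) and \(Qx = x\) for a vector \(x\) orthogonal to \(c\), without ever forming \(Q\) explicitly:

```python
# Check the eigenstructure of Q = I + c c^T claimed above:
# Qc = (1 + c^T c) c, and Qx = x whenever c^T x = 0.
# c and x_perp are arbitrary example choices.

def matvec_Q(c, x):
    """Compute (I + c c^T) x = x + (c^T x) c without forming Q."""
    ctx = sum(ci * xi for ci, xi in zip(c, x))
    return [xi + ctx * ci for ci, xi in zip(c, x)]

c = [1.0, 2.0, 2.0]                 # c^T c = 9
x_perp = [2.0, -1.0, 0.0]           # satisfies c^T x = 0

Qc = matvec_Q(c, c)
lam = 1 + sum(ci * ci for ci in c)  # 1 + c^T c = 10
assert all(abs(q - lam * ci) < 1e-12 for q, ci in zip(Qc, c))
assert all(abs(q - xi) < 1e-12 for q, xi in zip(matvec_Q(c, x_perp), x_perp))
```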
Then, we can take \(P_1(\lambda) = a + b\lambda\), a polynomial of degree 1, such that \(1 + \lambda P_1(\lambda)\) vanishes at both eigenvalues \(\lambda = 1\) and \(\lambda = 1 + c^Tc\) (two equations in the two unknowns \(a,b\)). By the standard error bound for the conjugate gradient method,
\[\|x_2 - x^*\|_Q^2 \leq \max_{i}\big(1 + \lambda_i P_1(\lambda_i)\big)^2\,\|x_0 - x^*\|_Q^2 = 0,\]
where \(\|v\|_Q^2 = v^TQv\), so \(x_2 = x^*\) and \(k = 2\) always suffices. Since \(c\neq 0\), \(Q\) has two distinct eigenvalues (for \(n\geq 2\)), and one step fails whenever \(x_0 - x^*\) is not an eigenvector of \(Q\), so \(k = 2\) is the smallest such \(k\).
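The finite-termination claim can be illustrated numerically. The sketch below (pure Python, with arbitrary example data for \(c\) and \(b\)) runs two iterations of the conjugate gradient method on \(f\) with \(Q = I + cc^T\) and checks that the residual \(b - Qx_2\) vanishes:

```python
# Run CG on f(x) = (1/2) x^T Q x - b^T x with Q = I + c c^T and verify
# that the residual is zero after exactly 2 iterations.
# c and b are arbitrary example data.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(c, x):                         # Qx = x + (c^T x) c
    ctx = dot(c, x)
    return [xi + ctx * ci for ci, xi in zip(c, x)]

c = [1.0, 2.0, 0.0, 3.0]
b = [1.0, -1.0, 2.0, 0.5]
x = [0.0] * 4
r = b[:]                                  # residual b - Qx at x_0 = 0
d = r[:]
for k in range(2):                        # two CG iterations
    Qd = matvec(c, d)
    alpha = dot(r, r) / dot(d, Qd)        # exact line search step
    x = [xi + alpha * di for xi, di in zip(x, d)]
    r_new = [ri - alpha * qi for ri, qi in zip(r, Qd)]
    beta = dot(r_new, r_new) / dot(r, r)  # Fletcher-Reeves update
    d = [ri + beta * di for ri, di in zip(r_new, d)]
    r = r_new

assert max(abs(ri) for ri in r) < 1e-10   # Qx_2 = b, so x_2 is the minimizer
```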
Example 2
Question
Prove that the Gram-Schmidt procedure generates a sequence of Q-conjugate directions given a linearly independent set of vectors \(p_0,...,p_{n-1}\in\mathbb R^n\).
proof. Note that this statement is equivalent to saying that \(\forall k\in\{0,...,n-1\}, \forall j < k, d_k^TQd_j = 0\), and I'll prove this statement by strong induction.
First, since \(Q\) is symmetric, \(d_k^TQd_j = d_j^TQd_k\) for any \(d_j,d_k\). Also note that each \(d_k\) is \(p_k\) plus a linear combination of \(p_0, ..., p_{k-1}\), so by linear independence \(d_k\neq 0, \forall k\in\{0,...,n-1\}\), and since \(Q\) is positive definite, \(d_k^TQd_k > 0\).
Then, note that for \(k = 0\), the statement is vacuously true. For \(k \in \{1, ..., n-1\}\), assume that \(\forall m < k, \forall j < m, d_m^TQd_j = d_j^TQd_m = 0\), i.e. \(\forall j,m < k, j\neq m, d^T_mQd_j = 0\). Then, for any \(i < k\), since the Gram-Schmidt step is
\[d_k = p_k - \sum_{j=0}^{k-1}\frac{p_k^TQd_j}{d_j^TQd_j}d_j,\]
we have
\[d_i^TQd_k = d_i^TQp_k - \sum_{j=0}^{k-1}\frac{p_k^TQd_j}{d_j^TQd_j}d_i^TQd_j = p_k^TQd_i - \frac{p_k^TQd_i}{d_i^TQd_i}d_i^TQd_i = 0,\]
where every term of the sum with \(j\neq i\) vanishes by the induction hypothesis. This closes the induction.
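The procedure proved above can be sketched directly. The pure-Python example below (with an arbitrary positive definite \(Q\) and linearly independent \(p\)'s as assumed data) applies the Gram-Schmidt step and verifies the resulting Q-conjugacy:

```python
# Q-conjugate the linearly independent vectors p_0, ..., p_{n-1} via
#   d_k = p_k - sum_{j<k} (p_k^T Q d_j / d_j^T Q d_j) d_j,
# then verify d_i^T Q d_j = 0 for i != j.  Q and P are example data.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(Q, x):
    return [dot(row, x) for row in Q]

Q = [[4.0, 1.0, 0.0],                     # symmetric positive definite
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
P = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

D = []
for p in P:
    d = p[:]
    for dj in D:
        coef = dot(p, matvec(Q, dj)) / dot(dj, matvec(Q, dj))
        d = [di - coef * dji for di, dji in zip(d, dj)]
    D.append(d)

for i in range(3):
    for j in range(i):
        assert abs(dot(D[i], matvec(Q, D[j]))) < 1e-12   # Q-conjugacy
```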
Example 3
Question
Let \(Q\) be positive definite and \(f(x) = \frac12 x^TQx - b^Tx\). Let \(x_1=\arg\min_{x\in S_1}f(x)\), \(x_2=\arg\min_{x\in S_2}f(x)\), where \(S_1,S_2\subset E^n\) and \(d\in S_1\cap S_2\); assume \(f(x_1) < f(x_2)\). Show that \((x_1-x_2)^TQd = 0\).
proof. Since \(x_1\) is a minimizer of \(f\) over \(S_1\), we must have \(\nabla f(x_1)^Td = 0\): otherwise there would be some \(\epsilon > 0\) with \(f(x_1 - \epsilon d) < f(x_1)\), and \(x_1-\epsilon d \in S_1\) since \(x_1\in S_1, d\in S_1\). Note that this equation expands to
\[(Qx_1 - b)^Td = 0,\]
and by the same argument at \(x_2\), \((Qx_2 - b)^Td = 0\). Subtracting the two equations gives \((x_1-x_2)^TQd = 0\).
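As a numerical illustration (with assumed example data for \(Q\), \(b\), and \(d\)), the sketch below minimizes \(f\) exactly over two parallel lines sharing the direction \(d\), using the closed-form line-search step, and checks that the difference of the two minimizers is Q-conjugate to \(d\):

```python
# Minimize f over two lines {p + t d}; the minimizer is p + t* d with
# t* = -(Q p - b)^T d / (d^T Q d).  Then (x_1 - x_2)^T Q d should vanish.
# Q, b, d, and the base points are example data.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(Q, x):
    return [dot(row, x) for row in Q]

Q = [[3.0, 1.0], [1.0, 2.0]]       # symmetric positive definite
b = [1.0, 2.0]
d = [1.0, 1.0]

def line_min(p):
    """Exact line search: minimize f over the line {p + t d : t real}."""
    g = [gi - bi for gi, bi in zip(matvec(Q, p), b)]   # gradient Qp - b
    t = -dot(g, d) / dot(d, matvec(Q, d))
    return [pi + t * di for pi, di in zip(p, d)]

x1 = line_min([0.0, 0.0])          # minimizer over S_1
x2 = line_min([5.0, -2.0])         # minimizer over a parallel S_2
diff = [u - v for u, v in zip(x1, x2)]
assert abs(dot(diff, matvec(Q, d))) < 1e-12   # (x_1 - x_2)^T Q d = 0
```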
Example 4
Question
Let \(f(x) = \frac12x^TQx - b^Tx\), where \(Q = diag(\lambda_1,...,\lambda_n)\) is a diagonal, symmetric, positive definite matrix.
Part (a)
Show that the standard basis vectors form a Q-orthogonal set.
proof. Let \(i, j \in\{1, ...,n\}, i\neq j\). Denote by \(e_{mk}\) the \(k\)th entry of \(e_m\), and by \(Q_{ij}\) the entry in the \(i\)th row and \(j\)th column of \(Q\), so that
\[e_i^TQe_j = \sum_{p=1}^n\sum_{q=1}^n e_{ip}Q_{pq}e_{jq}.\]
Note that \(Q_{pp} = \lambda_p\) and \(Q_{pq}=0\) for any \(p,q\in\{1,...,n\}, p\neq q\); \(e_{ii}=1, e_{ip}=0\) for \(p\in\{1,...,n\}-\{i\}\); \(e_{jj}=1, e_{jq}=0\) for \(q\in\{1,...,n\}-\{j\}\). Therefore, consider each term of the summation: when \(p\neq q\), \(Q_{pq}=0\); when \(p=q\), at least one of \(e_{ip},e_{jq}\) equals 0 (since \(i\neq j\)). Therefore, all terms in the summation are 0 and \(e_i^TQe_j = 0\); hence \(\{d_0,...,d_{n-1}\} = \{e_1,...,e_n\}\) forms a Q-orthogonal set.
Part (b)
Prove that \(x_k = (\frac{b_1}{\lambda_1},...,\frac{b_k}{\lambda_k}, a_{k+1},...,a_n)\) is the \(k\)th iterate of the conjugate direction method with directions \(d_k = e_{k+1}\), starting from \(x_0 = (a_1,...,a_n)\).
proof. I'll prove this by induction. The base case \(k = 0\) is just \(x_0 = (a_1,...,a_n)\). Let \(k\in \{0,...,n-1\}\) and assume \(x_{k} = (\frac{b_1}{\lambda_1},...,\frac{b_k}{\lambda_k}, a_{k+1},...,a_n)\). Consider the \((k+1)\)th step of the conjugate direction method with \(d_k = e_{k+1}\):
\[\alpha_k = -\frac{g_k^Td_k}{d_k^TQd_k} = -\frac{(Qx_k - b)_{k+1}}{\lambda_{k+1}} = \frac{b_{k+1} - \lambda_{k+1}a_{k+1}}{\lambda_{k+1}},\]
so \(x_{k+1} = x_k + \alpha_k e_{k+1}\) agrees with \(x_k\) in every entry except the \((k+1)\)th, which becomes \(a_{k+1} + \frac{b_{k+1}}{\lambda_{k+1}} - a_{k+1} = \frac{b_{k+1}}{\lambda_{k+1}}\), as required.
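The step above can be sketched numerically. The pure-Python example below (with arbitrary example data for \(\lambda_i\), \(b\), and \(x_0\)) runs the conjugate direction method with \(d_k = e_{k+1}\) and checks that after each step the updated coordinate equals \(b_i/\lambda_i\):

```python
# With Q = diag(lam) and d_k = e_{k+1}, the k-th conjugate direction
# iterate has first k coordinates b_i / lam_i and the rest untouched.
# lam, b, and x (the starting point x_0) are example data.

lam = [2.0, 5.0, 1.0, 4.0]
b = [3.0, -1.0, 2.0, 8.0]
x = [7.0, -3.0, 0.5, 1.0]                 # x_0 = (a_1, ..., a_n)

for k in range(4):                        # step k uses d_k = e_{k+1}
    g_k = lam[k] * x[k] - b[k]            # (k+1)-th entry of gradient Qx - b
    alpha = -g_k / lam[k]                 # exact line search along e_{k+1}
    x[k] += alpha                         # x_{k+1} = x_k + alpha * e_{k+1}
    assert abs(x[k] - b[k] / lam[k]) < 1e-12

assert all(abs(xi - bi / li) < 1e-12 for xi, bi, li in zip(x, b, lam))
```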
Part (c)
Prove that \(\forall k \geq 1\), \(x_k\) is the minimizer of \(f\) over the set \(x_0 + \mathcal B_k\), where \(\mathcal B_k = span\{d_0,...,d_{k-1}\} = span\{e_1,...,e_k\}\).
proof. Write a point of \(x_0 + \mathcal B_k\) as \(x_0 + \sum_{i=1}^k y_ie_i\) and let \(\phi(y_1,...,y_k) = f(x_0 + \sum_{i=1}^k y_ie_i)\), so that \(\frac{\partial\phi}{\partial y_i} = \lambda_i(a_i + y_i) - b_i\), and the first-order condition gives \(y_i = \frac{b_i}{\lambda_i} - a_i\). Then, note that \(\frac{\partial^{2}\phi}{\partial y_i^2} = \lambda_i\) and \(\frac{\partial^{2}\phi}{\partial y_i\partial y_j} = 0\) for \(i,j\in\{1,...,k\}, i\neq j\), so \(\nabla^2\phi = diag(\lambda_1,...,\lambda_k)\), i.e. the top-left \(k\times k\) submatrix of \(Q\); since \(Q\) is positive definite, \(\nabla^2\phi\) is also positive definite, so the second-order condition also holds and the critical point is the unique minimizer. Substituting \(y_i = \frac{b_i}{\lambda_i} - a_i\) back gives exactly \(x_k\) from part (b), so \(x_k\) minimizes \(f\) over \(x_0 + \mathcal B_k\).
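A quick sanity check of this minimality claim (pure Python, with assumed example data for \(\lambda_i\), \(b\), and \(x_0\)): compare \(f(x_k)\) against \(f\) at a few competitors of the form \(x_0 + y\) with \(y\in span\{e_1,...,e_k\}\).

```python
# After k steps, the iterate x_k from part (b) should beat any competitor
# x_0 + y with y in span{e_1, ..., e_k}.  lam, b, x0 are example data.

lam = [2.0, 5.0, 1.0]
b = [3.0, -1.0, 2.0]
x0 = [7.0, -3.0, 0.5]

def f(x):
    return sum(0.5 * li * xi * xi - bi * xi for li, xi, bi in zip(lam, x, b))

k = 2
xk = [b[i] / lam[i] if i < k else x0[i] for i in range(3)]   # iterate from part (b)
for y in ([1.0, 0.0], [-2.0, 3.0], [0.3, -0.7]):             # y in span{e_1, e_2}
    xc = [x0[0] + y[0], x0[1] + y[1], x0[2]]
    assert f(xk) <= f(xc) + 1e-12
```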
Example 5
Question
For \(x^*\in\mathbb R^2\) and \(a\in\mathbb R\), find the point \(x\) on the line \(L = \{x\in\mathbb R^2\mid x=(t,at), t\in\mathbb R\}\) that minimizes the distance \(d(x,x^*)\).
proof. The line \(L\) is the zero set of \(l(x,y) = ax - y\), and minimizing the distance is equivalent to minimizing \(f(x) = \frac12(x-x^*)^TQ(x-x^*)\) (with \(Q = I\) for the Euclidean distance; the same computation works for any positive definite \(Q\)). Note that \(\nabla f = Q\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}, \nabla l = \begin{bmatrix}a\\-1\end{bmatrix}\); using a Lagrange multiplier \(\mu\), we have the equations
\[Q\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix} = \mu\begin{bmatrix}a\\-1\end{bmatrix},\qquad ax - y = 0,\]
three equations in the three unknowns \(x, y, \mu\), which determine the minimizer.
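The Lagrange condition can be verified numerically. The sketch below (pure Python, with an assumed example \(Q\), \(x^*\), and slope \(a\)) minimizes the Q-weighted squared distance over the line by substituting \(x = (t, at)\), then checks that the gradient \(Q(x - x^*)\) is parallel to the constraint normal \((a, -1)\):

```python
# Minimize f(x) = (1/2)(x - x*)^T Q (x - x*) over the line y = a x by
# substituting x = (t, a t) and solving df/dt = 0 for t, then verify the
# Lagrange condition.  Q, xs (the point x*), and a are example data.

Q = [[3.0, 1.0], [1.0, 2.0]]              # symmetric positive definite
xs = [2.0, 5.0]                           # the point x* = (x*, y*)
a = 0.5

# With v = t u - x* and u = (1, a): df/dt = u^T Q v = t u^T Q u - u^T Q x*.
u = [1.0, a]
Qu = [Q[0][0] * u[0] + Q[0][1] * u[1], Q[1][0] * u[0] + Q[1][1] * u[1]]
t = (Qu[0] * xs[0] + Qu[1] * xs[1]) / (Qu[0] * u[0] + Qu[1] * u[1])
x = [t, a * t]                            # the minimizer on the line

g = [Q[0][0] * (x[0] - xs[0]) + Q[0][1] * (x[1] - xs[1]),   # gradient Q(x - x*)
     Q[1][0] * (x[0] - xs[0]) + Q[1][1] * (x[1] - xs[1])]
assert abs(g[0] * (-1.0) - g[1] * a) < 1e-12   # g parallel to (a, -1)
assert abs(a * x[0] - x[1]) < 1e-12            # x lies on the line
```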