Examples: Conjugate Directions and Conjugate Gradients Example 1 Question
Let c ∈ R n − { 0 } , f ( x ) = 1 2 x T Q x − b T x , Q = I + c c T c\in\mathbb R^n - \{0\}, f(x) = \frac12x^TQx-b^Tx, Q = I +cc^T c ∈ R n − { 0 } , f ( x ) = 2 1 x T Q x − b T x , Q = I + c c T . Using conjugate gradient method, what's the smallest k k k that guarantees x k x_k x k is the minimizer of f f f .
Claim . k = 2 k=2 k = 2
proof . First, consider some eigenvalues λ \lambda λ and corresponding eigenvector x x x , by definition, we have
λ x = Q x λ x = ( I + c c T ) x λ x = x + c c T x ( λ − 1 ) x = ( c T x ) c Note that c T x ∈ R \begin{align*} \lambda x &= Qx\\ \lambda x &= (I + cc^T)x\\ \lambda x &= x + cc^Tx\\ (\lambda - 1)x &= (c^Tx)c &\text{Note that }c^Tx\in\mathbb R \end{align*} λ x λ x λ x ( λ − 1 ) x = Q x = ( I + c c T ) x = x + c c T x = ( c T x ) c Note that c T x ∈ R If λ − 1 = 0 \lambda - 1 = 0 λ − 1 = 0 , we must have c T x = 0 c^Tx = 0 c T x = 0 , so that λ = 1 \lambda = 1 λ = 1 is a eigenvalue, If λ − 1 ≠ 0 \lambda - 1 \neq 0 λ − 1 = 0 , then x x x and c c c are linearly dependent, hence the eigenvector is c c c and we have
λ c = ( I + c c T ) c λ c = c + c c T c λ = 1 + c T c \begin{align*} \lambda c &= (I + cc^T)c\\ \lambda c &= c + cc^Tc\\ \lambda &= 1 + c^Tc \end{align*} λ c λ c λ = ( I + c c T ) c = c + c c T c = 1 + c T c Therefore, there are only 2 distinct eigenvalues for Q = I + c c T Q = I + cc^T Q = I + c c T .
Then, we can take P 1 ( λ ) = a + b λ P_1(\lambda) = a + b\lambda P 1 ( λ ) = a + bλ be a polynomial of degree 1 s.t.
[ 1 1 1 1 + c T c ] [ a b ] = [ − 1 − 1 1 + c T c ] \begin{bmatrix} 1&1\\ 1&1+c^Tc \end{bmatrix}\begin{bmatrix} a\\b \end{bmatrix}=\begin{bmatrix} -1\\-\frac{1}{1+c^Tc} \end{bmatrix} [ 1 1 1 1 + c T c ] [ a b ] = [ − 1 − 1 + c T c 1 ] so that 1 + P ( 1 ) = 0 1+P(1) = 0 1 + P ( 1 ) = 0 and 1 + ( 1 + c c T ) P ( 1 + c c T ) = 0 1+(1+cc^T)P(1+cc^T) =0 1 + ( 1 + c c T ) P ( 1 + c c T ) = 0 . Therefore, we have
q ( x 2 ) ≤ max λ ∈ { 1 , 1 + c T c } ( 1 + λ P 1 ( λ ) ) q ( x 0 ) = 0 q(x_2) \leq \max_{\lambda \in \{1, 1+c^Tc\}}(1+\lambda P_1(\lambda))q(x_0) = 0 q ( x 2 ) ≤ λ ∈ { 1 , 1 + c T c } max ( 1 + λ P 1 ( λ )) q ( x 0 ) = 0 Therefore, k = 2 k = 2 k = 2 guarantees x 2 x_2 x 2 is the minimizer of f f f .
Example 2 Question
Let Q = [ 2 − 5 − 5 2 ] , b = [ 25 8 ] , f = 1 2 x T Q x − b T x Q = \begin{bmatrix}2&-5\\-5&2\end{bmatrix}, b = \begin{bmatrix}25&8\end{bmatrix}, f = \frac{1}{2}x^TQx - b^Tx Q = [ 2 − 5 − 5 2 ] , b = [ 25 8 ] , f = 2 1 x T Q x − b T x .
Part (a)
Find eigenvalues λ 0 ≤ λ 1 \lambda_0\leq \lambda_1 λ 0 ≤ λ 1 of Q Q Q and corresponding eigenvectors.
First, find its characteristic polynomial as
( 2 − λ ) 2 − 25 = λ 2 − 4 λ + 4 − 25 = λ 2 − 4 λ + 21 = ( λ − 7 ) ( λ + 3 ) (2-\lambda)^2 - 25 = \lambda^2 - 4\lambda + 4 -25 = \lambda^2 - 4\lambda+21=(\lambda-7)(\lambda+3) ( 2 − λ ) 2 − 25 = λ 2 − 4 λ + 4 − 25 = λ 2 − 4 λ + 21 = ( λ − 7 ) ( λ + 3 ) set the equation to 0 0 0 and solve to get
λ 0 = − 3 , λ 1 = 7 \lambda_0 = -3, \lambda_1 = 7 λ 0 = − 3 , λ 1 = 7 Then,
( Q − λ 0 I ) d 0 = 0 [ 5 − 5 − 5 5 ] d 0 = 0 d 0 = [ 1 1 ] ( Q − λ 1 I ) d 1 = 0 [ − 5 − 5 − 5 − 5 ] d 1 = 0 d 1 = [ 1 − 1 ] \begin{align*} (Q - \lambda_0I)d_0 &= 0\\ \begin{bmatrix}5&-5\\-5&5\end{bmatrix}d_0 &= 0\\ d_0&= \begin{bmatrix}1\\1\end{bmatrix}\\ (Q - \lambda_1I)d_1 &= 0\\ \begin{bmatrix}-5&-5\\-5&-5\end{bmatrix}d_1 &= 0\\ d_1&= \begin{bmatrix}1\\-1\end{bmatrix} \end{align*} ( Q − λ 0 I ) d 0 [ 5 − 5 − 5 5 ] d 0 d 0 ( Q − λ 1 I ) d 1 [ − 5 − 5 − 5 − 5 ] d 1 d 1 = 0 = 0 = [ 1 1 ] = 0 = 0 = [ 1 − 1 ] Part (b)
Compute the steps of conjugate directions method given directions d 0 , d 1 d_0, d_1 d 0 , d 1 .
g 0 = [ 2 − 5 − 5 2 ] [ 25 5 ] − [ 25 8 ] = [ 0 − 123 ] a 0 = − ( [ 0 − 123 ] T [ 1 1 ] ) / ( [ 1 1 ] T [ 2 − 5 − 5 2 ] [ 1 1 ] ) = − 41 2 x 1 = [ 25 5 ] − 41 2 [ 1 1 ] = [ 4.5 − 15.5 ] g 1 = [ 2 − 5 − 5 2 ] [ 4.5 − 15.5 ] − [ 25 8 ] = [ 61.5 − 61.5 ] a 1 = − ( [ 61.5 − 61.5 ] T [ 1 − 1 ] ) / ( [ 1 − 1 ] T [ 2 − 5 − 5 2 ] [ 1 − 1 ] ) = − 123 14 x 2 = [ − 4.5 15.5 ] − 123 14 [ 1 − 1 ] = [ − 30 7 − 47 7 ] \begin{align*} g_0 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}25\\5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}0\\-123\end{bmatrix}\\ a_0 &= -(\begin{bmatrix}0\\-123\end{bmatrix}^T\begin{bmatrix}1\\1\end{bmatrix}) / (\begin{bmatrix}1\\1\end{bmatrix}^T \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix})= -\frac{41}2\\ x_1 &= \begin{bmatrix}25\\5\end{bmatrix}-\frac{41}2\begin{bmatrix}1\\1\end{bmatrix}= \begin{bmatrix}4.5\\-15.5\end{bmatrix}\\ g_1 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix} \begin{bmatrix}4.5\\-15.5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}61.5\\-61.5\end{bmatrix}\\ a_1 &= -(\begin{bmatrix}61.5\\-61.5\end{bmatrix}^T\begin{bmatrix}1\\-1\end{bmatrix}) / (\begin{bmatrix}1\\-1\end{bmatrix}^T \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix})= -\frac{123}{14}\\ x_2 &= \begin{bmatrix}-4.5\\15.5\end{bmatrix}-\frac{123}{14}\begin{bmatrix}1\\-1\end{bmatrix}= \begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix} \end{align*} g 0 a 0 x 1 g 1 a 1 x 2 = [ 2 − 5 − 5 2 ] [ 25 5 ] − [ 25 8 ] = [ 0 − 123 ] = − ( [ 0 − 123 ] T [ 1 1 ] ) / ( [ 1 1 ] T [ 2 − 5 − 5 2 ] [ 1 1 ] ) = − 2 41 = [ 25 5 ] − 2 41 [ 1 1 ] = [ 4.5 − 15.5 ] = [ 2 − 5 − 5 2 ] [ 4.5 − 15.5 ] − [ 25 8 ] = [ 61.5 − 61.5 ] = − ( [ 61.5 − 61.5 ] T [ 1 − 1 ] ) / ( [ 1 − 1 ] T [ 2 − 5 − 5 2 ] [ 1 − 1 ] ) = − 14 123 = [ − 4.5 15.5 ] − 14 123 [ 1 − 1 ] = [ − 7 30 − 7 47 ] Part (c)
Compute the steps of conjugate directions method given directions d 1 , d 0 d_1, d_0 d 1 , d 0 .
g 0 = [ 2 − 5 − 5 2 ] [ 25 5 ] − [ 25 8 ] = [ 0 − 123 ] a 0 = − ( [ 0 − 123 ] T [ 1 − 1 ] ) / ( [ 1 − 1 ] T [ 2 − 5 − 5 2 ] [ 1 − 1 ] ) = − 123 14 x 1 = [ 25 5 ] − 123 14 [ 1 − 1 ] = [ 227 14 193 14 ] g 1 = [ 2 − 5 − 5 2 ] [ 227 14 193 14 ] − [ 25 8 ] = [ 61.5 − 61.5 ] a 1 = − ( [ 61.5 − 61.5 ] T [ 1 1 ] ) / ( [ 1 1 ] T [ 2 − 5 − 5 2 ] [ 1 1 ] ) = − 41 2 x 2 = [ 227 14 193 14 ] − 41 2 [ 1 1 ] = [ − 30 7 − 47 7 ] \begin{align*} g_0 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}25\\5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}0\\-123\end{bmatrix}\\ a_0 &= -(\begin{bmatrix}0\\-123\end{bmatrix}^T\begin{bmatrix}1\\-1\end{bmatrix}) / (\begin{bmatrix}1\\-1\end{bmatrix}^T \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix})= -\frac{123}{14}\\ x_1 &= \begin{bmatrix}25\\5\end{bmatrix}-\frac{123}{14}\begin{bmatrix}1\\-1\end{bmatrix}= \begin{bmatrix}\frac{227}{14}\\\frac{193}{14}\end{bmatrix}\\ g_1 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix} \begin{bmatrix}\frac{227}{14}\\\frac{193}{14}\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}61.5\\-61.5\end{bmatrix}\\ a_1 &= -(\begin{bmatrix}61.5\\-61.5\end{bmatrix}^T\begin{bmatrix}1\\1\end{bmatrix}) / (\begin{bmatrix}1\\1\end{bmatrix}^T \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix})= -\frac{41}{2}\\ x_2 &= \begin{bmatrix}\frac{227}{14}\\\frac{193}{14}\end{bmatrix}-\frac{41}{2}\begin{bmatrix}1\\1\end{bmatrix}= \begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix} \end{align*} g 0 a 0 x 1 g 1 a 1 x 2 = [ 2 − 5 − 5 2 ] [ 25 5 ] − [ 25 8 ] = [ 0 − 123 ] = − ( [ 0 − 123 ] T [ 1 − 1 ] ) / ( [ 1 − 1 ] T [ 2 − 5 − 5 2 ] [ 1 − 1 ] ) = − 14 123 = [ 25 5 ] − 14 123 [ 1 − 1 ] = [ 14 227 14 193 ] = [ 2 − 5 − 5 2 ] [ 14 227 14 193 ] − [ 25 8 ] = [ 61.5 − 61.5 ] = − ( [ 61.5 − 61.5 ] T [ 1 1 ] ) / ( [ 1 1 ] T [ 2 − 5 − 5 2 ] [ 1 1 ] ) = − 2 41 = [ 14 227 14 193 ] − 2 41 [ 1 1 ] = [ − 7 30 − 7 47 ] Part (d)
Compute the steps of conjugate gradients.
g 0 = [ 2 − 5 − 5 2 ] [ 25 5 ] − [ 25 8 ] = [ 0 − 123 ] d 0 = [ 0 123 ] a 0 = ( [ 0 123 ] ⋅ [ 0 123 ] ) / [ 0 123 ] [ 2 − 5 − 5 2 ] [ 0 123 ] = 0.5 x 1 = [ 25 5 ] + 0.5 [ 0 123 ] = [ 25 66.5 ] g 1 = [ 2 − 5 − 5 2 ] [ 25 66.5 ] − [ 25 8 ] = [ − 307.5 0 ] β 0 = ( [ − 307.5 0 ] T [ 2 − 5 − 5 2 ] [ 0 123 ] ) / ( [ 0 123 ] T [ 2 − 5 − 5 2 ] [ 0 123 ] ) = 6.25 d 1 = [ 307.5 0 ] + 6.25 [ 0 123 ] = [ 307.5 768.75 ] a 1 = ( [ − 307.5 0 ] ⋅ [ 307.5 768.75 ] ) / [ 307.5 768.75 ] [ 2 − 5 − 5 2 ] [ 307.5 768.75 ] = − 2 21 x 2 = [ 25 66.5 ] + 2 21 [ 307.5 768.75 ] = [ − 30 7 − 47 7 ] g 2 = [ 2 − 5 − 5 2 ] [ − 30 7 − 47 7 ] − [ 25 8 ] = [ 0 0 ] β 1 = 0 d 1 T Q d 1 = 0 d 2 = − g 2 + 0 = [ 0 0 ] x 3 = x 2 + a 2 [ 0 0 ] = x 2 = [ − 30 7 − 47 7 ] \begin{align*} g_0 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}25\\5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}0\\-123\end{bmatrix}\\ d_0 &= \begin{bmatrix}0\\123\end{bmatrix}\\ a_0 &= (\begin{bmatrix}0\\123\end{bmatrix}\cdot\begin{bmatrix}0\\123\end{bmatrix}) / \begin{bmatrix}0\\123\end{bmatrix}\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}0\\123\end{bmatrix} = 0.5\\ x_1 &= \begin{bmatrix}25\\5\end{bmatrix} + 0.5\begin{bmatrix}0\\123\end{bmatrix}= \begin{bmatrix}25\\66.5\end{bmatrix}\\ g_1 &=\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}25\\66.5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}-307.5\\0\end{bmatrix}\\ \beta_0 &= (\begin{bmatrix}-307.5\\0\end{bmatrix}^T\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}0\\123\end{bmatrix}) / (\begin{bmatrix}0\\123\end{bmatrix}^T\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}0\\123\end{bmatrix})=6.25\\ d_1 &= \begin{bmatrix}307.5\\0\end{bmatrix} + 6.25\begin{bmatrix}0\\123\end{bmatrix} = \begin{bmatrix}307.5\\768.75\end{bmatrix}\\ a_1 &= (\begin{bmatrix}-307.5\\0\end{bmatrix}\cdot\begin{bmatrix}307.5\\768.75\end{bmatrix}) / \begin{bmatrix}307.5\\768.75\end{bmatrix}\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}307.5\\768.75\end{bmatrix} = -\frac{2}{21}\\ x_2 &= \begin{bmatrix}25\\66.5\end{bmatrix} + \frac{2}{21}\begin{bmatrix}307.5\\768.75\end{bmatrix} = \begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix}\\ g_2 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}\\ \beta_1 &= \frac{0}{d_1^TQd_1} = 0\\ d_2 &= -g_2 + 0 = \begin{bmatrix}0\\0\end{bmatrix}\\ x_3 &= x_2 + a_2\begin{bmatrix}0\\0\end{bmatrix} = x_2 = \begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix} \end{align*} g 0 d 0 a 0 x 1 g 1 β 0 d 1 a 1 x 2 g 2 β 1 d 2 x 3 = [ 2 − 5 − 5 2 ] [ 25 5 ] − [ 25 8 ] = [ 0 − 123 ] = [ 0 123 ] = ( [ 0 123 ] ⋅ [ 0 123 ] ) / [ 0 123 ] [ 2 − 5 − 5 2 ] [ 0 123 ] = 0.5 = [ 25 5 ] + 0.5 [ 0 123 ] = [ 25 66.5 ] = [ 2 − 5 − 5 2 ] [ 25 66.5 ] − [ 25 8 ] = [ − 307.5 0 ] = ( [ − 307.5 0 ] T [ 2 − 5 − 5 2 ] [ 0 123 ] ) / ( [ 0 123 ] T [ 2 − 5 − 5 2 ] [ 0 123 ] ) = 6.25 = [ 307.5 0 ] + 6.25 [ 0 123 ] = [ 307.5 768.75 ] = ( [ − 307.5 0 ] ⋅ [ 307.5 768.75 ] ) / [ 307.5 768.75 ] [ 2 − 5 − 5 2 ] [ 307.5 768.75 ] = − 21 2 = [ 25 66.5 ] + 21 2 [ 307.5 768.75 ] = [ − 7 30 − 7 47 ] = [ 2 − 5 − 5 2 ] [ − 7 30 − 7 47 ] − [ 25 8 ] = [ 0 0 ] = d 1 T Q d 1 0 = 0 = − g 2 + 0 = [ 0 0 ] = x 2 + a 2 [ 0 0 ] = x 2 = [ − 7 30 − 7 47 ] Example 3 Question
Prove that the Gram-Schmidt procedure generate a sequence of Q-conjugate directions given a linear independent set of vector p 0 , . . . , p n − 1 ∈ R n p_0,...,p_{n-1}\in\mathbb R^n p 0 , ... , p n − 1 ∈ R n .
proof . Note that this statement is equal to say that ∀ k ∈ { 0 , . . . , n − 1 } . ∀ j < k . d k T Q d j = 0 \forall k\in\{0,...,n-1\}. \forall j < k. d_k^TQd_j = 0 ∀ k ∈ { 0 , ... , n − 1 } .∀ j < k . d k T Q d j = 0 , and I'll prove this statement by strong induction.
First, since Q Q Q is symmetric d k T Q d j = d j T Q d k d_k^TQd_j = d_j^TQd_k d k T Q d j = d j T Q d k for any d j , d k d_j,d_k d j , d k . Also note that since d k d_k d k 's are linear combinations of p 0 , . . . , p n − 1 p_0, ..., p_{n-1} p 0 , ... , p n − 1 , d k ≠ 0 , ∀ k ∈ { 0 , . . . , n − 1 } d_k\neq 0, \forall k\in\{0,...,n-1\} d k = 0 , ∀ k ∈ { 0 , ... , n − 1 } , and since Q Q Q is positive definite d K T Q d k > 0 d_K^TQd_k > 0 d K T Q d k > 0 .
Then, note that for k = 0 k = 0 k = 0 , the statement is vacuously true. Fir k ∈ { 1 , . . . , n − 1 } k \in \{1, ..., n-1\} k ∈ { 1 , ... , n − 1 } , assume that ∀ m < k . ∀ j < m . d m T Q d j = d j T Q d m = 0 \forall m < k. \forall j < m. d_m^TQd_j = d_j^TQd_m = 0 ∀ m < k .∀ j < m . d m T Q d j = d j T Q d m = 0 , i.e. ∀ j , m < k , j ≠ m . d m T Q d j = 0 \forall j,m < k, j\neq m. d^T_mQd_j = 0 ∀ j , m < k , j = m . d m T Q d j = 0 Then, for some i < k i < k i < k , we have
d k T Q d j = ( p k − ∑ i = 0 k − 1 p k T Q d i d i T Q d i d i ) T Q d j = p k T Q d j − ∑ i = 0 k − 1 p k T Q d i d i T Q d i d i T Q d j = p k T Q d j − p k T Q d i d j T Q d j d j T Q d j = p k T Q d j − p k T Q d j = 0 \begin{align*} d_{k}^TQd_{j} &= (p_k - \sum_{i=0}^{k-1}\frac{p_k^TQ d_i}{d_i^TQd_i}d_i)^TQd_j\\ &= p_k^TQd_j- \sum_{i=0}^{k-1}\frac{p_k^TQ d_i}{d_i^TQd_i}d_i^TQd_j\\ &= p_k^TQd_j - \frac{p_k^TQ d_i}{d_j^TQd_j}d_j^TQd_j\\ &= p_k^TQd_j - p_k^TQd_j\\ &= 0 \end{align*} d k T Q d j = ( p k − i = 0 ∑ k − 1 d i T Q d i p k T Q d i d i ) T Q d j = p k T Q d j − i = 0 ∑ k − 1 d i T Q d i p k T Q d i d i T Q d j = p k T Q d j − d j T Q d j p k T Q d i d j T Q d j = p k T Q d j − p k T Q d j = 0 Example 4 Question
Let Q Q Q be positive definite, f ( x ) = 1 2 x T Q x − b T x f(x) = \frac12 x^TQx - b^Tx f ( x ) = 2 1 x T Q x − b T x . Let x 1 = arg min x ∈ S 1 f ( x ) , x 2 = arg min x ∈ S 2 f ( x ) x_1=\arg\min_{x\in S_1}f(x), x_2=\arg\min_{x\in S_2}f(x) x 1 = arg min x ∈ S 1 f ( x ) , x 2 = arg min x ∈ S 2 f ( x ) , where S 1 , S 2 ⊂ E n S_1,S_2\subset E^n S 1 , S 2 ⊂ E n and d ∈ S 1 ∩ S 2 d\in S_1\cap S_2 d ∈ S 1 ∩ S 2 , assume f ( x 1 ) < f ( x 2 ) f(x_1) < f(x_2) f ( x 1 ) < f ( x 2 ) . Show that ( x 1 − x 2 ) T Q d = 0 (x_1-x_2)^TQd = 0 ( x 1 − x 2 ) T Q d = 0 .
proof . Since x 1 x_1 x 1 is a minimizer of S 1 S_1 S 1 , we must have ∇ f ( x 1 ) T = 0 \nabla f(x_1)^T = 0 ∇ f ( x 1 ) T = 0 , otherwise we can have some ϵ > 0 , f ( x 1 − ϵ d ) < f ( x 1 ) \epsilon > 0, f(x_1 - \epsilon d) < f(x_1) ϵ > 0 , f ( x 1 − ϵ d ) < f ( x 1 ) and x 1 − ϵ d ∈ S 1 x_1-\epsilon d \in S_1 x 1 − ϵ d ∈ S 1 since x 1 ∈ S 1 , d ∈ S 1 x_1\in S_1, d\in S_1 x 1 ∈ S 1 , d ∈ S 1 . Note that the equation is expanded as
∇ f ( x 1 ) T d = ( Q x 1 − b ) T d = x 1 T Q T d − b T d = x 1 T Q d − b T d Q is symmetric = 0 \begin{align*} \nabla f(x_1)^T d &= (Qx_1 - b)^Td \\ &= x_1^TQ^Td - b^Td \\ &= x_1^TQd - b^Td &Q\text{ is symmetric}\\ &= 0 \end{align*} ∇ f ( x 1 ) T d = ( Q x 1 − b ) T d = x 1 T Q T d − b T d = x 1 T Q d − b T d = 0 Q is symmetric and similarly we have ∇ f ( x 2 ) T d = x 2 T Q d − b T d = 0 \nabla f(x_2)^T d = x_2^TQd - b^Td= 0 ∇ f ( x 2 ) T d = x 2 T Q d − b T d = 0 .
Therefore, we have
( x 1 − x 2 ) T Q d = x 1 T Q d − x 2 T Q d = b T d − b T d = 0 \begin{align*} (x_1-x_2)^TQd &= x_1^TQd - x_2^TQd \\ &= b^Td - b^Td \\ &= 0 \end{align*} ( x 1 − x 2 ) T Q d = x 1 T Q d − x 2 T Q d = b T d − b T d = 0 Example 5 Question
Let f = 1 2 x T Q x − b T x f = \frac12x^TQx - b^Tx f = 2 1 x T Q x − b T x where Q = d i a g ( λ 1 , . . . , λ n ) Q = diag(\lambda_1,...,\lambda_n) Q = d ia g ( λ 1 , ... , λ n ) being a diagonal, positive definite and symmetric matrix.
Part (a)
Show that standard basis vectors form a Q-orthogonal set.
proof. Let i , j ∈ { 1 , . . . , n } , i ≠ j i, j \in\{1, ...,n\}, i\neq j i , j ∈ { 1 , ... , n } , i = j . Denote e m k e_{mk} e mk be the k k k th entry of e m e_m e m , Q i j Q_{ij} Q ij be the entry on i i i th row and j j j th column of Q Q Q
e i T Q e j = ∑ p = 1 n ∑ q = 1 n Q p q e i p e j q e_i^TQe_j = \sum_{p=1}^n\sum_{q=1}^nQ_{pq}e_{ip}e_{jq} e i T Q e j = p = 1 ∑ n q = 1 ∑ n Q pq e i p e j q Note that Q p p = λ i , Q p q = 0 Q_{pp} = \lambda_i, Q_{pq}=0 Q pp = λ i , Q pq = 0 , for any p , q ∈ { 1 , . . . , n } , p ≠ q p,q\in\{1,...,n\}, p\neq q p , q ∈ { 1 , ... , n } , p = q . e i i = 1 , e i p = 0 e_{ii}=1, e_{ip}=0 e ii = 1 , e i p = 0 for p ∈ { 1 , . . . , n } − { i } p\in\{1,...,n\}-\{i\} p ∈ { 1 , ... , n } − { i } e j j = 1 , e j q = 0 e_{jj}=1, e_{jq}=0 e jj = 1 , e j q = 0 for q ∈ { 1 , . . . , n } − { j } q\in\{1,...,n\}-\{j\} q ∈ { 1 , ... , n } − { j } Therefore, consider each term of the summation, when p ≠ q , Q p q = 0 p\neq q, Q_{pq}=0 p = q , Q pq = 0 , when p = q p=q p = q , at least one of e i p , e j q e_{ip},e_{jq} e i p , e j q equals 0. Therefore, all terms in the summation are 0, e i T Q e j = 0 e_i^TQe_j = 0 e i T Q e j = 0 , hence { d 0 , . . . , d n − 1 } = { e 1 , . . . , e n } \{d_0,...,d_{n-1}\} = \{e_1,...,e_n\} { d 0 , ... , d n − 1 } = { e 1 , ... , e n } forms a Q-orthogonal set.
Part (b)
Prove x k = ( b 1 λ 1 , . . . , b k λ k , a k + 1 , . . . , a n ) x_k = (\frac{b_1}{\lambda_1},...,\frac{b_k}{\lambda_k}, a_{k+1},...,a_n) x k = ( λ 1 b 1 , ... , λ k b k , a k + 1 , ... , a n ) is the k k k th step of Conjugate direction method, starting from x 0 = ( a 1 , . . . , a n ) x_0 = (a_1,...,a_n) x 0 = ( a 1 , ... , a n ) .
proof . I'll prove by induction. Let k ∈ { 1 , . . . , n − 2 } k\in \{1,...,n-2\} k ∈ { 1 , ... , n − 2 } , assume x k = ( b 1 λ 1 , . . . , b k λ k , a k + 1 , . . . , a n ) x_{k} = (\frac{b_1}{\lambda_1},...,\frac{b_k}{\lambda_k}, a_{k+1},...,a_n) x k = ( λ 1 b 1 , ... , λ k b k , a k + 1 , ... , a n ) . Consider the ( k + 1 ) (k+1) ( k + 1 ) th step of conjugate direction method.
g k = Q x k − b = [ λ 1 b 1 λ 1 − b 1 ⋯ λ k b k λ k − b k λ k + 1 a k + 1 − b k + 1 ⋯ λ n a n − b n ] = [ 0 ⋯ 0 λ k + 1 a k + 1 − b k + 1 ⋯ λ n a n − b n ] g_k = Qx_k - b = \begin{bmatrix} \lambda_1\frac{b_1}{\lambda_1} - b_1\\ \cdots\\ \lambda_k\frac{b_k}{\lambda_k} - b_k\\ \lambda_{k+1}a_{k+1} - b_{k+1}\\ \cdots\\ \lambda_{n}a_n - b_{n} \end{bmatrix} = \begin{bmatrix} 0\\ \cdots\\ 0\\ \lambda_{k+1}a_{k+1} - b_{k+1}\\ \cdots\\ \lambda_{n}a_n - b_{n} \end{bmatrix} g k = Q x k − b = λ 1 λ 1 b 1 − b 1 ⋯ λ k λ k b k − b k λ k + 1 a k + 1 − b k + 1 ⋯ λ n a n − b n = 0 ⋯ 0 λ k + 1 a k + 1 − b k + 1 ⋯ λ n a n − b n a k = − g k T d k d k T Q d k = − g k T e k + 1 e k + 1 T Q e k + 1 − λ k + 1 a k + 1 − b k + 1 λ k + 1 = − a k + 1 + b k + 1 λ k + 1 a_k = -\frac{g_k^Td_k}{d_k^TQd_k}=-\frac{g_k^Te_{k+1}}{e_{k+1}^TQe_{k+1}}-\frac{\lambda_{k+1}a_{k+1} - b_{k+1}}{\lambda_{k+1}} = -a_{k+1}+\frac{b_{k+1}}{\lambda_{k+1}} a k = − d k T Q d k g k T d k = − e k + 1 T Q e k + 1 g k T e k + 1 − λ k + 1 λ k + 1 a k + 1 − b k + 1 = − a k + 1 + λ k + 1 b k + 1 x k + 1 = x k + a k d k = x k + a k e k + 1 [ b 1 λ 1 ⋯ b k λ k a k + 1 − a k + 1 + b k + 1 λ k + 1 a k + 2 ⋯ a n ] = [ b 1 λ 1 ⋯ b k + 1 λ k + 1 a k + 2 ⋯ a n ] x_{k+1} = x_k + a_kd_k = x_k + a_ke_{k+1} \begin{bmatrix} \frac{b_1}{\lambda_1}\\ \cdots\\ \frac{b_k}{\lambda_k}\\ a_{k+1} -a_{k+1}+\frac{b_{k+1}}{\lambda_{k+1}}\\ a_{k+2}\\ \cdots\\ a_n \end{bmatrix} = \begin{bmatrix} \frac{b_1}{\lambda_1}\\ \cdots\\ \frac{b_{k+1}}{\lambda_{k+1}}\\a_{k+2}\\ \cdots\\ a_n \end{bmatrix} x k + 1 = x k + a k d k = x k + a k e k + 1 λ 1 b 1 ⋯ λ k b k a k + 1 − a k + 1 + λ k + 1 b k + 1 a k + 2 ⋯ a n = λ 1 b 1 ⋯ λ k + 1 b k + 1 a k + 2 ⋯ a n Part (c)
Prove that ∀ k ≥ 1 , x k \forall k \geq 1, x_k ∀ k ≥ 1 , x k is the minimizer of f f f in the set x 0 + B k , B k = s p a n { d 0 , . . . , d k − 1 } = s p a n { e 1 , . . . , e k } x_0 + \mathcal B_k, \mathcal B_k = span\{d_0,...,d_{k-1}\} = span\{e_1,...,e_k\} x 0 + B k , B k = s p an { d 0 , ... , d k − 1 } = s p an { e 1 , ... , e k } .
proof . Let
ϕ ( y 1 , . . . , y k ) = f ( x 0 + ∑ i = 1 k y i e i ) = ( x 0 + ∑ i = 1 k y i e i ) T Q ( x 0 + ∑ i = 1 k y i e i ) − b T ( x 0 + ∑ i = 1 k y i e i ) \begin{align*} \phi(y_1,...,y_k)&=f(x_0 + \sum_{i=1}^k{y_ie_i})\\ &=(x_0 + \sum_{i=1}^k{y_ie_i})^TQ(x_0 + \sum_{i=1}^k{y_ie_i})-b^T(x_0 + \sum_{i=1}^k{y_ie_i}) \end{align*} ϕ ( y 1 , ... , y k ) = f ( x 0 + i = 1 ∑ k y i e i ) = ( x 0 + i = 1 ∑ k y i e i ) T Q ( x 0 + i = 1 ∑ k y i e i ) − b T ( x 0 + i = 1 ∑ k y i e i ) Therefore, the problem is equivalent to minimize ϕ \phi ϕ on ∈ R k \in\mathbb R^k ∈ R k . Note that
∂ ϕ ∂ y i = Q i ⋅ ( x 0 i + y i ) − b i = λ i ( a i + y i ) − b i \frac{\partial\phi}{\partial y_i} = Q_{i\cdot}(x_{0i}+y_i) - b_i = \lambda_i(a_i+y_i) - b_i ∂ y i ∂ ϕ = Q i ⋅ ( x 0 i + y i ) − b i = λ i ( a i + y i ) − b i for i = 1 , 2 , . . , k i=1,2,..,k i = 1 , 2 , .. , k , Therefore, set the derivative to 0 0 0 to satisfy the FONC, we have k k k equations
λ i ( a i + y i ) − b i = 0 ⇒ y i = b i λ i − a i \lambda_i(a_i+y_i)-b_i=0\Rightarrow y_i =\frac{b_i}{\lambda_i}-a_i λ i ( a i + y i ) − b i = 0 ⇒ y i = λ i b i − a i Then, note that ∂ 2 ϕ ∂ y i 2 = λ i , ∂ 2 ϕ ∂ y i y j = 0 \frac{\partial^{2}\phi}{\partial y_i^2} = \lambda_i, \frac{\partial^{2}\phi}{\partial y_i y_j} = 0 ∂ y i 2 ∂ 2 ϕ = λ i , ∂ y i y j ∂ 2 ϕ = 0 for i , j ∈ { 1 , . . . , k } , i ≠ j i,j\in\{1,...,k\}, i\neq j i , j ∈ { 1 , ... , k } , i = j , we have ∇ 2 ϕ = d i a g ( λ 1 , . . . , λ k ) \nabla^2\phi = diag(\lambda_1,...,\lambda_k) ∇ 2 ϕ = d ia g ( λ 1 , ... , λ k ) , i.e. the top-left k × k k\times k k × k submatrix of Q Q Q , since Q Q Q is positive definite, ∇ 2 ϕ \nabla^2\phi ∇ 2 ϕ is also positive definite, SOC also holds and
x 0 + ∑ i − 1 k ( λ i ( a i + y i ) − b i ) e i = ( b 1 λ 1 , . . . , b k λ k , a k + 1 , . . . , a n ) = x k x_0 + \sum_{i-1}^k(\lambda_i(a_i+y_i)-b_i)e_i = (\frac{b_1}{\lambda_1},...,\frac{b_k}{\lambda_k}, a_{k+1}, ..., a_n) = x_k x 0 + i − 1 ∑ k ( λ i ( a i + y i ) − b i ) e i = ( λ 1 b 1 , ... , λ k b k , a k + 1 , ... , a n ) = x k is the minimizer of ϕ \phi ϕ .
Example 6 Question
Let Q Q Q be a positive definite symmetric matrix.
Part (a)
Prove d ( x , y ) = [ ( x − y ) T Q ( x − y ) ] 1 / 2 d(x, y) = [(x-y)^TQ(x-y)]^{1/2} d ( x , y ) = [( x − y ) T Q ( x − y ) ] 1/2 is a metric.
proof . Let x , y ∈ R n x,y\in\mathbb R^n x , y ∈ R n .
positive definite Since Q Q Q is positive definite,
∀ a ∈ R n . a T Q a ≥ 0 ∧ a T Q a = 0 ⇔ a = 0 \forall a\in\mathbb R^n. a^TQa \geq 0\land a^TQa = 0\Leftrightarrow a = 0 ∀ a ∈ R n . a T Q a ≥ 0 ∧ a T Q a = 0 ⇔ a = 0 ⟹ ( x − y ) T Q ( x − y ) ≥ 0 ∧ ( x − y ) T Q ( x − y ) = 0 ⇔ x − y = 0 ⇔ x = y \implies (x-y)^TQ(x-y) \geq 0\land (x-y)^TQ(x-y) = 0\Leftrightarrow x-y = 0\Leftrightarrow x=y ⟹ ( x − y ) T Q ( x − y ) ≥ 0 ∧ ( x − y ) T Q ( x − y ) = 0 ⇔ x − y = 0 ⇔ x = y Therefore,
d ( x , y ) = [ ( x − y ) T Q ( x − y ) ] 1 / 2 ≥ 0 ∧ [ ( x − y ) T Q ( x − y ) ] 1 / 2 = 0 ⇔ x = y d(x,y)=[(x-y)^TQ(x-y)]^{1/2} \geq 0\land [(x-y)^TQ(x-y)]^{1/2} = 0\Leftrightarrow x=y d ( x , y ) = [( x − y ) T Q ( x − y ) ] 1/2 ≥ 0 ∧ [( x − y ) T Q ( x − y ) ] 1/2 = 0 ⇔ x = y symmetric
d ( x , y ) = [ ( x − y ) T Q ( x − y ) ] 1 / 2 = [ ( − ( y − x ) ) T Q ( − ( y − x ) ) ] 1 / 2 = [ − 1 ( − 1 ) ( y − x ) T Q ( y − x ) ] 1 / 2 = [ ( y − x ) T Q ( y − x ) ] 1 / 2 = d ( y , x ) \begin{align*} d(x,y) &= [(x-y)^TQ(x-y)]^{1/2} \\ &= [(-(y-x))^TQ(-(y-x))]^{1/2}\\ &= [-1(-1)(y-x)^TQ(y-x)]^{1/2}\\ &= [(y-x)^TQ(y-x)]^{1/2}\\ &= d(y,x) \end{align*} d ( x , y ) = [( x − y ) T Q ( x − y ) ] 1/2 = [( − ( y − x ) ) T Q ( − ( y − x )) ] 1/2 = [ − 1 ( − 1 ) ( y − x ) T Q ( y − x ) ] 1/2 = [( y − x ) T Q ( y − x ) ] 1/2 = d ( y , x ) triangular inequality
d ( x , z ) = [ ( x − z ) T Q ( x − z ) ] 1 / 2 = [ ( ( x − y ) + ( y − z ) ) T Q ( ( x − y ) + ( y − z ) ) ] 1 / 2 = [ ( x − y ) T Q ( x − y ) + ( y − z ) T Q ( y − z ) ] 1 / 2 = ( d ( x , y ) 2 + d ( y , z ) 2 ) 1 / 2 by triangular inequality on Euclidean norm of real numbers ≤ ( d ( x , y ) 2 ) 1 / 2 + ( d ( x , y ) 2 ) 1 / 2 = d ( x , y ) + d ( y , z ) \begin{align*} d(x,z)&= [(x-z)^TQ(x-z)]^{1/2}\\ &= [((x-y)+(y-z))^TQ((x-y)+(y-z))]^{1/2}\\ &= [(x-y)^TQ(x-y) + (y-z)^TQ(y-z)]^{1/2}\\ &= (d(x,y)^{2} + d(y,z)^2)^{1/2}\\ &\text{by triangular inequality on Euclidean norm of real numbers}\\ &\leq (d(x,y)^{2})^{1/2} + (d(x,y)^{2})^{1/2} \\ &= d(x,y) + d(y,z) \end{align*} d ( x , z ) = [( x − z ) T Q ( x − z ) ] 1/2 = [(( x − y ) + ( y − z ) ) T Q (( x − y ) + ( y − z )) ] 1/2 = [( x − y ) T Q ( x − y ) + ( y − z ) T Q ( y − z ) ] 1/2 = ( d ( x , y ) 2 + d ( y , z ) 2 ) 1/2 by triangular inequality on Euclidean norm of real numbers ≤ ( d ( x , y ) 2 ) 1/2 + ( d ( x , y ) 2 ) 1/2 = d ( x , y ) + d ( y , z ) Part (b)
For x ∗ ∈ R 2 , a ∈ R x^*\in\mathbb R^2, a\in\mathbb R x ∗ ∈ R 2 , a ∈ R , for x x x on the line L = { x ∈ R 2 ∣ x = ( t , a t ) , t ∈ R } L = \{x\in\mathbb R^2\mid x=(t,at), t\in\mathbb R\} L = { x ∈ R 2 ∣ x = ( t , a t ) , t ∈ R } , find x x x that minimizes d ( x , x ∗ ) d(x,x^*) d ( x , x ∗ ) .
Define
f ( x , y ) = d ( ( x , y ) , ( x ∗ , y ∗ ) ) = 1 2 [ x − x ∗ y − y ∗ ] T Q [ x − x ∗ y − y ∗ ] f(x, y) = d((x, y), (x^*, y^*)) = \frac12 \begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}^TQ\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix} f ( x , y ) = d (( x , y ) , ( x ∗ , y ∗ )) = 2 1 [ x − x ∗ y − y ∗ ] T Q [ x − x ∗ y − y ∗ ] therefore minimizing d ( ( x , y ) , ( x ∗ , y ∗ ) ) d((x,y), (x^*, y^*)) d (( x , y ) , ( x ∗ , y ∗ )) on L L L is equivalent to
minimize f ( x , y ) subject to l ( x , y ) = a x − y = 0 \begin{align*}&\text{minimize } &f(x,y)\\ &\text{subject to} &l(x,y) = ax-y = 0 \end{align*} minimize subject to f ( x , y ) l ( x , y ) = a x − y = 0 Note that ∇ f = Q [ x − x ∗ y − y ∗ ] , ∇ l = [ a − 1 ] \nabla f = Q\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}, \nabla l = \begin{bmatrix}a\\-1\end{bmatrix} ∇ f = Q [ x − x ∗ y − y ∗ ] , ∇ l = [ a − 1 ] using Lagrange multiplier, we have equations
Q [ t − x ∗ a t − y ∗ ] + λ [ a − 1 ] = 0 \begin{align*} Q\begin{bmatrix}t-x^*\\at-y^*\end{bmatrix} + \lambda\begin{bmatrix}a\\-1\end{bmatrix}= 0 \end{align*} Q [ t − x ∗ a t − y ∗ ] + λ [ a − 1 ] = 0 Therefore, since Q Q Q is symmetric, write Q = [ p m m q ] Q = \begin{bmatrix}p&m\\m&q\end{bmatrix} Q = [ p m m q ] we can solve for
t = ( p + m ) x ∗ + ( q + m ) x ∗ a 2 q + 2 a m + p t = \frac{(p+m)x^* + (q+m)x^*}{a^2q + 2am + p} t = a 2 q + 2 am + p ( p + m ) x ∗ + ( q + m ) x ∗ Since Q Q Q is positive definite, this solution is the minimum.
January 11, 2025 January 9, 2023