Skip to content

Examples: Conjugate Directions and Conjugate Gradients

Example 1

Question

Let cRn{0},f(x)=12xTQxbTx,Q=I+ccTc\in\mathbb R^n - \{0\}, f(x) = \frac12x^TQx-b^Tx, Q = I +cc^T. Using conjugate gradient method, what's the smallest kk that guarantees xkx_k is the minimizer of ff.

Claim. k=2k=2

proof. First, consider some eigenvalues λ\lambda and corresponding eigenvector xx, by definition, we have

λx=Qxλx=(I+ccT)xλx=x+ccTx(λ1)x=(cTx)cNote that cTxR\begin{align*} \lambda x &= Qx\\ \lambda x &= (I + cc^T)x\\ \lambda x &= x + cc^Tx\\ (\lambda - 1)x &= (c^Tx)c &\text{Note that }c^Tx\in\mathbb R \end{align*}

If λ1=0\lambda - 1 = 0, we must have cTx=0c^Tx = 0, so that λ=1\lambda = 1 is a eigenvalue,
If λ10\lambda - 1 \neq 0, then xx and cc are linearly dependent, hence the eigenvector is cc and we have

λc=(I+ccT)cλc=c+ccTcλ=1+cTc\begin{align*} \lambda c &= (I + cc^T)c\\ \lambda c &= c + cc^Tc\\ \lambda &= 1 + c^Tc \end{align*}

Therefore, there are only 2 distinct eigenvalues for Q=I+ccTQ = I + cc^T.

Then, we can take P1(λ)=a+bλP_1(\lambda) = a + b\lambda be a polynomial of degree 1 s.t.

[1111+cTc][ab]=[111+cTc]\begin{bmatrix} 1&1\\ 1&1+c^Tc \end{bmatrix}\begin{bmatrix} a\\b \end{bmatrix}=\begin{bmatrix} -1\\-\frac{1}{1+c^Tc} \end{bmatrix}

so that 1+P(1)=01+P(1) = 0 and 1+(1+ccT)P(1+ccT)=01+(1+cc^T)P(1+cc^T) =0. Therefore, we have

q(x2)maxλ{1,1+cTc}(1+λP1(λ))q(x0)=0q(x_2) \leq \max_{\lambda \in \{1, 1+c^Tc\}}(1+\lambda P_1(\lambda))q(x_0) = 0

Therefore, k=2k = 2 guarantees x2x_2 is the minimizer of ff.

Example 2

Question

Let Q=[2552],b=[258],f=12xTQxbTxQ = \begin{bmatrix}2&-5\\-5&2\end{bmatrix}, b = \begin{bmatrix}25&8\end{bmatrix}, f = \frac{1}{2}x^TQx - b^Tx.

Part (a)

Find eigenvalues λ0λ1\lambda_0\leq \lambda_1 of QQ and corresponding eigenvectors.

First, find its characteristic polynomial as

(2λ)225=λ24λ+425=λ24λ+21=(λ7)(λ+3)(2-\lambda)^2 - 25 = \lambda^2 - 4\lambda + 4 -25 = \lambda^2 - 4\lambda+21=(\lambda-7)(\lambda+3)

set the equation to 00 and solve to get

λ0=3,λ1=7\lambda_0 = -3, \lambda_1 = 7

Then,

(Qλ0I)d0=0[5555]d0=0d0=[11](Qλ1I)d1=0[5555]d1=0d1=[11]\begin{align*} (Q - \lambda_0I)d_0 &= 0\\ \begin{bmatrix}5&-5\\-5&5\end{bmatrix}d_0 &= 0\\ d_0&= \begin{bmatrix}1\\1\end{bmatrix}\\ (Q - \lambda_1I)d_1 &= 0\\ \begin{bmatrix}-5&-5\\-5&-5\end{bmatrix}d_1 &= 0\\ d_1&= \begin{bmatrix}1\\-1\end{bmatrix} \end{align*}

Part (b)

Compute the steps of conjugate directions method given directions d0,d1d_0, d_1.

g0=[2552][255][258]=[0123]a0=([0123]T[11])/([11]T[2552][11])=412x1=[255]412[11]=[4.515.5]g1=[2552][4.515.5][258]=[61.561.5]a1=([61.561.5]T[11])/([11]T[2552][11])=12314x2=[4.515.5]12314[11]=[307477]\begin{align*} g_0 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}25\\5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}0\\-123\end{bmatrix}\\ a_0 &= -(\begin{bmatrix}0\\-123\end{bmatrix}^T\begin{bmatrix}1\\1\end{bmatrix}) / (\begin{bmatrix}1\\1\end{bmatrix}^T \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix})= -\frac{41}2\\ x_1 &= \begin{bmatrix}25\\5\end{bmatrix}-\frac{41}2\begin{bmatrix}1\\1\end{bmatrix}= \begin{bmatrix}4.5\\-15.5\end{bmatrix}\\ g_1 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix} \begin{bmatrix}4.5\\-15.5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}61.5\\-61.5\end{bmatrix}\\ a_1 &= -(\begin{bmatrix}61.5\\-61.5\end{bmatrix}^T\begin{bmatrix}1\\-1\end{bmatrix}) / (\begin{bmatrix}1\\-1\end{bmatrix}^T \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix})= -\frac{123}{14}\\ x_2 &= \begin{bmatrix}-4.5\\15.5\end{bmatrix}-\frac{123}{14}\begin{bmatrix}1\\-1\end{bmatrix}= \begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix} \end{align*}

Part (c)

Compute the steps of conjugate directions method given directions d1,d0d_1, d_0.

g0=[2552][255][258]=[0123]a0=([0123]T[11])/([11]T[2552][11])=12314x1=[255]12314[11]=[2271419314]g1=[2552][2271419314][258]=[61.561.5]a1=([61.561.5]T[11])/([11]T[2552][11])=412x2=[2271419314]412[11]=[307477]\begin{align*} g_0 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}25\\5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}0\\-123\end{bmatrix}\\ a_0 &= -(\begin{bmatrix}0\\-123\end{bmatrix}^T\begin{bmatrix}1\\-1\end{bmatrix}) / (\begin{bmatrix}1\\-1\end{bmatrix}^T \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix})= -\frac{123}{14}\\ x_1 &= \begin{bmatrix}25\\5\end{bmatrix}-\frac{123}{14}\begin{bmatrix}1\\-1\end{bmatrix}= \begin{bmatrix}\frac{227}{14}\\\frac{193}{14}\end{bmatrix}\\ g_1 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix} \begin{bmatrix}\frac{227}{14}\\\frac{193}{14}\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}61.5\\-61.5\end{bmatrix}\\ a_1 &= -(\begin{bmatrix}61.5\\-61.5\end{bmatrix}^T\begin{bmatrix}1\\1\end{bmatrix}) / (\begin{bmatrix}1\\1\end{bmatrix}^T \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}1\\1\end{bmatrix})= -\frac{41}{2}\\ x_2 &= \begin{bmatrix}\frac{227}{14}\\\frac{193}{14}\end{bmatrix}-\frac{41}{2}\begin{bmatrix}1\\1\end{bmatrix}= \begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix} \end{align*}

Part (d)

Compute the steps of conjugate gradients.

g0=[2552][255][258]=[0123]d0=[0123]a0=([0123][0123])/[0123][2552][0123]=0.5x1=[255]+0.5[0123]=[2566.5]g1=[2552][2566.5][258]=[307.50]β0=([307.50]T[2552][0123])/([0123]T[2552][0123])=6.25d1=[307.50]+6.25[0123]=[307.5768.75]a1=([307.50][307.5768.75])/[307.5768.75][2552][307.5768.75]=221x2=[2566.5]+221[307.5768.75]=[307477]g2=[2552][307477][258]=[00]β1=0d1TQd1=0d2=g2+0=[00]x3=x2+a2[00]=x2=[307477]\begin{align*} g_0 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}25\\5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}0\\-123\end{bmatrix}\\ d_0 &= \begin{bmatrix}0\\123\end{bmatrix}\\ a_0 &= (\begin{bmatrix}0\\123\end{bmatrix}\cdot\begin{bmatrix}0\\123\end{bmatrix}) / \begin{bmatrix}0\\123\end{bmatrix}\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}0\\123\end{bmatrix} = 0.5\\ x_1 &= \begin{bmatrix}25\\5\end{bmatrix} + 0.5\begin{bmatrix}0\\123\end{bmatrix}= \begin{bmatrix}25\\66.5\end{bmatrix}\\ g_1 &=\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}25\\66.5\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix}= \begin{bmatrix}-307.5\\0\end{bmatrix}\\ \beta_0 &= (\begin{bmatrix}-307.5\\0\end{bmatrix}^T\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}0\\123\end{bmatrix}) / (\begin{bmatrix}0\\123\end{bmatrix}^T\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}0\\123\end{bmatrix})=6.25\\ d_1 &= \begin{bmatrix}307.5\\0\end{bmatrix} + 6.25\begin{bmatrix}0\\123\end{bmatrix} = \begin{bmatrix}307.5\\768.75\end{bmatrix}\\ a_1 &= (\begin{bmatrix}-307.5\\0\end{bmatrix}\cdot\begin{bmatrix}307.5\\768.75\end{bmatrix}) / \begin{bmatrix}307.5\\768.75\end{bmatrix}\begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}307.5\\768.75\end{bmatrix} = -\frac{2}{21}\\ x_2 &= \begin{bmatrix}25\\66.5\end{bmatrix} + \frac{2}{21}\begin{bmatrix}307.5\\768.75\end{bmatrix} = \begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix}\\ g_2 &= \begin{bmatrix}2&-5\\-5&2\end{bmatrix}\begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix} - \begin{bmatrix}25\\8\end{bmatrix} = \begin{bmatrix}0\\0\end{bmatrix}\\ \beta_1 &= \frac{0}{d_1^TQd_1} = 0\\ d_2 &= -g_2 + 0 = \begin{bmatrix}0\\0\end{bmatrix}\\ x_3 &= x_2 + a_2\begin{bmatrix}0\\0\end{bmatrix} = x_2 = \begin{bmatrix}-\frac{30}7\\-\frac{47}7\end{bmatrix} \end{align*}

Example 3

Question

Prove that the Gram-Schmidt procedure generate a sequence of Q-conjugate directions given a linear independent set of vector p0,...,pn1Rnp_0,...,p_{n-1}\in\mathbb R^n.

proof. Note that this statement is equal to say that k{0,...,n1}.j<k.dkTQdj=0\forall k\in\{0,...,n-1\}. \forall j < k. d_k^TQd_j = 0, and I'll prove this statement by strong induction.

First, since QQ is symmetric dkTQdj=djTQdkd_k^TQd_j = d_j^TQd_k for any dj,dkd_j,d_k. Also note that since dkd_k's are linear combinations of p0,...,pn1p_0, ..., p_{n-1}, dk0,k{0,...,n1}d_k\neq 0, \forall k\in\{0,...,n-1\}, and since QQ is positive definite dKTQdk>0d_K^TQd_k > 0.

Then, note that for k=0k = 0, the statement is vacuously true.
Fir k{1,...,n1}k \in \{1, ..., n-1\}, assume that m<k.j<m.dmTQdj=djTQdm=0\forall m < k. \forall j < m. d_m^TQd_j = d_j^TQd_m = 0, i.e. j,m<k,jm.dmTQdj=0\forall j,m < k, j\neq m. d^T_mQd_j = 0 Then, for some i<ki < k, we have

dkTQdj=(pki=0k1pkTQdidiTQdidi)TQdj=pkTQdji=0k1pkTQdidiTQdidiTQdj=pkTQdjpkTQdidjTQdjdjTQdj=pkTQdjpkTQdj=0\begin{align*} d_{k}^TQd_{j} &= (p_k - \sum_{i=0}^{k-1}\frac{p_k^TQ d_i}{d_i^TQd_i}d_i)^TQd_j\\ &= p_k^TQd_j- \sum_{i=0}^{k-1}\frac{p_k^TQ d_i}{d_i^TQd_i}d_i^TQd_j\\ &= p_k^TQd_j - \frac{p_k^TQ d_i}{d_j^TQd_j}d_j^TQd_j\\ &= p_k^TQd_j - p_k^TQd_j\\ &= 0 \end{align*}

Example 4

Question

Let QQ be positive definite, f(x)=12xTQxbTxf(x) = \frac12 x^TQx - b^Tx. Let x1=argminxS1f(x),x2=argminxS2f(x)x_1=\arg\min_{x\in S_1}f(x), x_2=\arg\min_{x\in S_2}f(x), where S1,S2EnS_1,S_2\subset E^n and dS1S2d\in S_1\cap S_2, assume f(x1)<f(x2)f(x_1) < f(x_2). Show that (x1x2)TQd=0(x_1-x_2)^TQd = 0.

proof. Since x1x_1 is a minimizer of S1S_1, we must have f(x1)T=0\nabla f(x_1)^T = 0, otherwise we can have some ϵ>0,f(x1ϵd)<f(x1)\epsilon > 0, f(x_1 - \epsilon d) < f(x_1) and x1ϵdS1x_1-\epsilon d \in S_1 since x1S1,dS1x_1\in S_1, d\in S_1. Note that the equation is expanded as

f(x1)Td=(Qx1b)Td=x1TQTdbTd=x1TQdbTdQ is symmetric=0\begin{align*} \nabla f(x_1)^T d &= (Qx_1 - b)^Td \\ &= x_1^TQ^Td - b^Td \\ &= x_1^TQd - b^Td &Q\text{ is symmetric}\\ &= 0 \end{align*}

and similarly we have f(x2)Td=x2TQdbTd=0\nabla f(x_2)^T d = x_2^TQd - b^Td= 0.

Therefore, we have

(x1x2)TQd=x1TQdx2TQd=bTdbTd=0\begin{align*} (x_1-x_2)^TQd &= x_1^TQd - x_2^TQd \\ &= b^Td - b^Td \\ &= 0 \end{align*}

Example 5

Question

Let f=12xTQxbTxf = \frac12x^TQx - b^Tx where Q=diag(λ1,...,λn)Q = diag(\lambda_1,...,\lambda_n) being a diagonal, positive definite and symmetric matrix.

Part (a)

Show that standard basis vectors form a Q-orthogonal set.

proof. Let i,j{1,...,n},iji, j \in\{1, ...,n\}, i\neq j.
Denote emke_{mk} be the kkth entry of eme_m, QijQ_{ij} be the entry on iith row and jjth column of QQ

eiTQej=p=1nq=1nQpqeipejqe_i^TQe_j = \sum_{p=1}^n\sum_{q=1}^nQ_{pq}e_{ip}e_{jq}

Note that
Qpp=λi,Qpq=0Q_{pp} = \lambda_i, Q_{pq}=0, for any p,q{1,...,n},pqp,q\in\{1,...,n\}, p\neq q.
eii=1,eip=0e_{ii}=1, e_{ip}=0 for p{1,...,n}{i}p\in\{1,...,n\}-\{i\}
ejj=1,ejq=0e_{jj}=1, e_{jq}=0 for q{1,...,n}{j}q\in\{1,...,n\}-\{j\}
Therefore, consider each term of the summation, when pq,Qpq=0p\neq q, Q_{pq}=0, when p=qp=q, at least one of eip,ejqe_{ip},e_{jq} equals 0. Therefore, all terms in the summation are 0, eiTQej=0e_i^TQe_j = 0, hence {d0,...,dn1}={e1,...,en}\{d_0,...,d_{n-1}\} = \{e_1,...,e_n\} forms a Q-orthogonal set.

Part (b)

Prove xk=(b1λ1,...,bkλk,ak+1,...,an)x_k = (\frac{b_1}{\lambda_1},...,\frac{b_k}{\lambda_k}, a_{k+1},...,a_n) is the kkth step of Conjugate direction method, starting from x0=(a1,...,an)x_0 = (a_1,...,a_n).

proof. I'll prove by induction.
Let k{1,...,n2}k\in \{1,...,n-2\}, assume xk=(b1λ1,...,bkλk,ak+1,...,an)x_{k} = (\frac{b_1}{\lambda_1},...,\frac{b_k}{\lambda_k}, a_{k+1},...,a_n). Consider the (k+1)(k+1)th step of conjugate direction method.

gk=Qxkb=[λ1b1λ1b1λkbkλkbkλk+1ak+1bk+1λnanbn]=[00λk+1ak+1bk+1λnanbn]g_k = Qx_k - b = \begin{bmatrix} \lambda_1\frac{b_1}{\lambda_1} - b_1\\ \cdots\\ \lambda_k\frac{b_k}{\lambda_k} - b_k\\ \lambda_{k+1}a_{k+1} - b_{k+1}\\ \cdots\\ \lambda_{n}a_n - b_{n} \end{bmatrix} = \begin{bmatrix} 0\\ \cdots\\ 0\\ \lambda_{k+1}a_{k+1} - b_{k+1}\\ \cdots\\ \lambda_{n}a_n - b_{n} \end{bmatrix}
ak=gkTdkdkTQdk=gkTek+1ek+1TQek+1λk+1ak+1bk+1λk+1=ak+1+bk+1λk+1a_k = -\frac{g_k^Td_k}{d_k^TQd_k}=-\frac{g_k^Te_{k+1}}{e_{k+1}^TQe_{k+1}}-\frac{\lambda_{k+1}a_{k+1} - b_{k+1}}{\lambda_{k+1}} = -a_{k+1}+\frac{b_{k+1}}{\lambda_{k+1}}
xk+1=xk+akdk=xk+akek+1[b1λ1bkλkak+1ak+1+bk+1λk+1ak+2an]=[b1λ1bk+1λk+1ak+2an]x_{k+1} = x_k + a_kd_k = x_k + a_ke_{k+1} \begin{bmatrix} \frac{b_1}{\lambda_1}\\ \cdots\\ \frac{b_k}{\lambda_k}\\ a_{k+1} -a_{k+1}+\frac{b_{k+1}}{\lambda_{k+1}}\\ a_{k+2}\\ \cdots\\ a_n \end{bmatrix} = \begin{bmatrix} \frac{b_1}{\lambda_1}\\ \cdots\\ \frac{b_{k+1}}{\lambda_{k+1}}\\a_{k+2}\\ \cdots\\ a_n \end{bmatrix}

Part (c)

Prove that k1,xk\forall k \geq 1, x_k is the minimizer of ff in the set x0+Bk,Bk=span{d0,...,dk1}=span{e1,...,ek}x_0 + \mathcal B_k, \mathcal B_k = span\{d_0,...,d_{k-1}\} = span\{e_1,...,e_k\}.

proof. Let

ϕ(y1,...,yk)=f(x0+i=1kyiei)=(x0+i=1kyiei)TQ(x0+i=1kyiei)bT(x0+i=1kyiei)\begin{align*} \phi(y_1,...,y_k)&=f(x_0 + \sum_{i=1}^k{y_ie_i})\\ &=(x_0 + \sum_{i=1}^k{y_ie_i})^TQ(x_0 + \sum_{i=1}^k{y_ie_i})-b^T(x_0 + \sum_{i=1}^k{y_ie_i}) \end{align*}

Therefore, the problem is equivalent to minimize ϕ\phi on Rk\in\mathbb R^k.
Note that

ϕyi=Qi(x0i+yi)bi=λi(ai+yi)bi\frac{\partial\phi}{\partial y_i} = Q_{i\cdot}(x_{0i}+y_i) - b_i = \lambda_i(a_i+y_i) - b_i

for i=1,2,..,ki=1,2,..,k,
Therefore, set the derivative to 00 to satisfy the FONC, we have kk equations

λi(ai+yi)bi=0yi=biλiai\lambda_i(a_i+y_i)-b_i=0\Rightarrow y_i =\frac{b_i}{\lambda_i}-a_i

Then, note that 2ϕyi2=λi,2ϕyiyj=0\frac{\partial^{2}\phi}{\partial y_i^2} = \lambda_i, \frac{\partial^{2}\phi}{\partial y_i y_j} = 0 for i,j{1,...,k},iji,j\in\{1,...,k\}, i\neq j, we have 2ϕ=diag(λ1,...,λk)\nabla^2\phi = diag(\lambda_1,...,\lambda_k), i.e. the top-left k×kk\times k submatrix of QQ, since QQ is positive definite, 2ϕ\nabla^2\phi is also positive definite, SOC also holds and

x0+i1k(λi(ai+yi)bi)ei=(b1λ1,...,bkλk,ak+1,...,an)=xkx_0 + \sum_{i-1}^k(\lambda_i(a_i+y_i)-b_i)e_i = (\frac{b_1}{\lambda_1},...,\frac{b_k}{\lambda_k}, a_{k+1}, ..., a_n) = x_k

is the minimizer of ϕ\phi.

Example 6

Question

Let QQ be a positive definite symmetric matrix.

Part (a)

Prove d(x,y)=[(xy)TQ(xy)]1/2d(x, y) = [(x-y)^TQ(x-y)]^{1/2} is a metric.

proof. Let x,yRnx,y\in\mathbb R^n.

positive definite
Since QQ is positive definite,

aRn.aTQa0aTQa=0a=0\forall a\in\mathbb R^n. a^TQa \geq 0\land a^TQa = 0\Leftrightarrow a = 0
    (xy)TQ(xy)0(xy)TQ(xy)=0xy=0x=y\implies (x-y)^TQ(x-y) \geq 0\land (x-y)^TQ(x-y) = 0\Leftrightarrow x-y = 0\Leftrightarrow x=y

Therefore,

d(x,y)=[(xy)TQ(xy)]1/20[(xy)TQ(xy)]1/2=0x=yd(x,y)=[(x-y)^TQ(x-y)]^{1/2} \geq 0\land [(x-y)^TQ(x-y)]^{1/2} = 0\Leftrightarrow x=y

symmetric

d(x,y)=[(xy)TQ(xy)]1/2=[((yx))TQ((yx))]1/2=[1(1)(yx)TQ(yx)]1/2=[(yx)TQ(yx)]1/2=d(y,x)\begin{align*} d(x,y) &= [(x-y)^TQ(x-y)]^{1/2} \\ &= [(-(y-x))^TQ(-(y-x))]^{1/2}\\ &= [-1(-1)(y-x)^TQ(y-x)]^{1/2}\\ &= [(y-x)^TQ(y-x)]^{1/2}\\ &= d(y,x) \end{align*}

triangular inequality

d(x,z)=[(xz)TQ(xz)]1/2=[((xy)+(yz))TQ((xy)+(yz))]1/2=[(xy)TQ(xy)+(yz)TQ(yz)]1/2=(d(x,y)2+d(y,z)2)1/2by triangular inequality on Euclidean norm of real numbers(d(x,y)2)1/2+(d(x,y)2)1/2=d(x,y)+d(y,z)\begin{align*} d(x,z)&= [(x-z)^TQ(x-z)]^{1/2}\\ &= [((x-y)+(y-z))^TQ((x-y)+(y-z))]^{1/2}\\ &= [(x-y)^TQ(x-y) + (y-z)^TQ(y-z)]^{1/2}\\ &= (d(x,y)^{2} + d(y,z)^2)^{1/2}\\ &\text{by triangular inequality on Euclidean norm of real numbers}\\ &\leq (d(x,y)^{2})^{1/2} + (d(x,y)^{2})^{1/2} \\ &= d(x,y) + d(y,z) \end{align*}

Part (b)

For xR2,aRx^*\in\mathbb R^2, a\in\mathbb R, for xx on the line L={xR2x=(t,at),tR}L = \{x\in\mathbb R^2\mid x=(t,at), t\in\mathbb R\}, find xx that minimizes d(x,x)d(x,x^*).

Define

f(x,y)=d((x,y),(x,y))=12[xxyy]TQ[xxyy]f(x, y) = d((x, y), (x^*, y^*)) = \frac12 \begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}^TQ\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}

therefore minimizing d((x,y),(x,y))d((x,y), (x^*, y^*)) on LL is equivalent to

minimize f(x,y)subject tol(x,y)=axy=0\begin{align*}&\text{minimize } &f(x,y)\\ &\text{subject to} &l(x,y) = ax-y = 0 \end{align*}

Note that f=Q[xxyy],l=[a1]\nabla f = Q\begin{bmatrix}x-x^*\\y-y^*\end{bmatrix}, \nabla l = \begin{bmatrix}a\\-1\end{bmatrix} using Lagrange multiplier, we have equations

Q[txaty]+λ[a1]=0\begin{align*} Q\begin{bmatrix}t-x^*\\at-y^*\end{bmatrix} + \lambda\begin{bmatrix}a\\-1\end{bmatrix}= 0 \end{align*}

Therefore, since QQ is symmetric, write Q=[pmmq]Q = \begin{bmatrix}p&m\\m&q\end{bmatrix}we can solve for

t=(p+m)x+(q+m)xa2q+2am+pt = \frac{(p+m)x^* + (q+m)x^*}{a^2q + 2am + p}

Since QQ is positive definite, this solution is the minimum.