
Equality Constrained Finite Dimension Optimization

Problem Definition

Consider the minimization problem

$$\begin{align*} \text{min}\quad &f(x,y) \\ \text{subject to}\quad &h(x,y) = x^2 + y^2 - 1 = 0 \end{align*}$$

Suppose $(x_0, y_0)$ is a local minimizer. There are two cases:

  1. $\nabla f(x_0, y_0) \neq 0$: we claim that $\nabla f(x_0, y_0)$ is perpendicular to the tangent space of the unit circle $h^{-1}(\{0\})$ at $(x_0, y_0)$. If this were not the case, we would obtain a contradiction by looking at the level sets of $f$, to which $\nabla f$ is perpendicular: moving along the circle in a suitable direction would decrease $f$. Therefore $\nabla f(x_0, y_0) = \lambda \nabla h(x_0, y_0)$ for some $\lambda$.

  2. $\nabla f(x_0, y_0) = 0$: the same equation holds trivially with $\lambda = 0$.

In either case, at a local minimizer, the gradient of the function to be minimized is parallel to the gradient of the constraints.
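This picture can be sanity-checked numerically. The following sketch (with the arbitrary choice $f(x,y) = x + y$, not taken from the lecture) minimizes $f$ over the unit circle by brute force and confirms that the two gradients at the winner are parallel:

```python
import math

# Hypothetical objective: f(x, y) = x + y, minimized over the unit circle.
def f(x, y):
    return x + y

# Brute-force minimization of f over the circle, parametrized by angle.
best_t = min((2 * math.pi * k / 100000 for k in range(100000)),
             key=lambda t: f(math.cos(t), math.sin(t)))
x0, y0 = math.cos(best_t), math.sin(best_t)

grad_f = (1.0, 1.0)           # ∇f is constant here
grad_h = (2 * x0, 2 * y0)     # ∇h = (2x, 2y)

# Two vectors in R^2 are parallel iff their 2x2 determinant vanishes.
det = grad_f[0] * grad_h[1] - grad_f[1] * grad_h[0]
print(abs(det) < 1e-3)  # prints True: the gradients are (nearly) parallel
```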

Defn. Surface

For us, a surface is the set of common zeroes of a finite set of $C^1$ functions.

Defn. Differentiable Curve

For us, a differentiable curve on the surface $M \subseteq \mathbb R^n$ is the image of a $C^1$ function $x : (a, b) \to M$.

Defn. Tangent Vector

Let $x(s)$ be a differentiable curve on $M$ that passes through $x_0 \in M$ at $s = 0$, i.e., $x(0) = x_0$. The velocity vector $v = \left. \frac{d}{ds} \right|_{s=0} x(s)$ of $x(s)$ at $x_0$ is, for us, said to be a tangent vector to the surface $M$ at $x_0$. The set of all tangent vectors to $M$ at $x_0$ is called the tangent space to $M$ at $x_0$ and is denoted by $T_{x_0}M$.

Defn. Regular Point

Let $M = \{x \in \mathbb R^n : h_1(x) = \cdots = h_k(x) = 0\}$ be a surface. If $\nabla h_1(x_0), \dots, \nabla h_k(x_0)$ are linearly independent, then $x_0$ is said to be a regular point of $M$.

Claim 1

At a regular point $x_0 \in M$, the tangent space $T_{x_0} M$ is given by

$$T_{x_0} M = \{ y \in \mathbb R^n : \nabla \mathbf{h}(x_0)\, y = 0 \}$$

where $\nabla \mathbf{h}(x_0)$ denotes the $k \times n$ matrix whose rows are $\nabla h_1(x_0), \dots, \nabla h_k(x_0)$.
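As a quick illustration of Claim 1 (a hypothetical example with the unit sphere $h(x) = \|x\|^2 - 1$ and base point $x_0 = (1,0,0)$, none of which is from the lecture): velocities of curves through $x_0$, computed by finite differences, are annihilated by $\nabla h(x_0)$.

```python
import math

# Surface: unit sphere, h(x) = x·x - 1; base point x0 = (1, 0, 0).
x0 = (1.0, 0.0, 0.0)
grad_h = (2.0, 0.0, 0.0)  # ∇h(x0) = 2*x0

# Two curves on the sphere passing through x0 at s = 0.
curves = [
    lambda s: (math.cos(s), math.sin(s), 0.0),
    lambda s: (math.cos(s), 0.0, math.sin(s)),
]

eps = 1e-6
for x in curves:
    # Central-difference velocity v ≈ x'(0).
    v = tuple((a - b) / (2 * eps) for a, b in zip(x(eps), x(-eps)))
    # Claim 1 predicts ∇h(x0) · v = 0.
    dot = sum(g * vi for g, vi in zip(grad_h, v))
    assert abs(dot) < 1e-6
print("ok")
```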

Claim 2

Let $f, h_1, \dots, h_k$ be $C^1$ functions on the open set $\Omega \subseteq \mathbb R^n$. Let $x_0 \in M = \{ x \in \Omega : h_1(x) = \cdots = h_k(x) = 0 \}$. Suppose $x_0$ is a local minimizer of $f$ subject to the constraints $h_i(x) = 0$. Then $\nabla f(x_0)$ is perpendicular to $T_{x_0}M$.

proof. Without loss of generality, suppose $\Omega = \mathbb R^n$. Let $v \in T_{x_0}M$. Then $v = \left. \frac{d}{ds} \right|_{s=0} x(s)$ for some differentiable curve $x(s)$ in $M$ with $x(0) = x_0$. Since $x_0$ is a local minimizer of $f$, $0$ is a local minimizer of $f \circ x$, so $\nabla f(x_0) \cdot x'(0) = \nabla f(x_0) \cdot v = 0$.

Thrm. First Order Necessary Conditions (Lagrange Multipliers)

Claim. Let $f, h_1, \dots, h_k$ be $C^1$ functions on some open $\Omega \subseteq \mathbb R^n$. Suppose $x_0$ is a local minimizer of $f$ subject to the constraints $h_1(x) = \cdots = h_k(x) = 0$, and that $x_0$ is also a regular point of these constraints. Then there are $\lambda_1, \dots, \lambda_k \in \mathbb R$ ("Lagrange multipliers") such that

$$\nabla f(x_0) + \lambda_1 \nabla h_1(x_0) + \cdots + \lambda_k \nabla h_k(x_0) = 0$$

proof. Since $x_0$ is regular, $T_{x_0}M = \mathrm{span}(\{ \nabla h_1(x_0), \dots, \nabla h_k(x_0) \})^\perp$. By the previous claim, $\nabla f(x_0) \in (T_{x_0}M)^\perp$. Since we are dealing with a finite dimensional vector space, $(W^\perp)^\perp = W$ for any subspace $W$, and therefore $\nabla f(x_0) \in \mathrm{span}(\{ \nabla h_1(x_0), \dots, \nabla h_k(x_0) \})$. We are done.
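A small check of the theorem on a made-up instance (not from the lecture): minimize $f(x,y) = x + 2y$ over the unit circle. The minimizer should be $-(1,2)/\sqrt 5$ with multiplier $\lambda = \sqrt 5/2$, and plugging these in makes all residuals of the Lagrange system vanish.

```python
import math

# Hypothetical problem: minimize f(x, y) = x + 2y on h = x^2 + y^2 - 1 = 0.
r5 = math.sqrt(5.0)
x, y = -1.0 / r5, -2.0 / r5   # candidate minimizer
lam = r5 / 2.0                # candidate Lagrange multiplier

# Residuals of ∇f + λ∇h = 0 (first two) and of h = 0 (third).
res = (1 + lam * 2 * x, 2 + lam * 2 * y, x * x + y * y - 1)
print(all(abs(r) < 1e-12 for r in res))  # prints True
```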

Example: Max volume of Box with constrained surface area

Given a fixed area $A > 0$, how do we construct a box of maximum volume with surface area $A$? The volume is $V(x,y,z) = xyz$ and the surface area is $A(x,y,z) = 2(xy+xz+yz)$. Our problem is stated as a maximization problem, so we have to convert it to a minimization problem. Let $f = -V$. We are therefore dealing with the problem

$$\begin{align*} \text{minimize } &f(x,y,z) = -xyz \\ \text{subject to } &h(x,y,z) = A(x,y,z) - A = 0, \quad x,y,z \geq 0 \end{align*}$$

But we don't know how to deal with inequality constraints right now, so we have to make some changes. Note that if any one of $x, y, z$ is zero, then the volume is zero. Therefore the problem we want to consider is really the problem

$$\begin{align*} \text{minimize } &f(x,y,z) \\ \text{subject to } &h(x,y,z) = 0, \quad x,y,z > 0 \end{align*}$$

Now, if $\Omega = \{(x,y,z) \in \mathbb R^3 : x,y,z > 0\}$, then the above minimization problem may be solved using the first order necessary condition we gave above, for the set $\Omega$ is open.

Suppose $(x_0, y_0, z_0)$ is a local minimizer of $f$ subject to the constraint $h(x,y,z) = 0$. This point is regular because $\nabla h = (2(y+z), 2(x+z), 2(x+y))$ is nonzero when all coordinates are positive. Then there is a $\lambda \in \mathbb R$ such that $\nabla f(x_0, y_0, z_0) + \lambda \nabla h(x_0, y_0, z_0) = 0$. Therefore

$$(-y_0z_0, -x_0z_0, -x_0y_0) + \lambda\, (2y_0 + 2z_0,\; 2x_0 + 2z_0,\; 2x_0 + 2y_0) = (0,0,0).$$

Equivalently,

$$\begin{align*} 2\lambda (y_0 + z_0) &= y_0z_0 \\ 2\lambda (x_0 + z_0) &= x_0z_0 \\ 2\lambda (x_0 + y_0) &= x_0y_0 \end{align*}$$

Add all of these equations together:

$$2\lambda( 2x_0 + 2y_0 + 2z_0 ) = x_0z_0 + x_0y_0 + y_0z_0 = \frac{A}{2} > 0,$$

implying that $\lambda > 0$. Multiplying the first equation by $x_0$ and the second by $y_0$ gives

$$\begin{align*} 2\lambda x_0 (y_0 + z_0) &= x_0y_0z_0 \\ 2\lambda y_0 (x_0 + z_0) &= x_0y_0z_0. \end{align*}$$

Subtracting these two equations gives $2\lambda (x_0z_0 - y_0z_0) = 0$. Cancelling the $z_0$'s (legitimate since $z_0 > 0$) gives $2\lambda (x_0 - y_0) = 0$, and since $\lambda > 0$, we have $x_0 = y_0$. Since we could have done the same thing with the other pairs of equations, we get $x_0 = y_0 = z_0$.

Physically, this tells us that in order to maximize the volume of a rectangular solid of fixed surface area, we must make a cube. Note that we haven't actually solved the maximization problem; we've only figured out what form its solutions must take.
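A numerical sketch of this conclusion (not part of the lecture): fix $A = 6$, so the cube has side $l = 1$ and volume $1$, then sample random boxes with the same surface area and confirm none beats the cube.

```python
import random

A = 6.0              # fixed surface area; cube side l = sqrt(A/6) = 1
cube_volume = 1.0

random.seed(0)
best_sampled = 0.0
for _ in range(20000):
    x = random.uniform(0.05, 3.0)
    y = random.uniform(0.05, 3.0)
    # Solve 2(xy + xz + yz) = A for z, given x and y.
    z = (A / 2 - x * y) / (x + y)
    if z <= 0:
        continue  # discard samples violating z > 0
    best_sampled = max(best_sampled, x * y * z)

# No sampled box of area A should exceed the cube's volume.
assert best_sampled <= cube_volume + 1e-9
print(best_sampled <= cube_volume + 1e-9)  # prints True
```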

Thrm. Second Order Necessary Conditions

Claim. Let $f, h_1, \dots, h_k$ be $C^2$ on some open set $\Omega \subseteq \mathbb R^n$. Suppose $x_0$ is a regular point of the constraints which is a local minimizer of $f$ subject to them. Then
  1. There are $\lambda_1, \dots, \lambda_k \in \mathbb R$ such that

$$\nabla f(x_0) + \lambda_1 \nabla h_1(x_0) + \cdots + \lambda_k \nabla h_k(x_0) = 0$$

  2. The "Lagrangian"

$$L(x_0) = \nabla^2 f(x_0) + \sum \lambda_i \nabla^2 h_i(x_0)$$

is positive semi-definite on the tangent space $T_{x_0}M$, where $M = h_1^{-1}(\{0\}) \cap \cdots \cap h_k^{-1}(\{0\})$.

proof. Let $x(s)$ be a smooth curve in $M$ with $x(0) = x_0$. Recall that, by the chain and product rules,

$$\begin{align*} \frac{d}{ds} f(x(s)) &= \nabla f(x(s)) \cdot x'(s) \\ \frac{d^2}{ds^2} f(x(s)) &= x'(s) \cdot \nabla^2 f(x(s))\, x'(s) + \nabla f(x(s)) \cdot x''(s). \end{align*}$$

By the second order Taylor approximation and local minimality of $x_0$, we have

$$0 \leq f(x(s)) - f(x(0)) = s \left. \frac{d}{ds} \right|_{s=0} f(x(s)) + \frac{1}{2}s^2 \left. \frac{d^2}{ds^2} \right|_{s=0} f(x(s)) + o(s^2)$$

This is, equivalently,

$$0 \leq f(x(s)) - f(x(0)) = s\, \nabla f(x_0) \cdot \underbrace{x'(0)}_{\in T_{x_0}M} + \frac{1}{2}s^2 \left. \frac{d^2}{ds^2} \right|_{s=0} f(x(s)) + o(s^2)$$

Since the gradient at a regular local minimizer is perpendicular to the tangent space there, the first-order term above vanishes. We have

$$0 \leq \frac{1}{2}s^2 \left. \frac{d^2}{ds^2} \right|_{s=0} f(x(s)) + o(s^2)$$

Since each $h_i$ vanishes identically along the curve $x(s) \in M$, we may add $\sum \lambda_i h_i(x(s))$ without changing anything, and write the above as

$$0 \leq \frac{1}{2}s^2 \left. \frac{d^2}{ds^2} \right|_{s=0} \left[ f(x(s)) + \sum \lambda_i h_i(x(s)) \right] + o(s^2)$$

Or

$$0 \leq \frac{1}{2} s^2\, x'(0) \cdot \underbrace{\left(\nabla^2 f(x_0) + \sum \lambda_i \nabla^2 h_i(x_0)\right)}_{=L(x_0)} x'(0) + \frac{1}{2} s^2 \underbrace{\left(\nabla f(x_0) + \sum \lambda_i \nabla h_i(x_0)\right)}_{=0 \text{ by 1.}} \cdot\, x''(0) + o(s^2)$$

Divide by $s^2$:

$$0 \leq \frac{1}{2} x'(0) \cdot L(x_0)\, x'(0) + \frac{o(s^2)}{s^2}$$

By taking $s$ small it follows that $0 \leq \frac{1}{2} x'(0) \cdot L(x_0)\, x'(0)$. Since any tangent vector $v \in T_{x_0}M$ can be described as the tangent vector to a curve in $M$ through $x_0$, it follows that $L(x_0)$ is positive semi-definite on $T_{x_0}M$.
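The chain-rule identities recalled at the start of this proof can be spot-checked by finite differences. Here is a sketch with the arbitrary choices $f(x_1, x_2) = x_1^2$ and the curve $x(s) = (\cos s, \sin s)$ (neither from the lecture), where the Hessian term vanishes at $s = 0$ and all of $\frac{d^2}{ds^2} f(x(s))$ comes from the curvature term $\nabla f \cdot x''$:

```python
import math

# g = f ∘ x with f(x1, x2) = x1^2 and x(s) = (cos s, sin s).
def g(s):
    return math.cos(s) ** 2

eps = 1e-4
# Central finite differences for g'(0) and g''(0).
d1 = (g(eps) - g(-eps)) / (2 * eps)
d2 = (g(eps) - 2 * g(0.0) + g(-eps)) / eps ** 2

# Formula values at s = 0: x = (1,0), x' = (0,1), x'' = (-1,0),
# ∇f(x) = (2, 0), ∇²f(x) = [[2,0],[0,0]].
first = 2 * 0 + 0 * 1              # ∇f · x'            = 0
second = 0 + (2 * -1 + 0 * 0)      # x'·∇²f x' + ∇f·x'' = -2

assert abs(d1 - first) < 1e-6
assert abs(d2 - second) < 1e-4
print("formulas agree with finite differences")
```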

Thrm. Second Order Sufficient Conditions

Claim. Consider functions $f, h_1, \dots, h_k$ which are $C^2$ on the open $\Omega \subseteq \mathbb R^n$. Suppose $x_0$ is a regular point of the constraints given by $h_1(x) = \cdots = h_k(x) = 0$. Let $M = \bigcap h_i^{-1}(\{0\})$. Suppose there exist $\lambda_1, \dots, \lambda_k \in \mathbb R$ such that

  1. $\nabla f(x_0) + \sum \lambda_i \nabla h_i(x_0) = 0$
  2. $L(x_0) = \nabla^2 f(x_0) + \sum \lambda_i \nabla^2 h_i(x_0)$ is positive definite on $T_{x_0}M$.

Then $x_0$ is a strict local minimizer of $f$ on $M$.

proof. Recall that if $L(x_0)$ is positive definite on $T_{x_0}M$, then there is an $a > 0$ such that $v \cdot L(x_0)v \geq a\|v\|^2$ for all $v \in T_{x_0}M$. (This is easily proven by diagonalizing the matrix.) Let $x(s)$ be a smooth curve in $M$ such that $x(0) = x_0$, and normalize the curve so that $\|x'(0)\| = 1$. Expanding as in the previous proof,

$$\begin{align*} f(x(s)) - f(x(0)) &= s \left. \frac{d}{ds} \right|_{s=0} f(x(s)) + \frac{1}{2} s^2 \left. \frac{d^2}{ds^2} \right|_{s=0} f(x(s)) + o(s^2) \\ &= s \left. \frac{d}{ds} \right|_{s=0} \left[ f(x(s)) + \sum \lambda_i h_i(x(s)) \right] + \frac{1}{2} s^2 \left. \frac{d^2}{ds^2} \right|_{s=0} \left[ f(x(s)) + \sum \lambda_i h_i(x(s)) \right] + o(s^2) \\ &= s \underbrace{\left[ \nabla f(x_0) + \sum \lambda_i \nabla h_i(x_0) \right]}_{=0 \text{ by 1.}} \cdot\, x'(0) + \frac{1}{2} s^2\, x'(0) \cdot L(x_0)\, x'(0) \\ &\qquad\qquad\qquad\qquad\qquad + \frac{1}{2} s^2 \underbrace{\left[ \nabla f(x_0) + \sum \lambda_i \nabla h_i(x_0) \right]}_{=0 \text{ by 1.}} \cdot\, x''(0) + o(s^2) \\ &= \frac{1}{2} s^2\, x'(0) \cdot L(x_0)\, x'(0) + o(s^2) \\ &\geq \frac{1}{2}s^2 a\|x'(0)\|^2 + o(s^2) \\ &= \frac{1}{2}s^2 a + o(s^2) \\ &= s^2 \left( \frac{1}{2}a + \frac{o(s^2)}{s^2} \right) \end{align*}$$

For all sufficiently small $s \neq 0$ the right-hand side is positive, so $f(x(s)) > f(x_0)$. Since the curve $x(s)$ was arbitrary, $x_0$ is a strict local minimizer of $f$ on $M$.

Example 1

Recall the box example: maximizing the volume of a box with sides $x, y, z \geq 0$ subject to a fixed surface area $A > 0$. We were really minimizing the negative of the volume. We got $(x_0,y_0,z_0) = (l,l,l)$, where $l = \sqrt{A/6}$. Our Lagrange multiplier was $\lambda = \frac{A}{8(x_0+y_0+z_0)} = \frac{A}{24 l} > 0$. We had (after some calculation)

$$L(x_0,y_0,z_0) = (2\lambda - l)\begin{pmatrix} 0&1&1 \\ 1&0&1 \\ 1&1&0 \end{pmatrix}$$

Here, $\lambda = \frac{A}{24l} = \frac{6l^2}{24l} = \frac{l}{4}$, so $2\lambda - l = -\frac{l}{2} < 0$. We have

$$T_{(x_0,y_0,z_0)}M = \mathrm{span}( \nabla h(x_0,y_0,z_0) )^\perp = \{ (u,v,w) \in \mathbb R^3 : u+v+w=0 \}$$

since $\nabla h(x_0,y_0,z_0) = (4l,4l,4l)$. If $(u,v,w) \in T_{(x_0,y_0,z_0)}M$ is nonzero,

$$\begin{align*} \begin{pmatrix} u&v&w \end{pmatrix}(2\lambda - l)\begin{pmatrix} 0&1&1 \\ 1&0&1 \\ 1&1&0 \end{pmatrix} \begin{pmatrix} u\\v\\w \end{pmatrix} &= (2\lambda - l)\begin{pmatrix} u&v&w \end{pmatrix} \begin{pmatrix} v+w \\ u+w \\ u+v \end{pmatrix} \\ &= (2\lambda - l)\begin{pmatrix} u&v&w \end{pmatrix} \begin{pmatrix} -u \\ -v \\ -w \end{pmatrix} \\ &= -(2\lambda - l)(u^2+v^2+w^2) > 0, \end{align*}$$

where we used $u + v + w = 0$ in the second step,

so by the SOSC under equality constraints, our point $(x_0,y_0,z_0)$ is a strict local maximizer of the volume. In fact, it is the strict global maximizer, i.e., the global minimizer of $f = -V$ (which is yet to be seen).
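The sign computation can also be checked numerically (a sketch with $A = 6$, so $l = 1$, $\lambda = 1/4$, and $2\lambda - l = -1/2$): sample vectors in the plane $u + v + w = 0$ and confirm the quadratic form of $L$ is positive on all of them.

```python
import math
import random

A = 6.0
l = math.sqrt(A / 6)    # = 1, cube side
lam = A / (24 * l)      # = 1/4, Lagrange multiplier
c = 2 * lam - l         # = -1/2

def quad_form(u, v, w):
    # (u,v,w) · L (u,v,w)  with  L = c * [[0,1,1],[1,0,1],[1,1,0]]
    return c * (u * (v + w) + v * (u + w) + w * (u + v))

random.seed(1)
for _ in range(1000):
    u = random.uniform(-1, 1)
    v = random.uniform(-1, 1)
    w = -(u + v)        # force (u, v, w) into the tangent plane u+v+w=0
    if u * u + v * v + w * w < 1e-12:
        continue        # skip (near-)zero vectors
    assert quad_form(u, v, w) > 0
print("positive definite on the tangent plane")
```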

Example 2

Consider the problem

$$\begin{align*} \text{minimize } &f(x,y) = x^2-y^2 \\ \text{subject to } &h(x,y) = y = 0. \end{align*}$$

Then

$$\nabla f(x,y) + \lambda \nabla h(x,y) = \begin{pmatrix} 2x \\ -2y \end{pmatrix} + \lambda \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

which, together with the constraint $y = 0$, implies that $\lambda = 0$ and that $(x,y) = (0,0)$ is our candidate local minimizer. Since $\nabla h(x,y) = (0,1) \neq (0,0)$, the candidate is a regular point. We have

$$L(0,0) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix} + 0 \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}$$

which is not positive semi-definite on all of $\mathbb R^2$. What about on the tangent space $T_{(0,0)}(x\text{-axis}) = x\text{-axis}$? For a nonzero tangent vector $(t,0)$ we have $(t,0) \cdot L(0,0)\,(t,0) = 2t^2 > 0$, so $L(0,0)$ is positive definite on the $x$-axis, and by the SOSC that we just proved, $(0,0)$ is a strict local minimizer of $f$ on the $x$-axis. Thinking of level sets, this is intuitively true.
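A tiny brute-force confirmation of this picture (a sketch, not from the lecture): $f$ has a saddle at the origin in the plane, yet is strictly minimized at the origin along the constraint set $y = 0$.

```python
# f(x, y) = x^2 - y^2: saddle at the origin in the plane,
# but a strict minimum along the constraint set y = 0.
f = lambda x, y: x * x - y * y

pts = [t / 100 for t in range(-100, 101) if t != 0]
assert all(f(x, 0.0) > f(0.0, 0.0) for x in pts)   # strict min on the x-axis
assert any(f(0.0, y) < f(0.0, 0.0) for y in pts)   # not a min in the plane
print("ok")
```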

Example 3

Consider the problem

$$\begin{align*} \text{minimize } &f(x,y) = (x-a)^2 + (y-b)^2 \\ \text{subject to } &h(x,y) = x^2+y^2-1=0. \end{align*}$$

Let us assume that $(a,b)$ satisfies $a^2+b^2>1$, i.e., that $(a,b)$ lies outside the circle. We have $\nabla h(x,y) = (2x,2y)$, which is non-zero on $S^1$, implying that every point of $S^1$ is a regular point. Lagrange tells us that

$$\begin{pmatrix} 2(x-a) \\ 2(y-b) \end{pmatrix} + \lambda \begin{pmatrix} 2x \\ 2y \end{pmatrix} = \begin{pmatrix} 0\\0 \end{pmatrix}$$

as well as $x^2+y^2=1$. This may be written

$$\begin{align*} (1+\lambda)x &= a \\ (1+\lambda)y &= b \\ x^2+y^2 &= 1 \end{align*}$$

By our assumption that $a^2+b^2>1$, we have $\lambda \neq -1$ (otherwise the first two equations would force $a = b = 0$). Therefore

$$\begin{pmatrix} x \\ y \end{pmatrix} = \frac{1}{1+\lambda}\begin{pmatrix} a \\ b \end{pmatrix}$$

which, combined with the third equation, implies that $(1+\lambda)^2 = a^2+b^2$. Taking the root with $1+\lambda > 0$ (the other root yields the antipodal point, which turns out to be a maximizer),

$$\frac{1}{1+\lambda} = \frac{1}{\sqrt{a^2+b^2}}$$

Therefore

$$\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} = \frac{1}{\sqrt{a^2+b^2}}\begin{pmatrix} a \\ b \end{pmatrix}$$

Thinking of level sets, this is intuitively the right answer: the closest point of $S^1$ to $(a,b)$ lies on the ray from the origin through $(a,b)$. The Lagrangian is

$$L(x_0,y_0) = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} + \lambda \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} = \underbrace{(1+\lambda)}_{>0}\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$$

which is positive definite on all of $\mathbb R^2$, hence in particular on $T_{(x_0,y_0)}S^1$; so by the SOSC that we proved, $(x_0, y_0)$ is a strict local minimizer of $f$ on $S^1$. In fact, this point is the global minimizer of $f$ on $S^1$: $f$ necessarily attains a global minimum on the compact set $S^1$, and of the two candidates produced by the first order conditions, $(x_0, y_0)$ gives the smaller value of $f$.
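A quick sketch with a concrete exterior point, say $(a,b) = (3,4)$ (an arbitrary choice): the formula predicts $(x_0, y_0) = (3/5, 4/5)$, and a brute-force sweep of the circle agrees.

```python
import math

a, b = 3.0, 4.0                  # hypothetical point with a^2 + b^2 > 1
r = math.hypot(a, b)             # = 5
x0, y0 = a / r, b / r            # predicted minimizer (3/5, 4/5)

def f(x, y):
    return (x - a) ** 2 + (y - b) ** 2

# Brute-force minimization of f over the circle, parametrized by angle.
best_t = min((2 * math.pi * k / 100000 for k in range(100000)),
             key=lambda t: f(math.cos(t), math.sin(t)))
bx, by = math.cos(best_t), math.sin(best_t)

assert abs(bx - x0) < 1e-3 and abs(by - y0) < 1e-3
print((x0, y0))  # prints (0.6, 0.8)
```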

Example 4: Proof of Lagrange Multipliers in a Special Case

For a special case, we will derive the Lagrange multiplier equation directly. Suppose we are working with $C^1$ functions $f, h$. Our problem is

$$\begin{align*} \text{minimize } &f(x,y,z) \\ \text{subject to } &g(x,y,z) = z-h(x,y) = 0. \end{align*}$$

That is, we are minimizing $f(x,y,z)$ on the graph $\Gamma_h$ of $h$. The Lagrange equation tells us that

$$\nabla f(x,y,z) + \lambda \nabla g(x,y,z) = \begin{pmatrix} \frac{\partial f}{\partial x}(x,y,z) \\ \frac{\partial f}{\partial y}(x,y,z) \\ \frac{\partial f}{\partial z}(x,y,z) \end{pmatrix} + \lambda \begin{pmatrix} -\frac{\partial h}{\partial x}(x,y) \\ -\frac{\partial h}{\partial y}(x,y) \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

We will derive the above formula by expressing the constrained problem as an unconstrained minimization problem

$$\operatorname{minimize}_{(x,y) \in \mathbb R^2}\; F(x,y)$$

for some function $F$: we will write down the first order necessary conditions for the unconstrained problem, and then rearrange them into the equation we would like to prove.

Define $F(x,y) = f(x,y,h(x,y))$. The constrained minimization problem is therefore equivalent to this unconstrained problem. By our theory of unconstrained minimization, at a local minimizer $(x_0, y_0)$ we have $\nabla F(x_0,y_0)=(0,0)$. That is, by the chain rule,

$$\nabla F(x_0,y_0) = \begin{pmatrix} \frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial h}{\partial x} \\ \frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial h}{\partial y} \end{pmatrix} = \begin{pmatrix} 0\\0 \end{pmatrix}$$

Rather,

$$\begin{align*} \frac{\partial f}{\partial x} + \frac{\partial f}{\partial z}\frac{\partial h}{\partial x} &= 0 \\ \frac{\partial f}{\partial y} + \frac{\partial f}{\partial z}\frac{\partial h}{\partial y} &= 0 \end{align*}$$

Let $\lambda = -\frac{\partial f}{\partial z}$. Substituting, and recording the definition of $\lambda$ as a third equation, the system becomes

$$\begin{align*} \frac{\partial f}{\partial x} - \lambda \frac{\partial h}{\partial x} &= 0 \\ \frac{\partial f}{\partial y} - \lambda \frac{\partial h}{\partial y} &= 0 \\ \frac{\partial f}{\partial z} + \lambda &= 0 \end{align*}$$

which is what we wanted.
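The derivation can be exercised on a concrete instance (hypothetical choices, not from the lecture): $h(x,y) = xy$ and $f(x,y,z) = x^2 + y^2 + (z-3)^2$. Gradient descent on $F$ finds the minimizer, here $(\sqrt 2, \sqrt 2, 2)$, and the multiplier $\lambda = -\frac{\partial f}{\partial z}$ indeed satisfies the first two Lagrange equations there.

```python
# Hypothetical instance of the special case: h(x, y) = x*y and
# f(x, y, z) = x^2 + y^2 + (z - 3)^2, so F(x, y) = f(x, y, h(x, y)).
x, y = 1.0, 1.0
for _ in range(20000):
    # ∂F/∂x = f_x + f_z h_x, ∂F/∂y = f_y + f_z h_y (chain rule)
    gx = 2 * x + 2 * (x * y - 3) * y
    gy = 2 * y + 2 * (x * y - 3) * x
    x -= 0.01 * gx
    y -= 0.01 * gy

z = x * y                 # the constraint z = h(x, y)
fz = 2 * (z - 3)
lam = -fz                 # λ = -∂f/∂z, as derived above

# Check the first two Lagrange equations: f_x - λ h_x = 0, f_y - λ h_y = 0.
assert abs(2 * x - lam * y) < 1e-6
assert abs(2 * y - lam * x) < 1e-6
print(round(lam, 6))  # prints 2.0, the multiplier at (√2, √2, 2)
```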