Lagrange Multiplier: Intuition via Interaction

 

Introduction

You probably ended up here after coming across the term “constrained optimization” or “Lagrangian” and wanted to understand what a Lagrange multiplier is. In this post, I help you build an intuition for it with interactive plots (you can find plenty of mathematical derivations elsewhere on the net). Let’s get straight to the point. Consider the (objective) function $z=2x+y$. What is the maximum or the minimum value of the function $z$? We can see that the function has neither a maximum nor a minimum. To verify, take the derivative and set it to zero: you will see that the derivative is a constant (the function changes at a constant pace everywhere in the $(x,y)$ domain). The function is plotted below (rotate, zoom in/out, ...).
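To spell out that verification step (nothing beyond the partial derivatives of $z$):

$ \nabla z = \left( \dfrac{\partial z}{\partial x}, \dfrac{\partial z}{\partial y} \right) = (2, 1) \neq (0, 0) \quad \text{for every } (x, y)$

so the gradient never vanishes, and there is no point at which an unconstrained maximum or minimum could occur.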

Putting Constraints

Now, we can put a constraint on the values that the input variables can take, say for example $x^2+y^2=1$. This is an equality constraint (a circle). We can have inequality constraints as well (a disk, $x^2+y^2 \le 1$). Note that the constraint is itself a function of the input variables $(x,y)$. With this constraint on the inputs, the function $z$ does have a maximum and a minimum. The problem is formulated as follows:

$\max \limits_{x,y} z$, subject to (s.t.) $x^2+y^2=1$

How do you find the maximum (or minimum)?

Explorative Approach

Take all the points $(x,y)$ on the unit circle and evaluate the function at each of them. The function attains its maximum (and minimum) at some point $(x_0,y_0)$. Move the point $A$ on the circle and observe how the function value changes (better yet, note down the values).

The maximum occurs at approximately $x=0.88, y=0.49$, with the maximum value of the function being about 2.24. If you move the point $A$ a bit clockwise or anti-clockwise, the function value decreases from 2.24. Similarly, by moving the point $A$ around the circle (that is, through the feasible region), you will find the values of $(x,y)$ at which the function attains its minimum.
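If you want to reproduce this search without dragging the point around, here is a minimal numerical sketch (assuming NumPy; this is not the code behind the interactive plot) that samples the circle densely and picks out the best and worst feasible points:

```python
import numpy as np

# Every feasible point satisfies x^2 + y^2 = 1, so parameterize the
# unit circle by an angle t: (x, y) = (cos t, sin t).
t = np.linspace(0.0, 2.0 * np.pi, 10_000)
x, y = np.cos(t), np.sin(t)

# Evaluate the objective z = 2x + y at every sampled feasible point.
z = 2 * x + y

# The best and worst feasible points found by brute force.
i_max, i_min = np.argmax(z), np.argmin(z)
print(f"max z = {z[i_max]:.4f} at (x, y) = ({x[i_max]:.4f}, {y[i_max]:.4f})")
print(f"min z = {z[i_min]:.4f} at (x, y) = ({x[i_min]:.4f}, {y[i_min]:.4f})")
```

This reports a maximum of roughly $2.236$ at about $(0.894, 0.447)$ and a minimum of roughly $-2.236$ at the antipodal point, consistent with what you read off the plot.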

This explorative approach of finding the maximum, evaluating the function $z$ at every input point that satisfies the constraint, is helpful to get started but inefficient (and, in general, infeasible). However, we can push it a bit further to see what we are actually looking for. Can we observe something unique?

  • The constraint function $x^2+y^2=1$ is a curve living in $\mathbb{R}^2$ (the $(x,y)$-plane)
  • The (graph of the) objective function $z=2x+y$ is a surface living in $\mathbb{R}^3$

Therefore, let’s plot the contours of the objective function, $2x+y=k$, for various values of $k$ (note: each value of $k$ is just a value of $z$). That is, for each value of $k$, a horizontal plane cuts through the graph of the objective function and we project the resulting slice down onto the plane containing the contour of the constraint function (now both live in $\mathbb{R}^2$). Now, let us move the point $A$ and also display the contour of the objective function through it, $k=2x(A)+y(A)$, where $x(A)$ denotes the $x$ coordinate of the point $A$.
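If you would like a static version of this picture, a rough matplotlib sketch (my own, not the code behind the interactive figure) that overlays a few contour lines $2x+y=k$ on the constraint circle could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

# Grid over the (x, y) plane for the objective's contours.
xs = np.linspace(-1.5, 1.5, 400)
X, Y = np.meshgrid(xs, xs)

fig, ax = plt.subplots(figsize=(5, 5))

# Contours of the objective: the lines 2x + y = k for several values of k.
levels = np.arange(-3.0, 3.5, 0.5)
cs = ax.contour(X, Y, 2 * X + Y, levels=levels, cmap="viridis")
ax.clabel(cs, fmt="k=%.1f")

# Contour of the constraint: the unit circle x^2 + y^2 = 1.
t = np.linspace(0.0, 2.0 * np.pi, 200)
ax.plot(np.cos(t), np.sin(t), color="black", linewidth=2)

ax.set_aspect("equal")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.show()
```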

What do you observe about the line (the contour of the objective function) relative to the circle (the contour of the constraint)? Especially at the point that maximizes (or minimizes) the objective function.

The line is tangent to the circle (in general, the contour of the objective function is tangent to the contour of the constraint function at the maximum or minimum).
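One way to convince yourself of this algebraically (a side calculation using the standard point-to-line distance formula): the distance of the line $2x+y=k$ from the origin is

$ \dfrac{|k|}{\sqrt{2^2+1^2}} = \dfrac{|k|}{\sqrt{5}}$

The line intersects the unit circle only when this distance is at most 1, and is tangent exactly when $|k| = \sqrt{5} \approx 2.24$, which is precisely the maximum value we found by exploration.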

Analytical Approach

With this observation, we can introduce the gradient vectors (simple quantities we can calculate) of both the objective and the constraint functions and see how they are related. Once again, move the point $A$ and observe the directions of the gradient vectors. Where are they perpendicular to each other? Where are they parallel to each other?

We can see that the gradient of the objective function and the gradient of the constraint are parallel to each other at the point $(x_0,y_0)$ where the function attains its maximum or minimum. Why is that? Think about it for a minute. (Hint: at such a point, moving along the constraint curve cannot change the objective to first order, so the objective’s gradient has no component along the curve, i.e. it points along the constraint’s gradient, which is always perpendicular to the curve.)
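To make the observation concrete numerically (again, just a sketch of my own, not the demo’s code): two vectors in the plane are parallel exactly when their 2D cross product $u_x v_y - u_y v_x$ is zero, so we can track that quantity as the point moves around the circle.

```python
import numpy as np

# Objective f(x, y) = 2x + y has the constant gradient (2, 1);
# constraint g(x, y) = x^2 + y^2 has gradient (2x, 2y).
grad_f = np.array([2.0, 1.0])

def grad_g(x, y):
    return np.array([2.0 * x, 2.0 * y])

def cross2d(u, v):
    # Zero exactly when u and v are parallel (or one of them is zero).
    return u[0] * v[1] - u[1] * v[0]

# Walk around the unit circle and watch the cross product change sign.
for t in np.linspace(0.0, 2.0 * np.pi, 13)[:-1]:
    x, y = np.cos(t), np.sin(t)
    c = cross2d(grad_f, grad_g(x, y))
    print(f"(x, y) = ({x:+.3f}, {y:+.3f})   cross = {c:+.3f}")
```

The cross product here works out to $4y - 2x$; it changes sign only around $(\pm 2/\sqrt{5}, \pm 1/\sqrt{5}) \approx (\pm 0.894, \pm 0.447)$, exactly the two points where the gradients become parallel and the constrained maximum and minimum occur.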

So, writing $f(x,y)=2x+y$ for the objective and $g(x,y)=x^2+y^2$ for the constraint function, we can state the following important equality:

$ \bbox[5px, border: 2px solid blue]{\nabla f(x_0,y_0)=\lambda \nabla g(x_0,y_0)}$

Important: the above relation holds only at points like $(x_0,y_0)$ where the constrained function attains its maximum or minimum, not at every point of the circle.

The scalar $\lambda$ is called the Lagrange multiplier. It scales the gradient of the constraint function so that its magnitude matches that of the gradient of the objective function. How do we make use of this equation to find $(x_0,y_0)$?

$ \nabla f(x_0,y_0) = \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \lambda \begin{bmatrix} 2x_0 \\ 2y_0 \end{bmatrix} = \lambda \nabla g(x_0,y_0)$

Now we can solve for $(x_0,y_0)$: the two components of the gradient equation give two equations in the three unknowns $x_0, y_0, \lambda$, and the constraint $x_0^2+y_0^2=1$ supplies the third. You can take a look at the detailed calculations at Khan Academy.
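For this small example the algebra is short; here is a quick sketch of it (the full walk-through is in the Khan Academy article):

$ 2 = 2\lambda x_0, \quad 1 = 2\lambda y_0 \;\;\Rightarrow\;\; x_0 = \dfrac{1}{\lambda}, \;\; y_0 = \dfrac{1}{2\lambda}$

Substituting into the constraint $x_0^2+y_0^2=1$ gives $\dfrac{1}{\lambda^2} + \dfrac{1}{4\lambda^2} = 1$, so $\lambda = \pm\dfrac{\sqrt{5}}{2}$. The positive root gives $(x_0,y_0) = \left(\dfrac{2}{\sqrt{5}}, \dfrac{1}{\sqrt{5}}\right) \approx (0.894, 0.447)$ and $z = 2x_0+y_0 = \sqrt{5} \approx 2.24$ (the maximum we spotted earlier); the negative root gives the minimum, $-\sqrt{5}$.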

Now, we can conceptually extend the same idea to multiple constraints.
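For completeness, the standard form of that extension (stated without derivation, and under the usual regularity assumptions): with equality constraints $g_1, \dots, g_m$, each constraint gets its own multiplier, and the condition at a constrained optimum becomes

$ \nabla f(x_0,y_0) = \sum_{i=1}^{m} \lambda_i \, \nabla g_i(x_0,y_0)$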