Lagrange Multiplier : Intuition via Interaction
Introduction
You have probably landed here after coming across the term “constrained optimization” or “Lagrangian” and wanting to understand what a Lagrange multiplier is. In this post, I help you understand the foundation of it with interactive plots (you can find plenty of mathematical derivations on the net). Let’s get straight to the poi...
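As a quick preview of the idea behind the post, here is the standard optimality condition a Lagrange multiplier encodes (a generic sketch, not taken from the post itself; $f$ is the objective and $g$ the constraint):

```latex
% Constrained problem: optimize f(x, y) subject to g(x, y) = 0.
% At a constrained optimum the two gradients are parallel,
% and the Lagrange multiplier \lambda is the proportionality factor:
\nabla f(x, y) = \lambda \, \nabla g(x, y)
% Equivalently, such points are stationary points of the Lagrangian:
\mathcal{L}(x, y, \lambda) = f(x, y) - \lambda \, g(x, y)
```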
Maximum Likelihood Estimation
Introduction
Estimating an unknown quantity from given observations has been a fascinating area of study for many centuries. However, it remains elusive for many beginners. Let’s start with a concept that we are already familiar with. Here is a sequence $x=[1,3,5,7,9,11,\times,\cdots]$. What could be the value of the se...
Representation Learning
Motivation
Let’s start with a simple question. How do you represent a number on a real line? That’s straightforward. How do you represent a 2D point $\begin{bmatrix}x \\ y \end{bmatrix}$ in a coordinate system? Well, we use two real lines that are orthogonal to each other. Move $x$ units on the $x$ axis and $y$ units on the $y$ axis, then the locatio...
Running Jupyter Notebook from a Remote Server
Introduction
If you are training a deep learning model or fine-tuning LLMs (Large Language Models), then at some point you will need to connect to a remote machine that has the required amount of computing power (say, 80 GB or 320 GB of GPU memory). Data scientists often use Jupyter Notebooks for experimentation. Jupyter Notebook is designed t...
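The usual way to do this is SSH local port forwarding. A minimal sketch (the user name `alice`, host name `gpu-server`, and port `8888` are placeholders, not values from the post):

```shell
# On the remote machine: start Jupyter without trying to open a browser.
jupyter notebook --no-browser --port=8888

# On your local machine: forward local port 8888 to port 8888 on the remote host.
# -N: no remote command, just forwarding; -L: local-to-remote tunnel.
ssh -N -L 8888:localhost:8888 alice@gpu-server
```

Once the tunnel is up, open `http://localhost:8888` in your local browser and paste the token that Jupyter printed on the remote machine.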
Interplay of Indices in Math - Illustrated
Introduction
When we have a sequence of elements, we use an index to locate a particular element in the sequence. The elements can be arranged into 2D, 3D, or n-dimensional arrays, so it is extremely important to become comfortable with using indices to manipulate such arrays. Usually, the index starts from either 0 or ...
Transfer Learning
Imagine that we are just beginning to learn to ride a bicycle. Initially, we need to put a lot of effort into learning to balance the cycle. Then gradually
we learn to balance and eventually take full control of the ride. Suppose that we also wish to learn to ride a simple electric bike.
How difficult will that be for us? It will be a bit...
Softmax and Its Derivative
Introduction
The Softmax function is one of the most commonly used activation functions at the output layer of neural networks (be it CNNs, RNNs, or Transformers). In fact, Large Language Models (LLMs) based on the transformer architecture (like ChatGPT) use softmax in the output layer for many NLP (Natural Language Processing) tasks. Therefore, it is i...
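For a concrete reference, here is a minimal NumPy sketch of softmax and its Jacobian (the standard formulation $\partial s_i / \partial z_j = s_i(\delta_{ij} - s_j)$; not code from the post itself):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(z):
    """Jacobian of softmax: J[i, j] = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)
```

Note that the Jacobian is symmetric and each of its rows sums to zero, which follows from the outputs summing to one.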
Masked Attentions in Transformer Architectures
Introduction
Masked attention is typically used in the decoder part of the (vanilla) transformer architecture to prevent the model from looking at future tokens. This left me with the impression that models trained with the Masked Language Modelling (MLM) objective use masked attention.
However, masked language models like BERT (Bidirectional En...
27 post articles, 4 pages.