Data Pipeline for Large Language Models

Data is the fuel for any machine learning model, regardless of the type of learning algorithm (gradient-based or tree-based) being used. To train the model and test its generalization capacity, we typically divide the available samples into three sets: training, validation, and test. The typical requirement is that the samples in the test set should be d...

Experimental Settings of Famous Language Models

GPT (Generative Pre-trained Transformer)
Pre-training dataset: BookCorpus (0.8 billion words)
Unsupervised objective: CLM (autoregressive)
Tokenizer: Byte Pair Encoding (BPE)
Vocab size: 40K
Architecture: Decoder only (12 layers)
Activation: GELU
Attention: Dense
FFN: Dense
Attention mask: Causal mask
Positional ...

Emergence of Large Language Models (LLMs)

Motivation: In traditional machine learning, we usually rely on model-selection approaches such as $K$-fold cross-validation and grid search to find the model that generalizes best in the real world. In deep learning, however, this is quite challenging because of compute-cost constraints. The same holds for neural language models....

Lagrange Multiplier: Intuition via Interaction

Introduction: You probably ended up here after coming across the term “constrained optimization” or “Lagrangian” and wanting to understand what a Lagrange multiplier is. In this post, I help you understand its foundation with interactive plots (you can find plenty of mathematical reasoning on the net). Let’s get straight to the poi...

Maximum Likelihood Estimation

Introduction: Estimating an unknown quantity from given observations has been a fascinating area of study for centuries. However, it remains elusive for many beginners. Let’s start with a concept that we are already familiar with. Here is a sequence $x_1=[1,3,5,7,9,11,\times,\cdots]$. What could be the value of the se...

Representation Learning

Motivation: Let’s start with a simple question. How do you represent a number on a real line? That’s straightforward. How do you represent a 2D point $\begin{bmatrix}x \\ y \end{bmatrix}$ in a coordinate system? Well, we use two real lines that are orthogonal to each other. Move $x$ units along the $x$ axis and $y$ units along the $y$ axis, then the locatio...

Running Jupyter Notebook from a Remote Server

Introduction: If you are training a deep learning model or fine-tuning LLMs (Large Language Models), then at some point you will need to connect to a remote machine that has the required amount of computing power (about 80 GB or 320 GB of GPU memory). Data scientists often use Jupyter Notebooks for experimentation. Jupyter Notebook is designed t...

Interplay of Indices in Math - Illustrated

Introduction: When we have a sequence of elements, we use an index to locate a particular element in the sequence. The elements can also be arranged into 2D, 3D, or n-dimensional arrays. It is therefore extremely important to become comfortable with using indices to manipulate such arrays. Usually, the index starts from either 0 or ...
