
Softmax and Its Derivative

Introduction The Softmax function is one of the most commonly used activation functions at the output layer of neural networks (be it CNNs, RNNs, or Transformers). In fact, Large Language Models (LLMs) based on the transformer architecture (like ChatGPT) use softmax in the output layer for many NLP (Natural Language Processing) tasks. Therefore, it is i...

Read more
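
As a quick companion to the softmax post above, here is a minimal NumPy sketch of softmax and its Jacobian; the function names and shapes are illustrative, not taken from the article.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - np.max(z, axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def softmax_jacobian(z):
    """Jacobian of softmax: d s_i / d z_j = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    return np.diag(s) - np.outer(s, s)

logits = np.array([2.0, 1.0, 0.1])
print(softmax(logits))           # probabilities summing to 1
print(softmax_jacobian(logits))  # 3x3 symmetric Jacobian
```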

Masked Attentions in Transformer Architectures

Introduction Masked attention is typically used in the decoder part of the (vanilla) transformer architecture to prevent the model from looking at future tokens. This left me with the impression that models trained with the Masked Language Modelling (MLM) objective use masked attention. However, masked language models like BERT (Bidirectional En...

Read more
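
For readers who want the gist of the post above in code, the sketch below shows one common way a causal (look-ahead) mask is applied inside scaled dot-product attention. It is a minimal PyTorch illustration, not the exact code from the article.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(q, k, v):
    """Scaled dot-product attention with a causal (no-peek) mask.
    q, k, v: tensors of shape (seq_len, d_k)."""
    seq_len, d_k = q.shape
    scores = q @ k.transpose(0, 1) / d_k ** 0.5                # (seq_len, seq_len)
    mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))           # hide future positions
    weights = F.softmax(scores, dim=-1)                        # each row sums to 1
    return weights @ v

q = k = v = torch.randn(5, 8)
out = causal_self_attention(q, k, v)                           # shape (5, 8)
```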

Pytorch for Deep Learning

Theory is not enough, you must apply. The objective of this workshop is to give you hands-on experience in building models using PyTorch's core component, the Tensor. Believe me, everything you are going to build is simply stacking or connecting copies of this single core component! It makes sense, as all the models take in Tensors (data) a...

Read more
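
To give a taste of what "connecting Tensors" looks like in practice, here is a small, self-contained sketch of a toy linear model built directly from raw tensors; the shapes and the toy task are made up for illustration.

```python
import torch

# A toy linear model y = x @ w + b built only from Tensors.
x = torch.randn(4, 3)                       # 4 samples, 3 features
w = torch.randn(3, 1, requires_grad=True)   # learnable weights
b = torch.zeros(1, requires_grad=True)      # learnable bias
y_true = torch.randn(4, 1)

y_pred = x @ w + b                          # forward pass: plain tensor ops
loss = ((y_pred - y_true) ** 2).mean()      # mean squared error
loss.backward()                             # gradients land in w.grad and b.grad
print(w.grad.shape, b.grad.shape)           # torch.Size([3, 1]) torch.Size([1])
```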

கணிப்போம் வா

"கண்ணா, வெளியில விளையாடப் போறயா, குடை எடுத்துட்டு போப்பா, மழை வர 60% வாய்ப்பிருக்குனு (Chance) போட்ருக்குப்பா ", என்றாள் திண்ணையில் ஸ்மார்ட் போனை ஸ்க்ரால் செய்தபடி அமர்ந்திருந்த பாட்டி. "அட போ பாட்டி, வெயில் கொளுத்துது" என்று அவள் சொன்னதை அசட்டை செய்து விட்டு சென்றான் பேரன் ஹரி . இவர்களது பேச்சை கேட்டவாறே வீட்டினுள் சென்றாள் பேத்தி மீனா . ...

Read more

Gradient as a Guide : A Simple Game

The Backpropagation algorithm is the powerhouse of all Deep Learning models. It is one of the methods for efficiently calculating the gradient with respect to billions of parameters. Therefore, it is vital to understand how the gradient information g...

Read more
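
To make the "gradient as a guide" idea concrete, here is a tiny hand-rolled gradient-descent loop on a one-parameter toy objective; it only sketches the general idea, not the game described in the post.

```python
# Gradient descent on f(x) = (x - 3)^2; the gradient 2 * (x - 3) points uphill,
# so stepping against it guides x toward the minimum at x = 3.
x = 0.0
lr = 0.1
for step in range(50):
    grad = 2 * (x - 3)   # analytic derivative of the toy objective
    x -= lr * grad       # take a small step against the gradient
print(round(x, 4))        # close to 3.0
```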

Bringing Python to the Browser!

The Jupyter notebook has always been the de facto choice when you teach the Python programming language, develop prototype machine learning models, or do exploratory data analysis. The reasons are manifold. The most important one is its ability to interleave a rich set of explanatory notes using markdown cells and ...

Read more

Making Sense of Positional Encoding in Transformer

Motivation Are you wondering about the peculiar use of a sinusoidal function to encode positional information in the Transformer architecture? Are you asking why not just use simple one-hot encoding or something similar to encode positions? Welcome, this article is for you. Perhaps you are here after reading a few articles explaining the posit...

Read more
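
For reference while reading the post above, the sketch below generates the sinusoidal positional encodings defined in the original "Attention Is All You Need" paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)); the array shapes used here are purely illustrative.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Return a (max_len, d_model) matrix of sinusoidal positional encodings."""
    positions = np.arange(max_len)[:, None]                      # (max_len, 1)
    div = np.power(10000.0, np.arange(0, d_model, 2) / d_model)  # (d_model/2,)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)   # even dimensions use sine
    pe[:, 1::2] = np.cos(positions / div)   # odd dimensions use cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=16)
print(pe.shape)   # (50, 16)
```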

Transformer Architecture Explained in Detail from Scratch

Initially, I thought of writing an article on the transformer architecture. However, I realized that it would go well beyond 90 pages if I covered every detail of it. So I decided instead to share the presentation (which is almost self-contained) that I prepared for the deep learning course taught at IITM by Prof. Mitesh Khapra. Here is the presentation (wait ...

Read more