Transfer Learning
Imagine now that we are just beggining to learn to ride a bicycle. Initially, we need to put a lot of effort into learning to balance the cycle. Then gradually
we learn to balance and eventually take a full control over the ride. Suppose that we also wish to learn to ride a simple electric bike.
How difficult will that be for us?. It will be bit...
Softmax and Its Derivative
Introduction
The Softmax function is one of the most commonly used activation functions at the output layer of neural networks (be it CNN, RNN or Transformers). In fact, Large Language Models(LLMs) based on transformer architecture (like ChatGPT) use softmax in the output layer for many NLP(Natural Language Processing) tasks. Therefore, it is i...
Masked Attentions in Transformer Architectures
Introduction
Masked attention is typically used in the Decoder part of (vanilla) transformer architecture to prevent the model from looking at future tokens. This left me with the impression that the models trained using Masked Language Modelling (MLM) objective use Masked Attention.
However, masked language models like BERT (Bidirectional En...
Pytorch for Deep Learning
Theory is not enough, you must apply. The objective of this workshop is to give you hands on experience in building models using the pytorch's core component called Tensor . Beleive me, everything you are gonna build is simply stacking or connecting copies of this single core component!. It make sense as all the models takes in Tensors (data) a...
கணிப்போம் வா
"கண்ணா, வெளியில விளையாடப் போறயா, குடை எடுத்துட்டு போப்பா, மழை வர 60% வாய்ப்பிருக்குனு (Chance) போட்ருக்குப்பா ", என்றாள் திண்ணையில் ஸ்மார்ட் போனை ஸ்க்ரால் செய்தபடி அமர்ந்திருந்த பாட்டி.
"அட போ பாட்டி, வெயில் கொளுத்துது" என்று அவள் சொன்னதை அசட்டை செய்து விட்டு சென்றான் பேரன் ஹரி .
இவர்களது பேச்சை கேட்டவாறே வீட்டினுள் சென்றாள் பேத்தி மீனா .
...
Gradient as a Guide : A Simple Game
Gradient as a Guide : A Simple Game
The Backpropagation algorithm is the powerhouse of all Deep Learning models. It is one of the methods of efficiently calculating
Gradient for billions of parameters. Therefore, it is vital to understand how the Gradient information g...
Bringing Python to Browser!
Jupyter notebook has always been a de-facto choice when you teach the Python programming language or while developing prototype machine learning models or doing exploratory data analysis.
The reasons are manifold. The most important reason is due to its ability to interleave the rich set of explanatory notes using markdown
cells and ...
Making Sense of Positional Encoding in Transformer
Motivation
Are you wondering about the peculiar use of a sinusoidal function to encode the positional information in Transformer architecture? Are you asking why not just use simple one-hot encoding or something similar to encode positions?. Welcome, this article is for you.
Perhaps you are here after reading a few articles explaining the posit...
30 post articles, 4 pages.