Transformer Architecture Explained in Detail from Scratch

 

Initially, I planned to write an article on the transformer architecture. However, I realized that it would run beyond 90 pages if I covered every detail. So instead I am sharing the presentation (which is almost self-contained) that I prepared for the deep learning course taught at IITM by Prof. Mitesh Khapra. Here is the presentation (wait a few seconds for it to load), which teaches the details of the transformer architecture step by step. It starts by reviewing RNN-based sequence-to-sequence models and the attention mechanism as applied to machine translation tasks. Then we gradually transition to the transformer architecture.