- Tensor internals: `storage`, `stride`, `offset`, and `device` (see the first snippet after this list)
- Autograd: `requires_grad`, `grad`, `grad_fn`, `_saved_tensors`, `backward`, `retain_grad`, and `zero_grad` (second snippet)
- Data loading: the `Dataset` class and the `DataLoader` class (third snippet)
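
As a quick refresher on the first bullet, here is a minimal sketch (assuming PyTorch 2.x, where `untyped_storage()` is available) showing that a sliced view shares its parent's storage and differs only in stride/offset metadata:

```python
import torch

# A 2x3 tensor and a sliced view of it share one underlying storage;
# the view differs only in its stride/offset metadata.
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
y = x[:, 1:]  # view: no data copy

print(x.stride(), y.stride())                  # (3, 1) (3, 1)
print(x.storage_offset(), y.storage_offset())  # 0 1
print(x.device)                                # cpu (or cuda:0 after .to("cuda"))
print(x.untyped_storage().data_ptr()
      == y.untyped_storage().data_ptr())       # True: shared storage
```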
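Likewise, a small example of the autograd attributes in action; note that the `_saved_self` access is an internal detail of `grad_fn` nodes and may change across PyTorch versions:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # leaf tensor tracked by autograd
y = w * 3.0                                # non-leaf; created by MulBackward0
y.retain_grad()                            # also keep .grad on this non-leaf tensor
loss = y ** 2

print(loss.grad_fn)                   # PowBackward0 node of the recorded graph
print(loss.grad_fn._saved_self is y)  # True: tensors saved for the backward pass

loss.backward()                       # reverse-mode autodiff populates .grad
print(w.grad, y.grad)                 # tensor(36.) tensor(12.)

w.grad.zero_()  # grads accumulate across backward() calls, so reset between steps
                # (optimizer.zero_grad() does this for all parameters at once)
```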
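And finally, a minimal map-style `Dataset` wired into a `DataLoader`; the `ToyDataset` class and its random data are made up for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """A minimal map-style dataset: implement __len__ and __getitem__."""
    def __init__(self, n: int = 100):
        self.x = torch.randn(n, 4)          # fake features
        self.y = torch.randint(0, 2, (n,))  # fake binary labels

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
for features, labels in loader:  # batching, shuffling, and collation are handled for us
    pass                         # a real training step would go here
```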
Please take a look at the official tutorial series if you want to perform distributed training using a multi-GPU or multi-node setup in PyTorch (it requires only minimal modifications to the existing code). It covers various approaches, including:
- Distributed Data-Parallel (DDP); a minimal sketch follows this list
- Fully Sharded Data Parallel (FSDP)
- Model, Tensor, and Pipeline parallelism
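
To give a flavor of the first approach, here is a minimal DDP sketch. It assumes a single node with one or more CUDA GPUs, a launch via `torchrun`, and a hypothetical file name `ddp_sketch.py`; the official tutorials cover the full details:

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os

import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # torchrun sets RANK/WORLD_SIZE/etc.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4, 2).to(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Per-rank toy batch; real code would use a DistributedSampler in its DataLoader.
    inputs = torch.randn(16, 4, device=local_rank)
    labels = torch.randint(0, 2, (16,), device=local_rank)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()   # DDP's hooks all-reduce gradients across ranks here
    optimizer.step()  # every rank applies the same averaged update

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```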
Now, let’s move on to the Hugging Face library, which further simplifies these training strategies.
---