# Deep Learning with PyTorch and Hugging Face
- You can access the slide deck that covers PyTorch Here
- You can access the slide deck that covers various concepts related to Transformers Here
- It is recommended to read the slide decks before using the following Colab notebooks
- Once you get a good grip on the first four modules, you can easily walk through the documentation or other code to build an application. I will keep updating this repository.
- Recorded videos
## Colab Notebooks
- The Fuel: Tensors
  - Difficulty Level: Easy if you have prior experience using NumPy or TensorFlow
  - Understand the PyTorch architecture
  - Create tensors of 0-d, 1-d, 2-d, 3-d, … (the analogue of a multidimensional array in NumPy)
  - Understand the attributes: storage, stride, offset, device
  - Manipulate tensor dimensions
  - Operations on tensors
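
A minimal sketch of the tensor basics listed above (assumes a recent PyTorch 2.x, where untyped_storage() is available; older releases expose storage() instead):

```python
import torch

# Tensors of increasing rank (the PyTorch analogue of NumPy ndarrays)
t0 = torch.tensor(3.14)                    # 0-d (scalar)
t1 = torch.arange(6, dtype=torch.float32)  # 1-d
t2 = t1.reshape(2, 3)                      # 2-d view over the same storage
print(t0.ndim, t1.ndim, t2.ndim)           # 0 1 2

# Attributes covered in the notebook
print(t2.untyped_storage())   # flat 1-d memory block, shared with t1
print(t2.stride())            # (3, 1): steps through storage per dimension
print(t2.storage_offset())    # 0: where this view starts in the storage
print(t2.device)              # cpu (or cuda:0 after .to("cuda"))

# Manipulating dimensions and basic operations
t3 = t2.unsqueeze(0)                 # add a leading dimension -> (1, 2, 3)
print(t3.permute(0, 2, 1).shape)     # torch.Size([1, 3, 2])
print((t2 * 2 + 1).sum(dim=1))       # elementwise ops followed by a reduction
```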
- The Engine: Autograd
  - Difficulty Level: Hard; requires a good understanding of the backpropagation algorithm. However, you can skip this and still follow the subsequent notebooks easily.
  - A few more attributes of a tensor: requires_grad, grad, grad_fn, _saved_tensors, backward, retain_grad, zero_grad
  - Computation graph: leaf nodes (parameters) vs. non-leaf nodes (intermediate computations)
  - Accumulate gradients and update parameters inside the torch.no_grad context manager
  - Implement a neural network from scratch
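
A small sketch of these autograd mechanics (the shapes and the learning rate are arbitrary):

```python
import torch

# Leaf tensor (a "parameter"): created by the user with requires_grad=True
w = torch.randn(3, requires_grad=True)
x = torch.ones(3)                 # plain input, no gradient needed

y = (w * x).sum()                 # non-leaf node; y.grad_fn records the op that produced it
print(y.grad_fn)                  # <SumBackward0 ...>
y.backward()                      # walk the graph and populate w.grad
print(w.grad)                     # dy/dw = x = tensor([1., 1., 1.])

# Gradients accumulate across backward() calls, so zero them between updates,
# and wrap the update in torch.no_grad() so it is not recorded in the graph
with torch.no_grad():
    w -= 0.1 * w.grad
w.grad.zero_()
```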
- The Factory: nn.Module, Data Utils
  - Difficulty Level: Medium
  - Brief tour of the source code of nn.Module
  - Everything is a module (a "layer" in other frameworks)
  - Stack modules by subclassing nn.Module and build any neural network
  - Manage data with the Dataset class and the DataLoader class
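
A compact sketch of the pattern this notebook teaches: subclass nn.Module for the model, subclass Dataset for the data, and batch with DataLoader (the random data and the layer sizes are placeholders):

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

class TinyMLP(nn.Module):              # stack modules by subclassing nn.Module
    def __init__(self, in_dim=4, hidden=8, out_dim=2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

class RandomDataset(Dataset):          # a minimal Dataset: __len__ and __getitem__
    def __init__(self, n=64):
        self.x = torch.randn(n, 4)
        self.y = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(RandomDataset(), batch_size=16, shuffle=True)
model = TinyMLP()
for xb, yb in loader:                  # DataLoader yields batched tensors
    logits = model(xb)
    print(logits.shape)                # torch.Size([16, 2])
    break
```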
- Convolutional Neural Network: Image Classification
- Recurrent Neural Network: Sequence Classification
  - Difficulty Level: Hard for the pre-processing part, Medium for the model-building part
  - torchdata
  - torchtext
  - Embeddings for words
  - Build an RNN
  - Train, test, infer
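
A sketch of the model-building part only, since the torchdata/torchtext preprocessing is notebook-specific; the vocabulary size, sequence length, and class count here are placeholder values:

```python
import torch
from torch import nn

class RNNClassifier(nn.Module):
    """Embedding -> RNN -> linear head for sequence classification."""
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len) int64
        embedded = self.embed(token_ids)         # (batch, seq_len, embed_dim)
        _, last_hidden = self.rnn(embedded)      # last_hidden: (1, batch, hidden_dim)
        return self.head(last_hidden.squeeze(0)) # (batch, num_classes)

model = RNNClassifier()
dummy_batch = torch.randint(0, 1000, (8, 20))    # 8 sequences of 20 token ids
print(model(dummy_batch).shape)                  # torch.Size([8, 2])
```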
Please take a look at the official tutorial series if you want to perform distributed training with a multi-GPU or multi-node setup in PyTorch (it requires only minimal modifications to the existing code). It covers various approaches, including:
- Distributed Data Parallel (DDP)
- Fully Sharded Data Parallel (FSDP)
- Model, Tensor, and Pipeline parallelism
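
To give a sense of how little the code changes, here is a minimal Distributed Data Parallel sketch (it assumes NVIDIA GPUs and a launch via torchrun; see the official tutorials for the full recipe):

```python
# Launch with: torchrun --nproc_per_node=2 ddp_sketch.py
import os
import torch
import torch.distributed as dist
from torch import nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")       # one process per GPU
    local_rank = int(os.environ["LOCAL_RANK"])    # set by torchrun
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])   # gradients are synchronized on backward

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    x = torch.randn(32, 10, device=f"cuda:{local_rank}")
    loss = model(x).pow(2).mean()
    loss.backward()                               # all-reduce of gradients happens here
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```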
Now, let's move on to the Hugging Face library, which further simplifies these training strategies.

---
- Using Pre-Trained Models Notebook
  - Difficulty Level: Easy
  - AutoTokenizer
  - AutoModel
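
A quick sketch of loading a checkpoint through the Auto classes; the checkpoint name is just an example from the Hub, and the sequence-classification variant of the Auto class is used so the snippet produces a label:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"   # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

inputs = tokenizer("Hugging Face pipelines are easy to use.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted = logits.argmax(dim=-1).item()
print(model.config.id2label[predicted])   # "POSITIVE"
```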
- Fine-Tuning Pre-Trained Models Notebook
  - Difficulty Level: Medium
  - datasets
  - tokenizer
  - data collator with padding
  - Trainer
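
A condensed sketch of this fine-tuning recipe; the checkpoint and the GLUE/SST-2 dataset are example choices, not necessarily the ones used in the notebook:

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

checkpoint = "bert-base-uncased"                       # example checkpoint
raw = load_dataset("glue", "sst2")                     # example dataset
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

tokenized = raw.map(tokenize, batched=True)
collator = DataCollatorWithPadding(tokenizer=tokenizer)   # dynamic padding per batch

model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
args = TrainingArguments(output_dir="out",
                         per_device_train_batch_size=16,
                         num_train_epochs=1)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"],
                  data_collator=collator)
trainer.train()
```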
- Loading Datasets Notebook
  - Difficulty Level: Easy
  - Dataset from local data files
  - Dataset from the Hub
  - Preprocessing the dataset: slice, select, map, filter, flatten, interleave, concatenate
  - Loading from external links
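
A few representative calls for the operations listed above (the dataset name and the local file paths are hypothetical examples):

```python
from datasets import load_dataset, concatenate_datasets, interleave_datasets

# From the Hub (example dataset) and from local files (hypothetical paths)
squad = load_dataset("squad", split="train")
local = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

# Slice/select, map, filter, flatten
small = squad.select(range(100))                               # first 100 rows
upper = small.map(lambda ex: {"title": ex["title"].upper()})
long_ctx = upper.filter(lambda ex: len(ex["context"]) > 500)
flat = long_ctx.flatten()                                      # expand the nested "answers" column
print(flat)

# Combine datasets row-wise, or sample from them alternately
combined = concatenate_datasets([small, small])
mixed = interleave_datasets([small, small])
```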
- Build a Custom Tokenizer for a Translation Task Notebook
  - Difficulty Level: Medium
  - Translation dataset as a running example
  - Build the tokenizer by composing the normalizer, the pre-tokenizer, and the tokenization algorithm (BPE)
  - Save and load the tokenizer locally
  - Use it in the Transformer module
  - Exercise: Build a tokenizer with a shared vocabulary.
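
A minimal sketch of assembling a BPE tokenizer with the tokenizers library and wrapping it for use with Transformers; the two-sentence corpus, vocabulary size, and special tokens are placeholders standing in for the translation data:

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers
from transformers import PreTrainedTokenizerFast

# Assemble the pipeline: normalizer -> pre-tokenizer -> BPE model
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.Sequence([normalizers.NFKC(), normalizers.Lowercase()])
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=8000,
                              special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"])
corpus = ["a toy sentence", "another toy sentence"]   # stand-in for the translation corpus
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Save locally, then wrap it so Transformers components can use it
tokenizer.save("bpe-tokenizer.json")
hf_tok = PreTrainedTokenizerFast(tokenizer_file="bpe-tokenizer.json",
                                 unk_token="[UNK]", pad_token="[PAD]")
print(hf_tok("a toy sentence").input_ids)
```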
- Training a Custom Seq2Seq Model Using the Vanilla Transformer Architecture Notebook
  - Difficulty Level: Medium, if you know how to build models in PyTorch
  - Build the vanilla Transformer architecture in PyTorch
  - Create a configuration for the model using the PretrainedConfig class
  - Wrap it with the HF PreTrainedModel class
  - Use the custom tokenizer built in the previous notebook
  - Use the Trainer API to train the model
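
A skeletal version of the wrapping pattern: a PretrainedConfig subclass holds the hyperparameters and a PreTrainedModel subclass wraps an nn.Transformer. Positional encodings, masking, and generation are omitted, and all names and sizes here are placeholders:

```python
import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel

class VanillaTransformerConfig(PretrainedConfig):
    model_type = "vanilla-transformer"            # hypothetical model type
    def __init__(self, vocab_size=8000, d_model=256, nhead=4, num_layers=3, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers

class VanillaTransformerModel(PreTrainedModel):
    config_class = VanillaTransformerConfig
    def __init__(self, config):
        super().__init__(config)
        self.src_embed = nn.Embedding(config.vocab_size, config.d_model)
        self.tgt_embed = nn.Embedding(config.vocab_size, config.d_model)
        self.transformer = nn.Transformer(
            d_model=config.d_model, nhead=config.nhead,
            num_encoder_layers=config.num_layers,
            num_decoder_layers=config.num_layers, batch_first=True)
        self.lm_head = nn.Linear(config.d_model, config.vocab_size)

    def forward(self, input_ids, decoder_input_ids, labels=None):
        # NOTE: positional encodings and causal masks are omitted for brevity
        hidden = self.transformer(self.src_embed(input_ids),
                                  self.tgt_embed(decoder_input_ids))
        logits = self.lm_head(hidden)
        if labels is not None:                    # Trainer expects a loss when labels are given
            loss = nn.functional.cross_entropy(logits.transpose(1, 2), labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}

config = VanillaTransformerConfig()
model = VanillaTransformerModel(config)
src = torch.randint(0, config.vocab_size, (2, 10))   # (batch, src_len)
tgt = torch.randint(0, config.vocab_size, (2, 9))    # (batch, tgt_len)
print(model(src, tgt)["logits"].shape)               # torch.Size([2, 9, 8000])
```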
- Gradient Accumulation - Continual Pre-Training Notebook
  - Difficulty Level: Easy
  - Understand the memory requirements for training and inference
  - Understand how gradient accumulation works around limited GPU memory
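
A small sketch of the idea in plain PyTorch: step the optimizer only every few micro-batches and scale the loss so the accumulated gradient matches a larger batch. With the Trainer, the same effect comes from the gradient_accumulation_steps training argument.

```python
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
accumulation_steps = 4                              # effective batch = 4 x micro-batch size

# Dummy micro-batches standing in for a DataLoader
micro_batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches, start=1):
    loss = nn.functional.mse_loss(model(x), y)
    (loss / accumulation_steps).backward()          # gradients accumulate in .grad
    if step % accumulation_steps == 0:              # optimizer step only every N micro-batches
        optimizer.step()
        optimizer.zero_grad()
```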
## CUDA Resources

PyTorch updated its CUDA Semantics page on Aug 07, 2025. If you are using multiple GPUs, you must read it before starting to write code. Don't assume!
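
As a starting point, here is a small sketch of the device-handling habits that page covers: be explicit about which GPU you use, and remember that CUDA kernels launch asynchronously (the device indices below are examples):

```python
import torch

print(torch.cuda.is_available(), torch.cuda.device_count())

if torch.cuda.is_available():
    # Be explicit about the device instead of relying on the implicit "current device"
    device = torch.device("cuda:0")
    x = torch.randn(4, 4, device=device)

    # Inside this context manager, the bare "cuda" string resolves to the selected device
    other = 1 if torch.cuda.device_count() > 1 else 0
    with torch.cuda.device(other):
        y = torch.randn(4, 4, device="cuda")

    print(x.device, y.device)
    torch.cuda.synchronize()   # CUDA ops run asynchronously; synchronize before timing
```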