DL-Pytorch-Workshop

Deep Learning with PyTorch and Hugging Face

  1. The Factory: nn.Module and Data Utilities
    • Difficulty Level: Medium
    • A brief tour of the nn.Module source code
    • Everything is a module (what other frameworks call a layer)
    • Stack modules by subclassing nn.Module to build any neural network
    • Managing data with the Dataset and DataLoader classes (see the sketch below)
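
A minimal sketch of what this notebook covers (class names like TinyNet and RandomDataset are made up for illustration): stack modules by subclassing nn.Module, then feed a Dataset through a DataLoader.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader

class TinyNet(nn.Module):
    """Everything is a module: stack smaller modules inside a bigger one."""
    def __init__(self, in_features=4, hidden=8, out_features=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_features),
        )

    def forward(self, x):
        return self.body(x)

class RandomDataset(Dataset):
    """A minimal Dataset: implement __len__ and __getitem__."""
    def __init__(self, n=64):
        self.x = torch.randn(n, 4)
        self.y = torch.randint(0, 2, (n,))

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(RandomDataset(), batch_size=16, shuffle=True)
model = TinyNet()
for xb, yb in loader:
    print(model(xb).shape)  # torch.Size([16, 2])
    break
```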
  2. Convolutional Neural Networks for Image Classification
    • Difficulty Level: Medium
    • Using torchvision for datasets
    • Build a CNN and move it to the GPU (see the sketch below)
    • Train and test
    • Transfer learning
    • Image segmentation
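
A sketch of the "build a CNN and move it to the GPU" step; the architecture and the CIFAR-sized 32x32 input are illustrative, not the notebook's exact model.

```python
import torch
from torch import nn

class SmallCNN(nn.Module):
    """A toy convolutional classifier for 3x32x32 images."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SmallCNN().to(device)                   # move parameters to the GPU
x = torch.randn(8, 3, 32, 32, device=device)    # fake image batch
print(model(x).shape)                           # torch.Size([8, 10])
```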
  3. Recurrent Neural Networks for Sequence Classification
    • Difficulty Level: Hard for the pre-processing part, Medium for the model-building part
    • torchdata
    • torchtext
    • Word embeddings with nn.Embedding
    • Build an RNN
    • Train, test, and infer (see the sketch below)
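
A minimal sketch of an embedding-plus-RNN sequence classifier; the vocabulary size, dimensions, and class name are placeholders for the notebook's actual setup.

```python
import torch
from torch import nn

class RNNClassifier(nn.Module):
    """Embed token ids, run an LSTM, classify from the last hidden state."""
    def __init__(self, vocab_size=1000, embed_dim=32, hidden_dim=64, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.rnn(embedded)    # final hidden state
        return self.fc(hidden[-1])             # (batch, num_classes)

tokens = torch.randint(0, 1000, (4, 20))  # 4 sequences of length 20
print(RNNClassifier()(tokens).shape)      # torch.Size([4, 2])
```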
  4. Using Pre-Trained Models Notebook
    • Difficulty Level: Easy
    • AutoTokenizer
    • AutoModel (see the sketch below)
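
The core of this notebook boils down to a few lines like these; bert-base-uncased is just an example checkpoint.

```python
from transformers import AutoTokenizer, AutoModel

checkpoint = "bert-base-uncased"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModel.from_pretrained(checkpoint)

inputs = tokenizer("PyTorch plus Hugging Face!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```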
  5. Fine-Tuning Pre-Trained Models Notebook
    • Difficulty Level: Medium
    • The datasets library
    • Tokenizers
    • Data collator with padding (DataCollatorWithPadding)
    • The Trainer API (see the sketch below)
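
A condensed sketch of the fine-tuning pipeline, assuming an example checkpoint (distilbert-base-uncased) and dataset (GLUE SST-2) rather than the ones used in the notebook.

```python
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

checkpoint = "distilbert-base-uncased"   # example checkpoint
raw = load_dataset("glue", "sst2")       # example dataset
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True)

tokenized = raw.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),  # pads per batch
)
trainer.train()
```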
  6. Loading Datasets Notebook
    • Difficulty Level: Easy
    • Dataset from local data files
    • Dataset from Hub
    • Preprocessing the dataset: slice, select, map, filter, flatten, interleave, concatenate (see the sketch below)
    • Loading from external links
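
A sketch of a few of these operations; the local CSV path and the imdb Hub dataset are stand-ins for whatever data you load.

```python
from datasets import load_dataset, concatenate_datasets

local = load_dataset("csv", data_files="data/train.csv")  # hypothetical path
hub = load_dataset("imdb")                                # example Hub dataset

first_ten = hub["train"][:10]                          # slice -> dict of columns
small = hub["train"].select(range(100))                # select by index
positives = small.filter(lambda ex: ex["label"] == 1)  # filter rows
shouted = small.map(lambda ex: {"text": ex["text"].upper()})  # map over rows
merged = concatenate_datasets([small, shouted])        # concatenate datasets
print(merged)
```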
  7. Build a Custom Tokenizer for a Translation Task Notebook
    • Difficulty Level: Medium
    • Translation dataset as running example
    • Building the tokenizer by composing the normalizer, pre-tokenizer, and tokenization algorithm (BPE), as sketched after this list
    • Save and load the tokenizer locally
    • Using it with the transformers library
    • Exercise: Build a tokenizer with a shared vocabulary.
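
A minimal sketch of composing a BPE tokenizer with the Hugging Face tokenizers library; the two-sentence corpus and the file name are placeholders for the notebook's translation data.

```python
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers

# Compose normalizer + pre-tokenizer + BPE model, then train on a toy corpus.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.Sequence(
    [normalizers.NFD(), normalizers.Lowercase(), normalizers.StripAccents()]
)
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(vocab_size=500, special_tokens=["[UNK]", "[PAD]"])
corpus = ["hello world", "hallo welt"]  # stand-in for the translation corpus
tokenizer.train_from_iterator(corpus, trainer=trainer)

tokenizer.save("my-tokenizer.json")                  # save locally
reloaded = Tokenizer.from_file("my-tokenizer.json")  # load it back
print(reloaded.encode("hello welt").tokens)
```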
  8. Training a Custom Seq2Seq Model Using the Vanilla Transformer Architecture Notebook
    • Difficulty Level: Medium, if you know how to build models in PyTorch.
    • Build the vanilla Transformer architecture in PyTorch
    • Create a model configuration using the PretrainedConfig class
    • Wrap it with the HF PreTrainedModel class (see the sketch below)
    • Use the custom tokenizer built in the previous notebook
    • Use the Trainer API to train the model
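
A minimal sketch of wrapping a PyTorch Transformer in PretrainedConfig / PreTrainedModel; the configuration fields, dimensions, and model_type are illustrative, not the notebook's exact model.

```python
import torch
from torch import nn
from transformers import PretrainedConfig, PreTrainedModel

class Seq2SeqConfig(PretrainedConfig):
    model_type = "toy_seq2seq"  # hypothetical model type
    def __init__(self, vocab_size=1000, d_model=64, nhead=4, num_layers=2, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers

class Seq2SeqModel(PreTrainedModel):
    config_class = Seq2SeqConfig
    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.d_model)
        self.transformer = nn.Transformer(
            d_model=config.d_model, nhead=config.nhead,
            num_encoder_layers=config.num_layers,
            num_decoder_layers=config.num_layers,
            batch_first=True,
        )
        self.lm_head = nn.Linear(config.d_model, config.vocab_size)

    def forward(self, src_ids, tgt_ids):
        out = self.transformer(self.embed(src_ids), self.embed(tgt_ids))
        return self.lm_head(out)

model = Seq2SeqModel(Seq2SeqConfig())
src = torch.randint(0, 1000, (2, 10))
tgt = torch.randint(0, 1000, (2, 8))
print(model(src, tgt).shape)  # torch.Size([2, 8, 1000])
```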
  9. Gradient Accumulation - Continual Pre-training Notebook
    • Difficulty Level: Easy
    • Understand the memory requirements of training and inference
    • Understand how gradient accumulation works around limited memory (see the sketch below)
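
The accumulation loop itself fits in a few lines: run several small micro-batches, let gradients pile up in .grad, and step the optimizer once per window. The model and batch sizes here are toy values.

```python
import torch
from torch import nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4  # effective batch size = micro-batch size * accum_steps
optimizer.zero_grad()
for step in range(16):
    x = torch.randn(8, 10)                     # small micro-batch fits in memory
    y = torch.randint(0, 2, (8,))
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                            # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()                       # update once per accum window
        optimizer.zero_grad()
```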
  10. Upload the Model to the Hub Notebook
    • Under preparation

UPDATE: Added Hugging Face notebooks. If you plan to use the Transformer architecture for any task, HF is the way to go! The transformers library is built on top of PyTorch (and TensorFlow) and does a lot of the heavy lifting.