- Tensor internals: `storage`, `stride`, `offset`, and `device` (see the first snippet after this list)
- Autograd: `requires_grad`, `grad`, `grad_fn`, `_saved_tensors`, `backward`, `retain_grad`, and `zero_grad` (second snippet)
- Data loading: the `Dataset` class and the `DataLoader` class (third snippet)
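
As a quick refresher on the first bullet, here is a minimal sketch (assuming PyTorch 2.x, where `untyped_storage()` is available) showing that a sliced view shares its parent's storage and differs only in stride/offset metadata:

```python
import torch

# A 2x3 tensor and a sliced view of it share one underlying storage;
# the view differs only in its stride/offset metadata.
x = torch.arange(6, dtype=torch.float32).reshape(2, 3)
y = x[:, 1:]  # view: no data copy

print(x.stride(), y.stride())                  # (3, 1) (3, 1)
print(x.storage_offset(), y.storage_offset())  # 0 1
print(x.device)                                # cpu (or cuda:0 after .to("cuda"))
print(x.untyped_storage().data_ptr()
      == y.untyped_storage().data_ptr())       # True: shared storage
```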
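Likewise, a small example of the autograd attributes in action; note that the `_saved_self` access is an internal detail of `grad_fn` nodes and may change across PyTorch versions:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # leaf tensor tracked by autograd
y = w * 3.0                                # non-leaf; created by MulBackward0
y.retain_grad()                            # also keep .grad on this non-leaf tensor
loss = y ** 2

print(loss.grad_fn)                   # PowBackward0 node of the recorded graph
print(loss.grad_fn._saved_self is y)  # True: tensors saved for the backward pass

loss.backward()                       # reverse-mode autodiff populates .grad
print(w.grad, y.grad)                 # tensor(36.) tensor(12.)

w.grad.zero_()  # grads accumulate across backward() calls, so reset between steps
                # (optimizer.zero_grad() does this for all parameters at once)
```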
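And finally, a minimal map-style `Dataset` wired into a `DataLoader`; the `ToyDataset` class and its random data are made up for illustration:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """A minimal map-style dataset: implement __len__ and __getitem__."""
    def __init__(self, n: int = 100):
        self.x = torch.randn(n, 4)          # fake features
        self.y = torch.randint(0, 2, (n,))  # fake binary labels

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
for features, labels in loader:  # batching, shuffling, and collation are handled for us
    pass                         # a real training step would go here
```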
Please take a look at the official tutorial series if you want to perform distributed training using a multi-GPU or multi-node setup in PyTorch (it requires only minimal modifications to the existing code). It covers various approaches, including:
- Distributed Data-Parallel (DDP); a minimal sketch follows this list
- Fully Sharded Data Parallel (FSDP)
- Model, Tensor, and Pipeline parallelism
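
To give a flavor of the first approach, here is a minimal DDP sketch. It assumes a single node with one or more CUDA GPUs, a launch via `torchrun`, and a hypothetical file name `ddp_sketch.py`; the official tutorials cover the full details:

```python
# Launch with: torchrun --nproc_per_node=<num_gpus> ddp_sketch.py
import os

import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # torchrun sets RANK/WORLD_SIZE/etc.
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(4, 2).to(local_rank), device_ids=[local_rank])
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    # Per-rank toy batch; real code would use a DistributedSampler in its DataLoader.
    inputs = torch.randn(16, 4, device=local_rank)
    labels = torch.randint(0, 2, (16,), device=local_rank)

    optimizer.zero_grad()
    loss = F.cross_entropy(model(inputs), labels)
    loss.backward()   # DDP's hooks all-reduce gradients across ranks here
    optimizer.step()  # every rank applies the same averaged update

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```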
Now, let’s move on to the Hugging Face library, which further simplifies these training strategies.
---