labml.ai Deep Learning Paper Implementations
This is a collection of simple PyTorch implementations of
neural networks and related algorithms.
These implementations are documented with explanations; the website renders them as side-by-side formatted notes. We believe these notes will help you understand the algorithms better. We are actively maintaining this repo and adding new implementations almost weekly; follow the project for updates.
Paper Implementations
Transformers
- Multi-headed attention (a minimal sketch follows this list)
- Transformer building blocks
- Transformer XL
- Relative multi-headed attention
- Rotary Positional Embeddings
- Attention with Linear Biases (ALiBi)
- RETRO
- Compressive Transformer
- GPT Architecture
- GLU Variants
- kNN-LM: Generalization through Memorization
- Feedback Transformer
- Switch Transformer
- Fast Weights Transformer
- FNet
- Attention Free Transformer
- Masked Language Model
- MLP-Mixer: An all-MLP Architecture for Vision
- Pay Attention to MLPs (gMLP)
- Vision Transformer (ViT)
- Primer EZ
- Hourglass
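To give a flavor of these implementations, here is a minimal multi-headed attention sketch in PyTorch. It is a simplification written for this README, not the repo's code, which adds masking, dropout, and separate query/key/value projections.

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Minimal multi-headed attention (illustrative sketch only)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_k = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused q/k/v projection
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape  # [batch, seq, d_model]
        q, k, v = self.qkv(x).chunk(3, dim=-1)

        # Split into heads: [batch, heads, seq, d_k]
        def split(t):
            return t.view(b, s, self.n_heads, self.d_k).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        # Scaled dot-product attention
        scores = q @ k.transpose(-2, -1) / self.d_k ** 0.5
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, s, -1)
        return self.out(out)
```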
Low-Rank Adaptation (LoRA)
Eleuther GPT-NeoX
- Generate on a 48GB GPU
- Finetune on two 48GB GPUs
- LLM.int8()
Diffusion models
- Denoising Diffusion Probabilistic Models (DDPM); the training objective is sketched after this list
- Denoising Diffusion Implicit Models (DDIM)
- Latent Diffusion Models
- Stable Diffusion
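To illustrate the core DDPM idea, here is a sketch of the noise-prediction training objective. `model` and `alpha_bar` are placeholder names for this overview; the repo's implementations cover the full schedules and sampling loops.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(model, x0, alpha_bar):
    """Simplified DDPM objective: noise x0 at a random timestep,
    then train `model(x_t, t)` to predict that noise (assumed interface)."""
    b = x0.shape[0]
    t = torch.randint(0, len(alpha_bar), (b,), device=x0.device)
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(b, *([1] * (x0.dim() - 1)))
    # Closed-form forward process: x_t = sqrt(a_t) x_0 + sqrt(1 - a_t) eps
    xt = a.sqrt() * x0 + (1 - a).sqrt() * eps
    return F.mse_loss(model(xt, t), eps)

# Example schedule: linear betas as in the DDPM paper
betas = torch.linspace(1e-4, 0.02, 1000)
alpha_bar = torch.cumprod(1 - betas, dim=0)
```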
Generative Adversarial Networks
- Original GAN (one training step is sketched after this list)
- GAN with deep convolutional network
- Cycle GAN
- Wasserstein GAN
- Wasserstein GAN with Gradient Penalty
- StyleGAN 2
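For orientation, here is one training step of the original GAN with the non-saturating generator loss; `G`, `D`, and the optimizers are placeholder objects for this sketch. The listed variants change the loss (Wasserstein, gradient penalty), the architecture (DCGAN, StyleGAN 2), or the task (CycleGAN).

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_g, opt_d, z_dim=100):
    """One alternating GAN update (sketch). Assumes D returns logits
    of shape [batch, 1] and G maps noise [batch, z_dim] to samples."""
    b = real.size(0)
    ones = torch.ones(b, 1, device=real.device)
    zeros = torch.zeros(b, 1, device=real.device)
    fake = G(torch.randn(b, z_dim, device=real.device))
    # Discriminator: classify real as 1, fake as 0
    d_loss = (F.binary_cross_entropy_with_logits(D(real), ones)
              + F.binary_cross_entropy_with_logits(D(fake.detach()), zeros))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()
    # Generator: non-saturating loss, push D(fake) toward 1
    g_loss = F.binary_cross_entropy_with_logits(D(fake), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```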
Recurrent Highway Networks
LSTM
HyperNetworks – HyperLSTM
ResNet
ConvMixer
Capsule Networks
U-Net
Sketch RNN
Graph Neural Networks
- Graph Attention Networks (GAT); a single-head layer is sketched after this list
- Graph Attention Networks v2 (GATv2)
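Both GAT variants share the same core step: attention coefficients computed over edges, then a weighted neighborhood sum. A single-head sketch with a dense adjacency matrix (GATv2 moves the nonlinearity inside the scoring function):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """Single-head graph attention layer (illustrative sketch).
    `adj` should include self-loops so every row has a neighbor."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        g = self.W(h)  # [nodes, out_dim]
        n = g.size(0)
        # Score every node pair from concatenated features
        pairs = torch.cat([g.unsqueeze(1).expand(n, n, -1),
                           g.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)  # [n, n]
        # Attend only over actual edges
        attn = e.masked_fill(adj == 0, float('-inf')).softmax(dim=-1)
        return attn @ g
```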
Counterfactual Regret Minimization (CFR)
Solving games with incomplete information, such as poker, with CFR; the regret-matching step is sketched below.
- Kuhn Poker
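CFR's inner loop relies on regret matching: at each information set, play actions in proportion to their positive cumulative regrets, falling back to uniform when none is positive. A minimal sketch:

```python
import numpy as np

def regret_matching(regrets: np.ndarray) -> np.ndarray:
    """Turn cumulative regrets into a strategy (sketch)."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(len(regrets), 1.0 / len(regrets))  # uniform fallback
```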
Reinforcement Learning
- Proximal Policy Optimization with Generalized Advantage Estimation (GAE; sketched after this list)
- Deep Q Networks with Dueling Network, Prioritized Replay, and Double Q Network
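Generalized Advantage Estimation, which the PPO implementation uses, is a short backward recursion over a trajectory; a sketch with assumed shapes:

```python
import torch

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (sketch).
    rewards, dones: float tensors of shape [T] (dones is 0/1);
    values: shape [T + 1], with a bootstrap value at the end."""
    advantages = torch.zeros_like(rewards)
    last = 0.0
    for t in reversed(range(len(rewards))):
        mask = 1.0 - dones[t]  # zero out the bootstrap after episode ends
        delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
        last = delta + gamma * lam * mask * last
        advantages[t] = last
    return advantages
```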
Optimizers
- Adam (the update rule is sketched after this list)
- AMSGrad
- Adam Optimizer with warmup
- Noam Optimizer
- Rectified Adam Optimizer
- AdaBelief Optimizer
- Sophia-G Optimizer
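These optimizers all build on the moment-tracking template of Adam; here is a single Adam update with bias correction, as a standalone helper rather than the repo's optimizer classes:

```python
import torch

@torch.no_grad()
def adam_step(param, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One Adam update for a single tensor (sketch).
    `state` is a dict this helper fills in on first use."""
    g = param.grad
    state.setdefault('step', 0)
    state.setdefault('m', torch.zeros_like(param))  # first moment
    state.setdefault('v', torch.zeros_like(param))  # second moment
    state['step'] += 1
    b1, b2 = betas
    state['m'].mul_(b1).add_(g, alpha=1 - b1)
    state['v'].mul_(b2).addcmul_(g, g, value=1 - b2)
    # Bias-corrected moment estimates
    m_hat = state['m'] / (1 - b1 ** state['step'])
    v_hat = state['v'] / (1 - b2 ** state['step'])
    param.add_(-lr * m_hat / (v_hat.sqrt() + eps))

# Usage: compute gradients, then apply one step
w = torch.randn(3, requires_grad=True)
(w ** 2).sum().backward()
state = {}
adam_step(w, state)
```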
Normalization Layers
- Batch Normalization
- Layer Normalization (sketched after this list)
- Instance Normalization
- Group Normalization
- Weight Standardization
- Batch-Channel Normalization
- DeepNorm
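These layers differ mainly in which axes the statistics are computed over; layer normalization, the simplest case, normalizes each sample over its feature dimension:

```python
import torch

def layer_norm(x, gamma, beta, eps=1e-5):
    """Layer normalization over the last dimension (sketch;
    torch.nn.LayerNorm is the production equivalent)."""
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return gamma * (x - mean) / torch.sqrt(var + eps) + beta

x = torch.randn(2, 8)
y = layer_norm(x, torch.ones(8), torch.zeros(8))  # zero mean, unit variance per row
```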
Distillation
Adaptive Computation
- PonderNet
Uncertainty
- Evidential Deep Learning to Quantify Classification Uncertainty
Activations
- Fuzzy Tiling Activations
Language Model Sampling Techniques (a combined sketch follows this list)
- Greedy Sampling
- Temperature Sampling
- Top-k Sampling
- Nucleus Sampling
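The four techniques compose: scale logits by temperature, optionally keep only the top-k tokens and/or the nucleus whose cumulative probability reaches top-p, then sample from what remains. Greedy sampling is the temperature → 0 limit, i.e. an argmax. A combined sketch over a 1-D logits tensor:

```python
import torch

def sample(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Sample a token id from `logits` of shape [vocab] (sketch).
    top_k=0 and top_p=1.0 disable the respective filters."""
    logits = logits / temperature
    if top_k > 0:
        # Mask everything below the k-th largest logit
        kth = torch.topk(logits, top_k).values[-1]
        logits = logits.masked_fill(logits < kth, float('-inf'))
    if top_p < 1.0:
        sorted_logits, idx = logits.sort(descending=True)
        probs = sorted_logits.softmax(dim=-1)
        cum = probs.cumsum(dim=-1)
        # Drop tokens once the cumulative probability before them exceeds top_p
        remove = cum - probs > top_p
        logits[idx[remove]] = float('-inf')
    return torch.multinomial(logits.softmax(dim=-1), 1).item()
```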
Scalable Training/Inference
- Zero3 memory optimizations
Installation
```bash
pip install labml-nn
```
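After installing, the implementations are importable as ordinary Python modules. For example (module path inferred from the repository layout; check the documentation for the exact API):

```python
# Assumed module path; verify against the labml-nn docs.
from labml_nn.transformers.mha import MultiHeadAttention
```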
