VAE and Diffusion Models: A Step-by-Step Guide

This repository contains educational Jupyter notebooks that explore the theory and
implementation of generative models, specifically Variational Autoencoders (VAEs)
and Diffusion Models. These notebooks are designed to provide a clear, step-by-step
understanding of how these powerful generative AI techniques work.
đź“– Blog Post: For a gist and overview of the concepts covered in this repository,
check out the accompanying blog post: From VAEs to Diffusion Models: A Step-by-Step
Journey
🚀 Installation
To run the notebooks in this repository, you’ll need Python 3.8+ and the following dependencies:
```bash
pip install 'torch>=2.0.0' 'torchvision>=0.15.0' 'matplotlib>=3.5.0' 'tqdm>=4.64.0'
```
The main dependencies are:
- PyTorch (for deep learning models)
- torchvision (for computer vision utilities)
- matplotlib (for visualization)
- tqdm (for progress bars)
📚 What You’ll Learn
- The mathematical foundations behind VAEs and diffusion models
- How to implement these models from scratch using PyTorch
- The connection between VAEs and diffusion models
- How to generate MNIST digits using these techniques
🗂️ Notebook Overview
01.vae.ipynb – Variational Autoencoders
This notebook covers the fundamental concepts of Variational Autoencoders (VAEs):
- Motivation behind generative models and latent variable models
- The Evidence Lower Bound (ELBO) objective
- Reparameterization trick for training VAEs (see the sketch after this list)
- Implementation of VAEs using both Multi-Layer Perceptron (MLP) and Convolutional
Neural Network (CNN) architectures
- Exploration of both Bernoulli and Gaussian likelihoods for the decoder
- Comparison of different likelihood models and architectures, and their effects on
image generation quality for MNIST digits
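To make the ELBO and the reparameterization trick concrete, here is a minimal PyTorch sketch. `TinyVAE`, its layer sizes, and the 16-dimensional latent space are illustrative assumptions rather than the notebook's exact architecture; the decoder here uses a Bernoulli (binary cross-entropy) likelihood:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyVAE(nn.Module):
    """Minimal MLP VAE for 28x28 MNIST digits (sizes are illustrative)."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 784))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, so gradients flow
        # through mu and sigma while the randomness lives in eps ~ N(0, I).
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps
        return self.decoder(z), mu, logvar  # decoder outputs Bernoulli logits

def negative_elbo(logits, x, mu, logvar):
    """Negative ELBO = reconstruction loss + KL(q(z|x) || N(0, I))."""
    recon = F.binary_cross_entropy_with_logits(
        logits, x.view(x.size(0), -1), reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return (recon + kl) / x.size(0)
```

Minimizing the negative ELBO trains the encoder and decoder jointly; swapping the binary cross-entropy term for a Gaussian log-likelihood gives the Gaussian-decoder variant compared in the notebook.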
The second notebook explores a “VAE-like” model that omits the encoder:
- One-step fixed corruption process as “inference”
- Connection between this simplified model and diffusion models
- Implementation of a noise prediction objective (sketched after this list)
- Demonstration of how this approach performs on MNIST data
- Comparison of a classical CNN architecture with a U-Net architecture
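As a rough illustration of the noise prediction objective under a one-step fixed corruption, consider the sketch below. The convex mixing weights in `corrupt` and the `model(x_noisy)` signature are assumptions for illustration, not the notebook's exact parameterization:

```python
import torch
import torch.nn.functional as F

def corrupt(x0, noise_level=0.5):
    """One-step fixed corruption: blend clean images with Gaussian noise.
    The mixing weights here are an illustrative choice."""
    eps = torch.randn_like(x0)
    x_noisy = (1 - noise_level) * x0 + noise_level * eps
    return x_noisy, eps

def noise_prediction_loss(model, x0):
    # The network never sees the clean image at train time: it receives the
    # corrupted input and regresses the exact noise that was injected.
    x_noisy, eps = corrupt(x0)
    return F.mse_loss(model(x_noisy), eps)
```

Because the corruption is fixed rather than learned, there is nothing for an encoder to infer; this is exactly the structure that diffusion models extend from one noise level to many.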
The third and most comprehensive notebook dives into diffusion models:
- Forward and reverse processes in diffusion models
- Beta schedules (linear)
- Variational inference and the ELBO objective for diffusion
- Simplified noise prediction objective
- Implementation of a diffusion model (sketched after this list) with:
  - U-Net architecture
  - Time embedding and conditioning
  - Linear beta schedule
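The forward process and the simplified training objective fit in a few lines; here is a minimal sketch. The horizon `T = 1000`, the beta endpoints, the embedding size, and the `model(x_t, t)` signature (a U-Net conditioned on the time embedding) are standard DDPM-style assumptions rather than the notebook's exact values:

```python
import math
import torch
import torch.nn.functional as F

T = 1000  # number of diffusion steps (assumed)

# Linear beta schedule: beta_t rises linearly from 1e-4 to 0.02 over T steps.
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # abar_t = prod of alpha_s for s <= t

def q_sample(x0, t, eps):
    """Forward process in closed form, for image batches shaped (B, C, H, W):
    x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    ab = alpha_bars.to(x0.device)[t].view(-1, 1, 1, 1)
    return ab.sqrt() * x0 + (1 - ab).sqrt() * eps

def timestep_embedding(t, dim=128):
    """Sinusoidal time embedding fed to the U-Net (dim is an assumption)."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, device=t.device) / half)
    angles = t[:, None].float() * freqs[None, :]
    return torch.cat([angles.sin(), angles.cos()], dim=-1)

def diffusion_loss(model, x0):
    # Simplified objective: sample a random step t, noise x0 to x_t, and
    # train the network to predict the injected noise with an MSE loss.
    t = torch.randint(0, T, (x0.size(0),), device=x0.device)
    eps = torch.randn_like(x0)
    x_t = q_sample(x0, t, eps)
    return F.mse_loss(model(x_t, t), eps)
```

This MSE loss is the simplified form of the diffusion ELBO: dropping the per-step weights that variational inference prescribes leaves a plain noise-regression objective.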
🔜 Next Steps
- Conditional generation (e.g., on class labels or text) with VAEs and diffusion models.
- VAEs and diffusion models working together, e.g., a VAE as the encoder and a diffusion model as the decoder.
- Discrete modalities (e.g., tables, graphs, text) with non-Gaussian, non-continuous distributions, e.g., the discrete categorical distribution.