Introduction to Generative Models


Introduction

This Lecture introduces the main settings encountered in generative modelling: the goal is to understand the relationship between vanilla unconditional generative modelling and industrial generative models such as DALL-E, Stable Diffusion, GPT, etc.

Unconditional Generative Modelling

What In unconditional generative modelling, we are given a set of unlabelled data

$$\text{Data: } \underbrace{\{x_1, x_2, x_3, \dots, x_n\}}_{n \text{ observations}} \subset \mathbb{R}^d.$$
[Figure: a dataset of six cat photos $x_1, \dots, x_6$ (source: Pexels.com).]

Assumption The core underlying assumption of generative modelling is that the data $x_1, \dots, x_n$ is drawn from some unknown underlying distribution $p_{\mathrm{data}}$: for all $i \in \{1, \dots, n\}$,

$$x_i \sim \underbrace{p_{\mathrm{data}}}_{\text{unknown}}.$$

Goal Using the empirical data distribution $x_1, \dots, x_n \sim p_{\mathrm{data}}$, the goal is to generate new samples $x^{\text{new}}$ that look like they were drawn from the same unknown distribution $p_{\mathrm{data}}$:

$$x^{\text{new}} \sim p_{\mathrm{data}}.$$
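
As a toy illustration of this "fit $p_{\mathrm{data}}$, then sample" pattern, one can fit an explicit density model to the observations and draw new points from it. The sketch below is only an illustration under simplifying assumptions: the data is synthetic 2-dimensional points, and the model is scikit-learn's GaussianMixture, a crude stand-in for the deep generative models studied later.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-in for the dataset {x_1, ..., x_n} in R^d (here d = 2):
# n = 500 points drawn from two clusters, playing the role of p_data.
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(250, 2)),
    rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(250, 2)),
])

# "Training": fit an explicit model of p_data to the samples.
model = GaussianMixture(n_components=2, random_state=0).fit(data)

# "Generation": draw new samples x_new that (approximately) follow p_data.
x_new, _ = model.sample(n_samples=5)
print(x_new)
```

Deep generative models (VAEs, GANs, diffusion models, etc.) replace the Gaussian mixture by a far more expressive learned model of $p_{\mathrm{data}}$, but the goal is exactly the same.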

Class-Conditional Generative Modelling

What In class-conditional generative modelling, we are given a set of labelled data

$$\text{Data: } \underbrace{\{(x_1, y_1), \dots, (x_n, y_n)\}}_{n \text{ labelled observations}} \subset \mathbb{R}^d \times \{\text{cat}, \text{dog}\}.$$
[Figure: a labelled dataset of six photos $x_1, \dots, x_6$ with labels $y_1 = y_2 = y_3 = \text{cat}$ and $y_4 = y_5 = y_6 = \text{dog}$ (source: Pexels.com).]

Assumption For class-conditional generative models, the assumption is that the data $x_1, \dots, x_n$ is drawn from some unknown underlying conditional probability distributions $p_{\mathrm{data}}(\cdot \mid y = y_i)$: for all $i \in \{1, \dots, n\}$,

$$x_i \sim \underbrace{p_{\mathrm{data}}(\cdot \mid y = y_i)}_{\text{unknown}}, \qquad y_i \in \{\text{cat}, \text{dog}\}.$$

Goal Using the labelled data $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to generate new samples $x^{\text{new}}$ that look like they were drawn from the same unknown distributions $p_{\mathrm{data}}(\cdot \mid y)$. More precisely, we want to be able to generate new images of cats $x^{\text{new cat}}$ and dogs $x^{\text{new dog}}$ that follow the conditional probability distributions

$$x^{\text{new cat}} \sim p_{\mathrm{data}}(\cdot \mid y = \text{cat}),$$
$$x^{\text{new dog}} \sim p_{\mathrm{data}}(\cdot \mid y = \text{dog}).$$

Remark i) To train class-conditional generative models, we could split the dataset into two parts, one with all the cat images and one with all the dog images, and train two separate unconditional generative models. However, this would not leverage similarities between the two classes: both cats and dogs have four legs, a tail, fur, etc. Class-conditional generative models can share information across classes.
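
A minimal sketch of conditional generation with shared parameters, under toy assumptions (synthetic 2-dimensional data, labels "cat" / "dog", and a class-conditional Gaussian model chosen purely for illustration): each class gets its own mean, but the covariance is estimated from the whole labelled dataset, so it is shared across classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labelled dataset {(x_i, y_i)}: 2-d points with labels "cat" / "dog".
x_cat = rng.normal(loc=[-2.0, 0.0], scale=0.7, size=(200, 2))
x_dog = rng.normal(loc=[+2.0, 0.0], scale=0.7, size=(200, 2))
x = np.concatenate([x_cat, x_dog])
y = np.array(["cat"] * 200 + ["dog"] * 200)

# Class-conditional Gaussian model of p_data(. | y):
# one mean per class, a single covariance shared across classes.
means = {c: x[y == c].mean(axis=0) for c in ("cat", "dog")}
centered = np.concatenate([x[y == c] - means[c] for c in ("cat", "dog")])
shared_cov = np.cov(centered, rowvar=False)

# Conditional generation: sample x_new ~ p_data(. | y = "cat").
x_new_cat = rng.multivariate_normal(means["cat"], shared_cov, size=5)
print(x_new_cat)
```

The shared covariance is a very crude analogue of what deep class-conditional models do: most parameters are shared across classes, and only a small class embedding tells the model which class to generate.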

Remark ii) Generative modelling is a very different task from standard supervised learning. The usual classification task is the following: given an empirical labelled data distribution $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to estimate the probability that a given new image $x$ is a cat or a dog, i.e. we want to estimate $p_{\mathrm{data}}(y = \text{cat} \mid x)$. In contrast, in class-conditional generative modelling, we are given a class (e.g. cat), and we want to estimate the probability distribution of images of cats $p_{\mathrm{data}}(x \mid y = \text{cat})$, and sample new images from this distribution.
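
The two quantities are nevertheless related by Bayes' rule:

$$p_{\mathrm{data}}(y = \text{cat} \mid x) = \frac{p_{\mathrm{data}}(x \mid y = \text{cat})\, p_{\mathrm{data}}(y = \text{cat})}{\sum_{y' \in \{\text{cat}, \text{dog}\}} p_{\mathrm{data}}(x \mid y = y')\, p_{\mathrm{data}}(y = y')},$$

so a good class-conditional generative model (together with the class proportions) induces a classifier, whereas the converse does not hold: knowing how likely an image is to be a cat does not tell us how to produce new cat images.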

Text-Conditional Generative Modelling

What In text-conditional generative modelling, we are given a set of data (e.g. images) and their text descriptions

$$\text{Data: } \underbrace{\{(x_1, y_1), \dots, (x_n, y_n)\}}_{n \text{ images } x_i \text{ and their text descriptions } y_i}.$$
[Figure: a dataset of cat and dog photos $x_1, \dots, x_6$ with text descriptions $y_1 = $ 'A cat licking his hand', $y_2 = $ 'A cat staring into the camera', $y_3 = $ 'A cat yawning', $y_4 = $ 'A dog running', $y_5 = $ 'A dog sleeping', $y_6 = $ 'A dog staring into the camera' (source: Pexels.com).]

For instance, Stable Diffusion was trained on the LAION-5B dataset, a dataset of 5 billion images and their textual descriptions.

Assumption For text-conditional generative models, the assumption is that the data $x_1, \dots, x_n$ is drawn from some unknown underlying conditional probability distributions $p_{\mathrm{data}}(\cdot \mid y = y_i)$: for all $i \in \{1, \dots, n\}$,

$$x_i \sim \underbrace{p_{\mathrm{data}}(\cdot \mid y = y_i)}_{\text{unknown}}, \qquad y_i \text{ is a text description}.$$

The main difference with the class-conditional setting is that the conditioning variable $y_i$ is now a free-form text description rather than one of a fixed number of classes.

Goal Using the data and their text descriptions $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to generate new samples $x^{\text{new}}$ given a text description. More precisely, given a text description $y^{\text{new}}$, we want to be able to generate new images $x^{\text{new}}$ that follow the conditional probability distribution

$$x^{\text{new}} \sim p_{\mathrm{data}}(\cdot \mid y = y^{\text{new}}).$$
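
To make the setting concrete, here is a minimal sketch of a text-conditional generator: it maps random noise plus a pre-computed text embedding to an image-like vector. All dimensions and the architecture are made up for illustration; producing the text embedding itself (tokenization, a pretrained text encoder) and training the network with a proper generative objective are exactly the engineering that systems such as Stable Diffusion add on top.

```python
import torch
import torch.nn as nn

class TextConditionalGenerator(nn.Module):
    """Toy conditional generator: x_new = G(noise, text_embedding)."""

    def __init__(self, noise_dim=64, text_dim=512, data_dim=3 * 32 * 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
        )

    def forward(self, noise, text_embedding):
        # Conditioning: the text embedding is concatenated to the noise,
        # so the same network generates different x for different prompts.
        return self.net(torch.cat([noise, text_embedding], dim=-1))

generator = TextConditionalGenerator()
noise = torch.randn(1, 64)                # random source of variability
text_embedding = torch.randn(1, 512)      # placeholder for an encoded prompt y_new
x_new = generator(noise, text_embedding)  # one "image" conditioned on the prompt
print(x_new.shape)  # torch.Size([1, 3072])
```

In a real text-to-image model, the placeholder text_embedding would come from a trained text encoder, and the network would be trained with a generative objective such as diffusion.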

Remark iii) Text-conditional generative modelling is very challenging in multiple respects:

  • one usually observes only one sample $x_i$ per textual description $y_i$, i.e., one has to leverage similarities between text descriptions $y_i$ to learn the conditional distributions $p_{\mathrm{data}}(\cdot \mid y = y_i)$.
  • one has to handle new text descriptions $y^{\text{new}}$ that were not seen during training, i.e., the model needs to be able to generalize to new text.
  • text descriptions are complex objects that are not easy to handle (discrete objects with variable sequence length). Handling text conditioning requires a lot of engineering and is out of the scope of this introduction Lecture (tokenization, embeddings, transformers, etc.).

Remark iv) Even though text-conditional generative modelling is very challenging, the tools, algorithms, and concepts are the same as those used for unconditional generative modelling.

Other Applications of Generative Modelling

Scientific Discovery
Inverse Problems
Robotics

1- and 2-Dimensional Examples

References