Introduction to Generative Models


Introduction

This Lecture introduces the main settings encountered in generative modelling: the goal is to understand the relationship between vanilla unconditional generative modelling and industrial generative models such as DALL-E, Stable Diffusion, GPT, etc.

Unconditional Generative Modelling

What In unconditional generative modelling, we are given a set of unlabelled data

$$\text{Data: } \underbrace{\{x_1, x_2, x_3, \dots, x_n\}}_{n \text{ observations}} \subset \mathbb{R}^d.$$
[Figure: a dataset of six cat photos $x_1, \dots, x_6$ (source: Pexels.com).]

Assumption The core underlying assumption of generative modelling is that the data $x_1, \dots, x_n$ is drawn from some unknown underlying distribution $p_{\mathrm{data}}$: for all $i \in \{1, \dots, n\}$,

$$x_i \sim \underbrace{p_{\mathrm{data}}}_{\text{unknown}}.$$

Goal Using the empirical data distribution $x_1, \dots, x_n \sim p_{\mathrm{data}}$, the goal is to generate new samples $x^{\text{new}}$ that look like they were drawn from the same unknown distribution $p_{\mathrm{data}}$:

$$x^{\text{new}} \sim p_{\mathrm{data}}.$$
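
As a toy illustration of this "fit $p_{\mathrm{data}}$, then sample" pattern, one can fit an explicit density model to the observations and draw new points from it. The sketch below is only an illustration under simplifying assumptions: the data is synthetic 2-dimensional points, and the model is scikit-learn's GaussianMixture, a crude stand-in for the deep generative models studied later.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-in for the dataset {x_1, ..., x_n} in R^d (here d = 2):
# n = 500 points drawn from two clusters, playing the role of p_data.
rng = np.random.default_rng(0)
data = np.concatenate([
    rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(250, 2)),
    rng.normal(loc=[+2.0, 0.0], scale=0.5, size=(250, 2)),
])

# "Training": fit an explicit model of p_data to the samples.
model = GaussianMixture(n_components=2, random_state=0).fit(data)

# "Generation": draw new samples x_new that (approximately) follow p_data.
x_new, _ = model.sample(n_samples=5)
print(x_new)
```

Deep generative models (VAEs, GANs, diffusion models, etc.) replace the Gaussian mixture by a far more expressive learned model of $p_{\mathrm{data}}$, but the goal is exactly the same.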

Class-Conditional Generative Modelling

What In class-conditional generative modelling, we are given a set of labelled data

$$\text{Data: } \underbrace{\{(x_1, y_1), \dots, (x_n, y_n)\}}_{n \text{ labelled observations}} \subset \mathbb{R}^d \times \{\text{cat}, \text{dog}\}.$$
[Figure: a labelled dataset of six photos $x_1, \dots, x_6$ with labels $y_1 = y_2 = y_3 = \text{cat}$ and $y_4 = y_5 = y_6 = \text{dog}$ (source: Pexels.com).]

Assumption For class-conditional generative models, the assumption is that the data $x_1, \dots, x_n$ is drawn from some unknown underlying conditional probability distributions $p_{\mathrm{data}}(\cdot \mid y = y_i)$: for all $i \in \{1, \dots, n\}$,

$$x_i \sim \underbrace{p_{\mathrm{data}}(\cdot \mid y = y_i)}_{\text{unknown}}, \qquad y_i \in \{\text{cat}, \text{dog}\}.$$

Goal Using the labelled data $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to generate new samples $x^{\text{new}}$ that look like they were drawn from the same unknown distributions $p_{\mathrm{data}}(\cdot \mid y)$. More precisely, we want to be able to generate new images of cats $x^{\text{new cat}}$ and dogs $x^{\text{new dog}}$ that follow the conditional probability distributions

$$x^{\text{new cat}} \sim p_{\mathrm{data}}(\cdot \mid y = \text{cat}),$$
$$x^{\text{new dog}} \sim p_{\mathrm{data}}(\cdot \mid y = \text{dog}).$$

Remark i) To train class-conditional generative models, we could split the dataset into two parts, one with all the cat images and one with all the dog images, and train two separate unconditional generative models. However, this would not leverage similarities between the two classes: both cats and dogs have four legs, a tail, fur, etc. Class-conditional generative models can share information across classes.
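
A minimal sketch of conditional generation with shared parameters, under toy assumptions (synthetic 2-dimensional data, labels "cat" / "dog", and a class-conditional Gaussian model chosen purely for illustration): each class gets its own mean, but the covariance is estimated from the whole labelled dataset, so it is shared across classes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labelled dataset {(x_i, y_i)}: 2-d points with labels "cat" / "dog".
x_cat = rng.normal(loc=[-2.0, 0.0], scale=0.7, size=(200, 2))
x_dog = rng.normal(loc=[+2.0, 0.0], scale=0.7, size=(200, 2))
x = np.concatenate([x_cat, x_dog])
y = np.array(["cat"] * 200 + ["dog"] * 200)

# Class-conditional Gaussian model of p_data(. | y):
# one mean per class, a single covariance shared across classes.
means = {c: x[y == c].mean(axis=0) for c in ("cat", "dog")}
centered = np.concatenate([x[y == c] - means[c] for c in ("cat", "dog")])
shared_cov = np.cov(centered, rowvar=False)

# Conditional generation: sample x_new ~ p_data(. | y = "cat").
x_new_cat = rng.multivariate_normal(means["cat"], shared_cov, size=5)
print(x_new_cat)
```

The shared covariance is a very crude analogue of what deep class-conditional models do: most parameters are shared across classes, and only a small class embedding tells the model which class to generate.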

Remark ii) Generative modelling is a very different task from standard supervised learning. The usual classification task is the following: given an empirical labelled data distribution $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to estimate the probability that a given new image $x$ is a cat or a dog, i.e. we want to estimate $p_{\mathrm{data}}(y = \text{cat} \mid x)$. In contrast, in class-conditional generative modelling, we are given a class (e.g. cat), and we want to estimate the probability distribution of images of cats $p_{\mathrm{data}}(x \mid y = \text{cat})$, and sample new images from this distribution.
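
The two quantities are nevertheless related by Bayes' rule:

$$p_{\mathrm{data}}(y = \text{cat} \mid x) = \frac{p_{\mathrm{data}}(x \mid y = \text{cat})\, p_{\mathrm{data}}(y = \text{cat})}{\sum_{y' \in \{\text{cat}, \text{dog}\}} p_{\mathrm{data}}(x \mid y = y')\, p_{\mathrm{data}}(y = y')},$$

so a good class-conditional generative model (together with the class proportions) induces a classifier, whereas the converse does not hold: knowing how likely an image is to be a cat does not tell us how to produce new cat images.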

Text-Conditional Generative Modelling

What In text-conditional generative modelling, we are given a set of data (e.g. images) and their text descriptions

$$\text{Data: } \underbrace{\{(x_1, y_1), \dots, (x_n, y_n)\}}_{n \text{ images } x_i \text{ and their text descriptions } y_i}.$$
[Figure: a dataset of cat and dog photos $x_1, \dots, x_6$ with text descriptions $y_1 = $ 'A cat licking his hand', $y_2 = $ 'A cat staring into the camera', $y_3 = $ 'A cat yawning', $y_4 = $ 'A dog running', $y_5 = $ 'A dog sleeping', $y_6 = $ 'A dog staring into the camera' (source: Pexels.com).]

For instance, Stable Diffusion was trained on the LAION-5B dataset, a dataset of 5 billion images and their textual descriptions.

Assumption For text-conditional generative models, the assumption is that the data $x_1, \dots, x_n$ is drawn from some unknown underlying conditional probability distributions $p_{\mathrm{data}}(\cdot \mid y = y_i)$: for all $i \in \{1, \dots, n\}$,

$$x_i \sim \underbrace{p_{\mathrm{data}}(\cdot \mid y = y_i)}_{\text{unknown}}, \qquad y_i \text{ is a text description}.$$

The main difference with the class-conditional setting is that the conditioning variable $y_i$ is now a free-form text description rather than one of a fixed number of classes.

Goal Using the data and their text descriptions $(x_1, y_1), \dots, (x_n, y_n)$, the goal is to generate new samples $x^{\text{new}}$ given a text description. More precisely, given a text description $y^{\text{new}}$, we want to be able to generate new images $x^{\text{new}}$ that follow the conditional probability distribution

$$x^{\text{new}} \sim p_{\mathrm{data}}(\cdot \mid y = y^{\text{new}}).$$
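
To make the setting concrete, here is a minimal sketch of a text-conditional generator: it maps random noise plus a pre-computed text embedding to an image-like vector. All dimensions and the architecture are made up for illustration; producing the text embedding itself (tokenization, a pretrained text encoder) and training the network with a proper generative objective are exactly the engineering that systems such as Stable Diffusion add on top.

```python
import torch
import torch.nn as nn

class TextConditionalGenerator(nn.Module):
    """Toy conditional generator: x_new = G(noise, text_embedding)."""

    def __init__(self, noise_dim=64, text_dim=512, data_dim=3 * 32 * 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(noise_dim + text_dim, 256),
            nn.ReLU(),
            nn.Linear(256, data_dim),
        )

    def forward(self, noise, text_embedding):
        # Conditioning: the text embedding is concatenated to the noise,
        # so the same network generates different x for different prompts.
        return self.net(torch.cat([noise, text_embedding], dim=-1))

generator = TextConditionalGenerator()
noise = torch.randn(1, 64)                # random source of variability
text_embedding = torch.randn(1, 512)      # placeholder for an encoded prompt y_new
x_new = generator(noise, text_embedding)  # one "image" conditioned on the prompt
print(x_new.shape)  # torch.Size([1, 3072])
```

In a real text-to-image model, the placeholder text_embedding would come from a trained text encoder, and the network would be trained with a generative objective such as diffusion.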

Remark iii) Text-conditional generative modelling is very challenging in multiple respects:

  • one usually observes only one sample $x_i$ per textual description $y_i$, i.e., one has to leverage similarities between text descriptions $y_i$ to learn the conditional distributions $p_{\mathrm{data}}(\cdot \mid y = y_i)$.
  • one has to handle new text descriptions $y^{\text{new}}$ that were not seen during training, i.e., the model needs to be able to generalize to new text.
  • text descriptions are complex objects that are not easy to handle (discrete objects with variable sequence length). Handling text conditioning requires a lot of engineering and is out of the scope of this introduction Lecture (tokenization, embeddings, transformers, etc.).

Remark iv) Even though text-conditional generative modelling is very challenging, the tools, algorithms, and concepts are the same as those used for unconditional generative modelling.

Other Applications of Generative Modelling

Scientific Discovery
Inverse Problems
Robotics

1- and 2-Dimensional Examples

References