Overview

Analysis of the Paper “Negative Data Augmentation” (ICLR 2021)

TLDR

Train the GAN discriminator not only to attract the generator distribution towards real samples, but also to repel it from explicitly bad ("negative") samples: this reduces unrealistic generations and improves sample efficiency.

Summary

The paper proposes Negative Data Augmentation (NDA): alongside the usual real ("positive") samples, the discriminator is also trained on explicitly unrealistic ("negative") samples, which act as a repulsive potential and improve the sample efficiency of GAN training.

1. Introduction

A common failure mode for generative models, especially GANs, is that they produce a significant amount of "false positives": samples that are not realistic.

Why?

Using an analogy with physics, standard GANs are trained by enforcing a sort of attractive potential between the generator distribution \(P_{\theta}\) and the true data distribution, mediated by the discriminator.

So we have the usual minimax objective

\[
\min_{\theta} \max_{\phi} \; \mathbb{E}_{x \sim P_{data}^{(n)}}\left[\log D_{\phi}(x)\right] + \mathbb{E}_{x \sim P_{\theta}}\left[\log\left(1 - D_{\phi}(x)\right)\right]
\]

The training can be imagined as using the attractive potential of \(D_{\phi}\) to steer \(P_{\theta}\) towards \(P_{data}^{(n)}\)
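To make the analogy concrete, here is a minimal PyTorch-style sketch of these two forces (function and variable names are mine, not the paper's): the discriminator separates real from fake, and the generator loss plays the role of the attractive potential pulling \(P_{\theta}\) towards the data.

```python
import torch
import torch.nn.functional as F

def standard_gan_losses(d_real, d_fake):
    """Standard non-saturating GAN losses: a minimal sketch.

    d_real: discriminator logits on x ~ P_data
    d_fake: discriminator logits on x ~ P_theta (generator samples)
    """
    # Discriminator: classify real samples as 1 and generated ones as 0.
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    # Generator: the "attractive potential", make generated samples
    # look real so that P_theta is pulled towards P_data.
    g_loss = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    return d_loss, g_loss
```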

This method is effective at increasing the true positives (generated samples that look realistic), although a common failure mode is that it also increases the false positives (generated samples that are not realistic).

2. Problem Formulation

The bad-sample distributions are denoted \(\{\bar P_{i}\}_{i=1,\dots,p}\) and are represented in Fig. 3 of the paper as red circles: in the left image it is possible to see the overlapping area between \(P_{\theta}\) and the \(\{\bar P_{i}\}_{i=1,\dots,p}\), which represents the generated bad samples.

The goal is to minimize this overlapping area

3. Solutions

3.1 Add more positive samples

Building a better sample-based representation of the underlying real data distribution, i.e. increasing the number \(n\) of samples in \(P_{data}^{(n)}\), leads to convergence \(\lim_{n \rightarrow \infty} P_{data}^{(n)} = P_{data}\), so that in this limit the training can drive \(P_{\theta}\) to the true distribution.

Unfortunately, \(n \rightarrow \infty\) is the so-called infinite data regime, and in practice we are very far from it, so this theoretical guarantee is of little practical use.

3.2 Add negative samples

Here is the idea suggested by the paper

Relying only on positive samples is too sample-inefficient, meaning too many samples are needed to get good results in practice. The idea is therefore to change the discriminator term so that it can also accept negative examples, which are associated with a repulsive potential.

So the combination of the good samples' attractive potential and the bad samples' repulsive potential should improve the sample efficiency of the training process.
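Where do the negative samples come from? In the paper they are produced by transformations of real data that destroy global structure while keeping local statistics realistic, such as jigsaw-style patch shuffling. A minimal sketch of one such transformation, with a hypothetical helper name and assuming PyTorch image tensors:

```python
import torch

def jigsaw_negative(images, grid=2):
    """Hypothetical negative-augmentation helper (not the paper's code).

    Splits each image in a batch (B, C, H, W) into a grid x grid jigsaw
    and shuffles the tiles: global structure is destroyed while local
    patch statistics stay realistic, yielding "negative" samples.
    """
    b, c, h, w = images.shape
    ph, pw = h // grid, w // grid
    tiles = images.unfold(2, ph, ph).unfold(3, pw, pw)  # (B, C, grid, grid, ph, pw)
    tiles = tiles.contiguous().view(b, c, grid * grid, ph, pw)
    perm = torch.randperm(grid * grid)                  # one shared shuffle
    tiles = tiles[:, :, perm].view(b, c, grid, grid, ph, pw)
    # Reassemble: (B, C, grid_rows, ph, grid_cols, pw) -> (B, C, H, W)
    return tiles.permute(0, 1, 2, 4, 3, 5).contiguous().view(b, c, h, w)
```

Feeding `jigsaw_negative(real_batch)` to the discriminator with label 0 would then supply the repulsive samples.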

Typically in GANs a sample \(x\) can come from 2 distributions: the real data distribution \(P_{data}^{(n)}\) or the generator distribution \(P_{\theta}\).

Here one more possibility is added: the negative distribution \(\bar P\).

So, recalling the convention for the Discriminator's binary classification problem, where 1 = Real and 0 = Fake, the energy terms are:

- \(x \sim P_{data}^{(n)}\): label 1, the attractive term \(\log D_{\phi}(x)\)
- \(x \sim P_{\theta}\): label 0, the term \(\log(1 - D_{\phi}(x))\)
- \(x \sim \bar P\): label 0, the repulsive term \(\log(1 - D_{\phi}(x))\)
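A minimal sketch of the resulting discriminator loss (the \(\lambda\) mixing weight and all names are my assumptions, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def nda_discriminator_loss(d_real, d_fake, d_neg, lam=0.5):
    """Discriminator loss with an extra repulsive term: a sketch only.

    d_real: logits on x ~ P_data (label 1)
    d_fake: logits on x ~ P_theta (label 0)
    d_neg:  logits on x ~ P_bar  (label 0, the negative samples)
    lam:    assumed mixing weight between generated and negative fakes
    """
    loss_real = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
    loss_fake = F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    loss_neg = F.binary_cross_entropy_with_logits(d_neg, torch.zeros_like(d_neg))
    # Real samples are attracted to 1; generated and negative samples
    # are both pushed towards 0, acting as the repulsive potential.
    return loss_real + lam * loss_fake + (1.0 - lam) * loss_neg
```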

Examples

[Figures: NDA GAN example outputs from the paper]

Personal Comments

Solution 3.2 implies the following:

This means that if, in general, we have a function \(f(\phi, t; P_{data}^{(n)}, G_{\theta}, \bar P)\) that at each training time \(t\) outputs a sample from one of the three source distributions, making the decision as a function of a set of parameters \(\phi\), then this function can very well have a strong impact on the final performance.
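As a toy illustration, a sketch of such a function where \(\phi\) is just a triple of mixing probabilities (all names and the sampling scheme below are hypothetical assumptions, not the paper's method):

```python
import random

def sample_source(phi, t, data_iter, generator, negative_iter):
    """Toy version of f(phi, t; P_data, G_theta, P_bar).

    phi is a triple of mixing probabilities (summing to 1); at training
    step t one of the three sources is picked and the sample is returned
    with the discriminator label it should receive (1 = Real, 0 = Fake).
    A real schedule could also make phi depend on t.
    """
    p_data, p_gen, _ = phi
    r = random.random()
    if r < p_data:
        return next(data_iter), 1       # real sample
    if r < p_data + p_gen:
        return generator.sample(), 0    # generated sample (hypothetical API)
    return next(negative_iter), 0       # negative sample
```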


Also available on LinkedIn here