Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

The previous few posts mostly drew on ideas from style transfer, and this paper is about exactly that:

abstract

Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F : Y → X and introduce a cycle consistency loss to enforce F(G(X)) ≈ X (and vice versa). Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc. Quantitative comparisons against several prior methods demonstrate the superiority of our approach.

unpaired

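The problem they tackle is unpaired translation; in other words, the training data comes without many labels (no aligned image pairs), as shown in the figure below: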

loss

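To me, the core idea of this paper is the cycle consistency loss. The gist: if you translate a sentence from English to French and then translate it back from French to English, you should end up with the original sentence, and vice versa. The cycle consistency loss is built on this idea: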
$G: X \rightarrow Y, F: Y \rightarrow X, F(G(x)) \approx x, G(F(y)) \approx y$

First, let's revisit GANs. Their key ingredient is an adversarial loss that makes generated images hard to distinguish from real ones.
The structure is as follows:

With that in mind, the loss in this paper becomes very clear: it is just two GANs, combined with the cycle consistency loss.
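Written out in the paper's notation, the full objective adds the two adversarial terms and a weighted cycle term, where $\lambda$ controls the relative importance of the cycle consistency loss (the individual terms are defined in the sections that follow):

$L(G, F, D_{X}, D_{Y}) = L_{GAN}(G, D_{Y}, X, Y) + L_{GAN}(F, D_{X}, Y, X) + \lambda L_{cyc}(G, F)$
$G^{*}, F^{*} = \arg\min_{G, F} \max_{D_{X}, D_{Y}} L(G, F, D_{X}, D_{Y})$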

adversarial loss

$L_{GAN}(G, D_{Y}, X, Y) = E_{y \sim p_{data}(y)}[\log D_{Y}(y)] + E_{x \sim p_{data}(x)}[\log(1 - D_{Y}(G(x)))]$
$L_{GAN}(F, D_{X}, Y, X) = E_{x \sim p_{data}(x)}[\log D_{X}(x)] + E_{y \sim p_{data}(y)}[\log(1 - D_{X}(F(y)))]$
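To make the formula concrete, here is a minimal NumPy sketch that estimates one adversarial term from a batch of discriminator outputs. The function name `gan_loss`, the `eps` stabilizer, and the toy probabilities are assumptions for illustration only; they do not come from the paper's code.

```python
import numpy as np

def gan_loss(d_real, d_fake, eps=1e-12):
    """Batch estimate of E[log D(real)] + E[log(1 - D(fake))].

    d_real: discriminator probabilities on real images, e.g. D_Y(y)
    d_fake: discriminator probabilities on generated images, e.g. D_Y(G(x))
    eps:    small constant (an assumption) to avoid log(0)
    """
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

# A discriminator that cannot tell real from fake outputs 0.5 everywhere.
confused = gan_loss([0.5, 0.5], [0.5, 0.5])
# A discriminator that separates them perfectly drives the value toward 0.
perfect = gan_loss([1.0, 1.0], [0.0, 0.0])
print(confused, perfect)
```

The discriminator tries to maximize this value while the generator tries to minimize it, so the same function serves both $L_{GAN}(G, D_{Y}, X, Y)$ and the mirrored term for $F$ and $D_{X}$.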

cycle consistency loss

$L_{cyc}(G, F) = E_{x \sim p_{data}(x)}[\| F(G(x)) - x \|_{1}] + E_{y \sim p_{data}(y)}[\| G(F(y)) - y \|_{1}]$
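The cycle term can be sketched in a few lines of NumPy. Here `G` and `F` are toy stand-in functions (a scaling and its inverse), not real generator networks, and the name `cycle_consistency_loss` is an assumption for illustration. The L1 distance is summed over each sample and then averaged over the batch to mirror the expectation in the formula.

```python
import numpy as np

def cycle_consistency_loss(G, F, x_batch, y_batch):
    """Batch estimate of E_x[|F(G(x)) - x|_1] + E_y[|G(F(y)) - y|_1]."""
    def l1_per_sample(a, b):
        # Sum absolute differences over all non-batch dimensions.
        return np.abs(a - b).reshape(len(a), -1).sum(axis=1)

    forward = l1_per_sample(F(G(x_batch)), x_batch).mean()   # x -> G(x) -> F(G(x))
    backward = l1_per_sample(G(F(y_batch)), y_batch).mean()  # y -> F(y) -> G(F(y))
    return forward + backward

# Toy mappings (assumptions): G doubles its input, F halves it.
G = lambda x: 2.0 * x
F = lambda y: y / 2.0
F_bad = lambda y: y / 2.0 + 0.1  # an imperfect inverse of G

x = np.ones((4, 3))
y = np.full((4, 3), 2.0)

print(cycle_consistency_loss(G, F, x, y))      # exact inverses: no cycle error
print(cycle_consistency_loss(G, F_bad, x, y))  # imperfect inverse: positive error
```

When `F` exactly inverts `G`, both round trips return the input and the loss vanishes; any mismatch shows up as a positive penalty in both directions.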

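Overall, the idea is not hard to understand. Once you have it down, reading other papers gets much easier, since many of them are modifications of CycleGAN.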