Multi-level GAN Code Walkthrough

Next, let's walk through the code for this paper.

Network Structure and Training

Discriminator

For the discriminator, we use an architecture similar to the DCGAN discriminator, but utilize all fully-convolutional layers to retain the spatial information.
In short: a 5-layer convolutional network with 4 × 4 kernels, stride 2, and channel numbers {64, 128, 256, 512, 1}.
Every convolutional layer except the last is followed by a leaky ReLU with negative slope 0.2. After the last convolutional layer, an up-sampling layer rescales the output to the size of the input image. They do not use batch-normalization layers, because the discriminator is trained jointly with the segmentation network using a small batch size, under which batch-norm statistics would be unreliable.

  • batch normalization: normalizes activations with the mean and variance of the current mini-batch, so with very small batches the estimated statistics are too noisy to be useful.

Segmentation Network

They use DeepLab-v2 with a ResNet-101 backbone as the segmentation baseline; due to memory constraints, they do not use the multi-scale fusion scheme.
They remove the final classification layer and change the stride of the last two convolution blocks from 2 to 1, so the output feature maps are 1/8 the size of the input image. To enlarge the receptive field at this resolution, they use dilated convolutions with dilation rates 2 and 4 in conv4 and conv5, respectively. Atrous Spatial Pyramid Pooling (ASPP) is then applied as the final classifier (see the sketch after the ASPP note below). After ASPP, an up-sampling layer with softmax output rescales the prediction to the input image size.

  • ASPP: Atrous Spatial Pyramid Pooling samples the same feature map with several parallel dilated convolutions at different rates and fuses the results, capturing context at multiple scales.
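
For reference, an ASPP head of this kind can be sketched as several parallel 3 × 3 dilated convolutions whose outputs are summed into one prediction map. A hedged sketch (the class name is hypothetical, but it mirrors the Classifier_Module that appears in the code later):

import torch.nn as nn

class ASPPClassifier(nn.Module):  # hypothetical name for illustration
    def __init__(self, inplanes, dilations, num_classes):
        super(ASPPClassifier, self).__init__()
        self.branches = nn.ModuleList([
            # padding = dilation keeps the spatial size unchanged for 3x3 kernels
            nn.Conv2d(inplanes, num_classes, kernel_size=3, padding=d, dilation=d)
            for d in dilations
        ])

    def forward(self, x):
        out = self.branches[0](x)
        for branch in self.branches[1:]:
            out = out + branch(x)   # sum-fuse the multi-rate branches
        return out

# e.g. ASPPClassifier(2048, [6, 12, 18, 24], 19) mirrors layer6 in the code below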

Multi-level Adaptation Model

The above constitutes their single-level network. To build the multi-level structure, they take the conv4 feature map and attach an ASPP module to it as an auxiliary classifier. As in the single-level case, a discriminator with the same architecture is added there for adversarial learning, as illustrated in the paper's figure.

Train

The authors found it efficient to train the segmentation network and the discriminator jointly.
For the source domain, an image $I_s$ is forwarded to obtain $P_s$ and to optimize $L_{seg}$. For the target domain, the prediction $P_t$ is fed, together with $P_s$, into the discriminator to optimize $L_d$. In addition, the adversarial loss $L_{adv}$ is computed on $P_t$.

Loss Function

  • whole objective (a toy sketch follows this list):
    $L(I_s, I_t) = L_{seg}(I_s) + \lambda_{adv} L_{adv}(I_t)$

    • $L_{seg}(I_s)$
      the cross-entropy loss computed with the ground-truth annotations of the source domain
    • $L_{adv}$
      the adversarial loss, which pulls the predicted distribution on the target domain toward that of the source domain
    • $\lambda_{adv}$
      the weight that balances the two losses
  • discriminator:

    • segmentation softmax output:
      $P = G(I) \in \mathbb{R}^{H \times W \times C}$, where $C$ is the number of classes (here $C = 19$)

    • cross-entropy loss:
      We feed $P$ into the fully-convolutional discriminator $D$: $L_d(P) = -\sum_{h,w}\left((1 - z)\log(D(P)^{(h,w,0)}) + z\log(D(P)^{(h,w,1)})\right)$. This is a binary cross-entropy, with $z = 0$ for samples drawn from the target domain and $z = 1$ for samples from the source domain.

  • segmentation network:

    • segmentation loss:
      On the source domain we train as usual, with the standard cross-entropy loss: $L_{seg}(I_s) = -\sum_{h,w}\sum_{c \in C} Y_s^{(h,w,c)}\log(P_s^{(h,w,c)})$
    • adversarial loss:
      On the target domain the adversarial loss is $L_{adv}(I_t) = -\sum_{h,w}\log\left(D(G(I_t))^{(h,w,1)}\right)$. It is meant to fool the discriminator into classifying target predictions as source ones, pulling the two predicted distributions together.
  • multi-level:

    • multi-level loss
      The same losses are simply applied again in a lower-level feature space, which is not hard to understand:
      $L(I_s, I_t) = \sum_i \lambda^i_{seg} L^i_{seg}(I_s) + \sum_i \lambda^i_{adv} L^i_{adv}(I_t)$, where $i$ indexes the level at which each loss is applied.
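
To make the bookkeeping concrete, here is a toy single-level sketch with made-up shapes; the λ value is an illustrative assumption, and F.cross_entropy / F.binary_cross_entropy_with_logits stand in for the per-pixel sums above:

import torch
import torch.nn.functional as F

# toy stand-ins (shapes made up): P_s = source logits G(I_s), Y_s = source labels,
# D_out = discriminator map on the target prediction
P_s = torch.randn(1, 19, 8, 8, requires_grad=True)
Y_s = torch.randint(0, 19, (1, 8, 8))
D_out = torch.randn(1, 1, 2, 2, requires_grad=True)

L_seg = F.cross_entropy(P_s, Y_s)                  # -sum_c Y log P (averaged)
L_adv = F.binary_cross_entropy_with_logits(
    D_out, torch.ones_like(D_out))                 # label target as source to fool D
lam_adv = 0.001                                    # assumed weight, chosen for illustration
L = L_seg + lam_adv * L_adv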

Network Code

Discriminator

import torch.nn as nn
import torch.nn.functional as F


class FCDiscriminator(nn.Module):
    def __init__(self, num_classes, ndf=64):
        super(FCDiscriminator, self).__init__()

        # five 4x4 convs with stride 2; channels {64, 128, 256, 512, 1}
        self.conv1 = nn.Conv2d(num_classes, ndf, kernel_size=4, stride=2, padding=1)
        self.conv2 = nn.Conv2d(ndf, ndf*2, kernel_size=4, stride=2, padding=1)
        self.conv3 = nn.Conv2d(ndf*2, ndf*4, kernel_size=4, stride=2, padding=1)
        self.conv4 = nn.Conv2d(ndf*4, ndf*8, kernel_size=4, stride=2, padding=1)
        self.classifier = nn.Conv2d(ndf*8, 1, kernel_size=4, stride=2, padding=1)

        # leaky ReLU (negative slope 0.2) follows every conv except the last
        self.leaky_relu = nn.LeakyReLU(negative_slope=0.2, inplace=True)
        #self.up_sample = nn.Upsample(scale_factor=32, mode='bilinear')
        #self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.leaky_relu(x)
        x = self.conv2(x)
        x = self.leaky_relu(x)
        x = self.conv3(x)
        x = self.leaky_relu(x)
        x = self.conv4(x)
        x = self.leaky_relu(x)
        x = self.classifier(x)
        #x = self.up_sample(x)
        #x = self.sigmoid(x)

        return x

With the description above, the discriminator network is straightforward.
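
As a quick sanity check of the shapes (a minimal sketch; the 19-class setting follows the $C = 19$ noted above):

import torch

D = FCDiscriminator(num_classes=19)
P = torch.randn(1, 19, 512, 1024)  # softmax output P with shape (N, C, H, W)
out = D(P)
print(out.shape)  # torch.Size([1, 1, 16, 32]): five stride-2 convs shrink H and W by 2^5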

Segmentation Network

class ResNetMulti(nn.Module):
    def __init__(self, block, layers, num_classes):
        self.inplanes = 64
        super(ResNetMulti, self).__init__()
        # standard ResNet stem
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64, affine=affine_par)  # affine_par is defined elsewhere in the original file
        # freeze the backbone batch-norm parameters
        for i in self.bn1.parameters():
            i.requires_grad = False
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, ceil_mode=True)  # change
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        # stride changed from 2 to 1 with dilation instead, keeping the 1/8 resolution
        self.layer3 = self._make_layer(block, 256, layers[2], stride=1, dilation=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=1, dilation=4)
        # the two ASPP classifiers: layer5 is the auxiliary one, layer6 the main one
        self.layer5 = self._make_pred_layer(Classifier_Module, 1024, [6, 12, 18, 24], [6, 12, 18, 24], num_classes)
        self.layer6 = self._make_pred_layer(Classifier_Module, 2048, [6, 12, 18, 24], [6, 12, 18, 24], num_classes)

The first part is essentially the standard ResNet backbone; layer5 and layer6 should be the ASPP classifiers described earlier, and each of them later feeds its own adaptation module.
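
To see where pred1 and pred2 in the training code below come from, the forward pass presumably routes the conv4 (layer3) features through layer5 and the conv5 (layer4) features through layer6. A hedged sketch consistent with the structure above:

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x1 = self.layer5(x)   # auxiliary prediction from conv4 features
        x = self.layer4(x)
        x2 = self.layer6(x)   # main prediction from conv5 features
        return x1, x2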

Train Code

Train G

As with training an ordinary GAN, the G being trained here is actually the segmentation network.

Train with source

_, batch = trainloader_iter.next()   # next source-domain batch (0.x-era PyTorch idioms kept as in the repo)
images, labels, _, _ = batch
images = Variable(images).cuda(args.gpu)

pred1, pred2 = model(images)         # pred1: auxiliary (conv4) output, pred2: main (conv5) output
pred1 = interp(pred1)                # bilinearly up-sample to the input resolution
pred2 = interp(pred2)

loss_seg1 = loss_calc(pred1, labels, args.gpu)   # cross-entropy against source ground truth
loss_seg2 = loss_calc(pred2, labels, args.gpu)
loss = loss_seg2 + args.lambda_seg * loss_seg1   # weighted sum of the two levels

# proper normalization
loss = loss / args.iter_size
loss.backward()
loss_seg_value1 += loss_seg1.data.cpu().numpy()[0] / args.iter_size
loss_seg_value2 += loss_seg2.data.cpu().numpy()[0] / args.iter_size

The loss used when training with the source is exactly $\sum_i \lambda^i_{seg} L^i_{seg}(I_s)$, with $L_{seg}(I_s) = -\sum_{h,w}\sum_{c \in C} Y_s^{(h,w,c)}\log(P_s^{(h,w,c)})$.
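
For context, interp and loss_calc are helpers from the training script. Hedged sketches, assuming interp is a bilinear up-sampler to the source input size and loss_calc a 2D cross-entropy that ignores void pixels (the 255 ignore index and the (w, h) tuple are assumptions):

import torch.nn as nn
import torch.nn.functional as F

input_size = (1280, 720)  # hypothetical (w, h) of the source images
interp = nn.Upsample(size=(input_size[1], input_size[0]), mode='bilinear')

def loss_calc(pred, label, gpu):
    # 2D cross-entropy between (N, C, H, W) logits and (N, H, W) integer labels
    label = label.long().cuda(gpu)
    return F.cross_entropy(pred, label, ignore_index=255)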

Train with target

_, batch = targetloader_iter.next()   # next target-domain batch (labels unused)
images, _, _ = batch
images = Variable(images).cuda(args.gpu)

pred_target1, pred_target2 = model(images)
pred_target1 = interp_target(pred_target1)   # up-sample to the target input resolution
pred_target2 = interp_target(pred_target2)

# feed the softmax outputs P_t to the two level-specific discriminators
D_out1 = model_D1(F.softmax(pred_target1))
D_out2 = model_D2(F.softmax(pred_target2))

# adversarial loss: label target predictions with source_label to fool D
loss_adv_target1 = bce_loss(D_out1,
                            Variable(torch.FloatTensor(D_out1.data.size()).fill_(source_label)).cuda(
                                args.gpu))

loss_adv_target2 = bce_loss(D_out2,
                            Variable(torch.FloatTensor(D_out2.data.size()).fill_(source_label)).cuda(
                                args.gpu))

loss = args.lambda_adv_target1 * loss_adv_target1 + args.lambda_adv_target2 * loss_adv_target2
loss = loss / args.iter_size
loss.backward()
loss_adv_target_value1 += loss_adv_target1.data.cpu().numpy()[0] / args.iter_size
loss_adv_target_value2 += loss_adv_target2.data.cpu().numpy()[0] / args.iter_size

Training with the target corresponds to $\sum_i \lambda^i_{adv} L^i_{adv}(I_t)$, with $L_{adv}(I_t) = -\sum_{h,w}\log\left(D(G(I_t))^{(h,w,1)}\right)$.
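
Since the Sigmoid at the end of FCDiscriminator is commented out, bce_loss here is presumably a logits-based binary cross-entropy, e.g.:

bce_loss = torch.nn.BCEWithLogitsLoss()  # folds the sigmoid into the loss for numerical stability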

Train D

Train with source

# labels for adversarial training
source_label = 0

# detach so that D's gradients do not flow back into the segmentation network
pred1 = pred1.detach()
pred2 = pred2.detach()

D_out1 = model_D1(F.softmax(pred1))
D_out2 = model_D2(F.softmax(pred2))

# D should classify source predictions as source (label 0 in the code's convention)
loss_D1 = bce_loss(D_out1,
                   Variable(torch.FloatTensor(D_out1.data.size()).fill_(source_label)).cuda(args.gpu))

loss_D2 = bce_loss(D_out2,
                   Variable(torch.FloatTensor(D_out2.data.size()).fill_(source_label)).cuda(args.gpu))

# halve each loss since D sees both a source and a target batch per iteration
loss_D1 = loss_D1 / args.iter_size / 2
loss_D2 = loss_D2 / args.iter_size / 2

loss_D1.backward()
loss_D2.backward()

loss_D_value1 += loss_D1.data.cpu().numpy()
loss_D_value2 += loss_D2.data.cpu().numpy()

This is $L_d(P) = -\sum_{h,w}\left((1 - z)\log(D(P)^{(h,w,0)}) + z\log(D(P)^{(h,w,1)})\right)$ with the source term active. In the paper's notation source samples have $z = 1$; note that the code flips the encoding and uses source_label = 0, which is harmless as long as the convention is consistent.

Train with target

# labels for adversarial training
target_label = 1

# detach the target predictions as well before updating D
pred_target1 = pred_target1.detach()
pred_target2 = pred_target2.detach()

D_out1 = model_D1(F.softmax(pred_target1))
D_out2 = model_D2(F.softmax(pred_target2))

# D should classify target predictions as target (label 1 in the code's convention)
loss_D1 = bce_loss(D_out1,
                   Variable(torch.FloatTensor(D_out1.data.size()).fill_(target_label)).cuda(args.gpu))

loss_D2 = bce_loss(D_out2,
                   Variable(torch.FloatTensor(D_out2.data.size()).fill_(target_label)).cuda(args.gpu))

loss_D1 = loss_D1 / args.iter_size / 2
loss_D2 = loss_D2 / args.iter_size / 2

loss_D1.backward()
loss_D2.backward()

loss_D_value1 += loss_D1.data.cpu().numpy()
loss_D_value2 += loss_D2.data.cpu().numpy()

Again $L_d(P) = -\sum_{h,w}\left((1 - z)\log(D(P)^{(h,w,0)}) + z\log(D(P)^{(h,w,1)})\right)$, now with the target term active: in the paper's notation target samples have $z = 0$, while the code encodes target as target_label = 1 (the same flipped but consistent convention).
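
Putting it together, one iteration presumably alternates the two updates in the usual GAN order. A hedged sketch of the surrounding loop (names follow the snippets above):

# one training iteration (sketch)
optimizer.zero_grad()      # optimizer for the segmentation network
optimizer_D1.zero_grad()   # optimizers for the two discriminators
optimizer_D2.zero_grad()

# 1) train G: freeze D so only the segmentation network receives gradients
for param in model_D1.parameters():
    param.requires_grad = False
for param in model_D2.parameters():
    param.requires_grad = False
# ... "train with source" (L_seg) and "train with target" (L_adv) blocks ...

# 2) train D: unfreeze and feed the detached source/target predictions
for param in model_D1.parameters():
    param.requires_grad = True
for param in model_D2.parameters():
    param.requires_grad = True
# ... "train D with source" and "train D with target" blocks ...

optimizer.step()
optimizer_D1.step()
optimizer_D2.step()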