PGD (Projected Gradient Descent) Source Code Walkthrough

Paper: https://arxiv.org/abs/1706.06083
Source code: https://github.com/Harry24k/adversarial-attacks-pytorch/tree/master

PGD Linf Source Code

```python
import torch
import torch.nn as nn

from ..attack import Attack


class PGD(Attack):
    r"""
    PGD in the paper 'Towards Deep Learning Models Resistant to Adversarial Attacks'
    [https://arxiv.org/abs/1706.06083]

    Distance Measure : Linf

    Arguments:
        model (nn.Module): model to attack.
        eps (float): maximum perturbation. (Default: 8/255)
        alpha (float): step size. (Default: 2/255)
        steps (int): number of steps. (Default: 10)
        random_start (bool): using random initialization of delta. (Default: True)

    Shape:
        - images: :math:`(N, C, H, W)` where `N = number of batches`, `C = number of channels`, `H = height` and `W = width`. It must have a range [0, 1].
        - labels: :math:`(N)` where each value :math:`y_i` is :math:`0 \leq y_i \leq` `number of labels`.
        - output: :math:`(N, C, H, W)`.

    Examples::
        >>> attack = torchattacks.PGD(model, eps=8/255, alpha=1/255, steps=10, random_start=True)
        >>> adv_images = attack(images, labels)

    """

    def __init__(self, model, eps=8/255,
                 alpha=2/255, steps=10, random_start=True):
        super().__init__("PGD", model)
        self.eps = eps
        self.alpha = alpha
        self.steps = steps
        self.random_start = random_start
        self.supported_mode = ['default', 'targeted']

    def forward(self, images, labels):
        r"""
        Overridden.
        """
        self._check_inputs(images)

        images = images.clone().detach().to(self.device)
        labels = labels.clone().detach().to(self.device)

        if self.targeted:
            target_labels = self.get_target_label(images, labels)

        loss = nn.CrossEntropyLoss()
        adv_images = images.clone().detach()

        if self.random_start:
            # Starting at a uniformly random point
            adv_images = adv_images + torch.empty_like(adv_images).uniform_(-self.eps, self.eps)
            adv_images = torch.clamp(adv_images, min=0, max=1).detach()

        for _ in range(self.steps):
            adv_images.requires_grad = True
            outputs = self.get_logits(adv_images)

            # Calculate loss
            if self.targeted:
                cost = -loss(outputs, target_labels)
            else:
                cost = loss(outputs, labels)

            # Update adversarial images
            grad = torch.autograd.grad(cost, adv_images,
                                       retain_graph=False, create_graph=False)[0]

            adv_images = adv_images.detach() + self.alpha*grad.sign()
            delta = torch.clamp(adv_images - images, min=-self.eps, max=self.eps)
            adv_images = torch.clamp(images + delta, min=0, max=1).detach()

        return adv_images
```
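For context, here is a minimal, self-contained usage sketch. The toy model and data are stand-ins of my own, not from the repository; it assumes the torchattacks package is installed:

```python
import torch
import torch.nn as nn
import torchattacks

# Toy stand-in classifier on 3x32x32 inputs; the weights are random
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()

images = torch.rand(4, 3, 32, 32)    # inputs must lie in [0, 1]
labels = torch.randint(0, 10, (4,))

attack = torchattacks.PGD(model, eps=8/255, alpha=2/255, steps=10, random_start=True)
adv_images = attack(images, labels)

# The Linf constraint holds by construction
print((adv_images - images).abs().max().item() <= 8/255 + 1e-6)
```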

Walkthrough

The PGD algorithm (projected gradient descent) is a small improvement on the BIM algorithm, and the two are very similar. The BIM source code walkthrough is in the previous post; it is recommended to read that first to understand how BIM works.

Specifically, before the BIM iteration begins, PGD first adds a perturbation to the image (uniformly distributed within the $\epsilon$-neighborhood). In other words, the iteration starts from a random point rather than from the original image as in BIM. The paper does this in order to study the relationship between the different local maxima of the loss that are reached when the perturbation is iterated from random starting points.

The PGD update rule is as follows:

$$X^{adv}_0 = X + \eta,\qquad X^{adv}_{N+1} = \mathrm{Clip}_{X,\epsilon}\left\{X^{adv}_N + \alpha\,\mathrm{sign}\left(\nabla_x J(X^{adv}_N, y_{true})\right)\right\}$$

where $\eta$ is a random perturbation, uniformly distributed within the $\epsilon$-neighborhood.

- eps: $\epsilon$, the maximum perturbation.
- alpha: $\alpha$, the amount by which the perturbation grows (or shrinks) at each iteration.
- steps: the number of iterations.
- random_start: whether the iteration starts from a random point, i.e. whether to add the random perturbation $\eta$. If False, the algorithm is identical to BIM.
- `images = images.clone().detach().to(self.device)`: clone() copies the image into a new block of memory (by default PyTorch lets the same tensor share one memory region); detach() splits the cloned tensor off from the current computation graph so that it becomes a leaf node, whose gradient can later be computed; to() loads it onto the device. A small demo of this idiom follows this list.
- `target_labels = self.get_target_label(images, labels)`: in the targeted case, obtain the target labels. They can be chosen in several ways, e.g. the label furthest from the true label, or a random label other than the true one.
- `loss = nn.CrossEntropyLoss()`: use cross-entropy as the loss function.
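As a quick illustration of the clone()/detach() idiom above, a standalone sketch (not part of the library):

```python
import torch

x = torch.rand(2, 3, requires_grad=True)

y = x.clone().detach()                # new memory, cut off from x's computation graph
y[0, 0] = 0.5                         # modifying y does not touch x
print(x.data_ptr() != y.data_ptr())   # True: separate storage

y.requires_grad = True                # y is a leaf node, so it can accumulate its own grad
(y * 2).sum().backward()
print(y.grad)                         # gradient flows to y, while x.grad stays None
```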

```python
adv_images = adv_images + torch.empty_like(adv_images).uniform_(-self.eps, self.eps)
adv_images = torch.clamp(adv_images, min=0, max=1).detach()
```

These two lines add the random perturbation: torch.empty_like(adv_images) returns an empty tensor with the same shape as adv_images, and uniform_(-self.eps, self.eps) fills it with values drawn uniformly from $[-\epsilon, \epsilon]$. torch.clamp(adv_images, min=0, max=1) sets values greater than 1 to 1 and values less than 0 to 0, preventing out-of-range pixels.

- `adv_images.requires_grad = True`: with requires_grad set to True, torch automatically builds the computation graph as the image flows through the model, which is needed for the backward gradient computation.
- `outputs = self.get_logits(adv_images)`: obtain the model's outputs for the current adversarial image.
- `cost = -loss(outputs, target_labels)`: the loss in the targeted case.
- `cost = loss(outputs, labels)`: the loss in the untargeted case.
- `grad = torch.autograd.grad(cost, adv_images, retain_graph=False, create_graph=False)[0]`: differentiate cost with respect to adv_images to obtain the gradient grad.
- `adv_images = adv_images.detach() + self.alpha*grad.sign()`: following the formula, move the image along the gradient-ascent direction with step size $\alpha$ (see the sketch after this list).
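A single step of this sign-gradient ascent can be reproduced outside the class; the toy model and data below are my own stand-ins:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
loss_fn = nn.CrossEntropyLoss()
images = torch.rand(2, 3, 8, 8)
labels = torch.randint(0, 10, (2,))
alpha = 2 / 255

adv = images.clone().detach()
adv.requires_grad = True
cost = loss_fn(model(adv), labels)
grad = torch.autograd.grad(cost, adv)[0]

# Every pixel moves by +/- alpha (up to float rounding): the steepest ascent under Linf
adv = adv.detach() + alpha * grad.sign()
print((adv - images).abs().max().item(), alpha)
```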

```python
delta = torch.clamp(adv_images - images, min=-self.eps, max=self.eps)  # the perturbation, clipped
adv_images = torch.clamp(images + delta, min=0, max=1).detach()        # keep the image in its valid range
```

These two lines perform the clipping, the same as the $\mathrm{Clip}$ operation in the BIM algorithm: the perturbation is clamped into $[-\epsilon, \epsilon]$, and the result is clamped again so the image does not leave the $[0, 1]$ range.
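The two clamps together implement the Linf projection. Written as a standalone helper (a sketch; the function name project_linf is my own, not from the library):

```python
import torch

def project_linf(adv: torch.Tensor, orig: torch.Tensor, eps: float) -> torch.Tensor:
    """Project adv onto the eps-Linf ball around orig, then onto [0, 1]."""
    delta = torch.clamp(adv - orig, min=-eps, max=eps)
    return torch.clamp(orig + delta, min=0, max=1)

orig = torch.rand(1, 3, 4, 4)
adv = orig + 0.3 * torch.randn_like(orig)  # deliberately oversized perturbation
projected = project_linf(adv, orig, eps=8 / 255)
print((projected - orig).abs().max() <= 8 / 255)  # True
```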

PGD L2 Source Code

```python
import torch
import torch.nn as nn

from ..attack import Attack


class PGDL2(Attack):
    r"""
    PGD in the paper 'Towards Deep Learning Models Resistant to Adversarial Attacks'
    [https://arxiv.org/abs/1706.06083]

    Distance Measure : L2

    Arguments:
        model (nn.Module): model to attack.
        eps (float): maximum perturbation. (Default: 1.0)
        alpha (float): step size. (Default: 0.2)
        steps (int): number of steps. (Default: 10)
        random_start (bool): using random initialization of delta. (Default: True)

    Shape:
        - images: :math:`(N, C, H, W)` where `N = number of batches`, `C = number of channels`, `H = height` and `W = width`. It must have a range [0, 1].
        - labels: :math:`(N)` where each value :math:`y_i` is :math:`0 \leq y_i \leq` `number of labels`.
        - output: :math:`(N, C, H, W)`.

    Examples::
        >>> attack = torchattacks.PGDL2(model, eps=1.0, alpha=0.2, steps=10, random_start=True)
        >>> adv_images = attack(images, labels)

    """

    def __init__(self, model, eps=1.0, alpha=0.2, steps=10,
                 random_start=True, eps_for_division=1e-10):
        super().__init__("PGDL2", model)
        self.eps = eps
        self.alpha = alpha
        self.steps = steps
        self.random_start = random_start
        self.eps_for_division = eps_for_division
        self.supported_mode = ['default', 'targeted']

    def forward(self, images, labels):
        r"""
        Overridden.
        """
        self._check_inputs(images)

        images = images.clone().detach().to(self.device)
        labels = labels.clone().detach().to(self.device)

        if self.targeted:
            target_labels = self.get_target_label(images, labels)

        loss = nn.CrossEntropyLoss()
        adv_images = images.clone().detach()
        batch_size = len(images)

        if self.random_start:
            # Starting at a uniformly random point
            delta = torch.empty_like(adv_images).normal_()
            d_flat = delta.view(adv_images.size(0), -1)  # flatten each image to make norm computation easy
            n = d_flat.norm(p=2, dim=1).view(adv_images.size(0), 1, 1, 1)  # L2 norm of each flattened vector
            r = torch.zeros_like(n).uniform_(0, 1)  # uniform random values in [0, 1]
            delta *= r/n*self.eps  # rescale delta so its norm lies in [0, eps]
            adv_images = torch.clamp(adv_images + delta, min=0, max=1).detach()

        for _ in range(self.steps):
            adv_images.requires_grad = True
            outputs = self.get_logits(adv_images)

            # Calculate loss
            if self.targeted:
                cost = -loss(outputs, target_labels)
            else:
                cost = loss(outputs, labels)

            # Update adversarial images
            grad = torch.autograd.grad(cost, adv_images,
                                       retain_graph=False, create_graph=False)[0]
            grad_norms = torch.norm(grad.view(batch_size, -1), p=2, dim=1) + self.eps_for_division  # eps_for_division guards against division by zero below
            grad = grad / grad_norms.view(batch_size, 1, 1, 1)  # normalize the gradient to a unit vector
            adv_images = adv_images.detach() + self.alpha * grad

            # Keep the L2 distance between the adversarial and original images within eps
            delta = adv_images - images
            delta_norms = torch.norm(delta.view(batch_size, -1), p=2, dim=1)  # L2 norm of the perturbation
            factor = self.eps / delta_norms
            # If eps/delta_norms < 1, the perturbation's L2 norm exceeds eps,
            # so multiplying delta by this factor rescales it to norm eps
            factor = torch.min(factor, torch.ones_like(delta_norms))
            delta = delta * factor.view(-1, 1, 1, 1)

            adv_images = torch.clamp(images + delta, min=0, max=1).detach()

        return adv_images
```
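As with the Linf version, a minimal usage sketch. The toy model and data are my own assumptions; it requires the torchattacks package:

```python
import torch
import torch.nn as nn
import torchattacks

# Toy stand-in classifier; the weights are random
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
images = torch.rand(4, 3, 32, 32)
labels = torch.randint(0, 10, (4,))

attack = torchattacks.PGDL2(model, eps=1.0, alpha=0.2, steps=10, random_start=True)
adv_images = attack(images, labels)

# Each sample's perturbation stays inside the L2 ball of radius eps
delta = (adv_images - images).view(len(images), -1)
print(delta.norm(p=2, dim=1))  # every entry should be <= 1.0
```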

Walkthrough

The difference between PGDL2 and PGD Linf is the norm used to measure the distance between samples. For a sample $X=(x_1,x_2,x_3,...,x_n)$, the L2 norm is

$$||X||_2=\sqrt{x^2_1+x^2_2+x^2_3+...+x^2_n}$$

and the Linf norm is the limit of the Lp norm as $p \to \infty$:

$$||X||_\infty=\lim_{p\to\infty}\sqrt[p]{|x_1|^p+|x_2|^p+|x_3|^p+...+|x_n|^p}=\max_i |x_i|$$

Simply put, the L2 norm can be understood as the length (modulus) of the vector, and the Linf norm as the largest absolute value among its elements.

The differences between the two in the source code are explained in the comments I added within the code.
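To make the two measures concrete, a standalone sketch comparing them on a small tensor:

```python
import torch

delta = torch.tensor([0.3, -0.4, 0.1])

l2 = delta.norm(p=2)      # sqrt(0.3^2 + 0.4^2 + 0.1^2) = sqrt(0.26) ≈ 0.51
linf = delta.abs().max()  # largest absolute entry: 0.4
print(l2.item(), linf.item())

# The Lp norm approaches the Linf norm as p grows
print(delta.norm(p=20).item())  # ≈ 0.40
```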