NIPS-2018
Table of Contents
- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 DropBlock
- 5 Experiments
- 5.1 ImageNet Classification
- 5.1.1 DropBlock in ResNet-50
- 5.1.2 DropBlock in AmoebaNet
- 5.2 Experimental Analysis
- 5.3 Object Detection in COCO
- 5.4 Semantic Segmentation in PASCAL VOC
- 6 Conclusion (own)
1 Background and Motivation
Dropout's drawback: as a regularization technique for fully connected layers, it is often less effective for convolutional layers.
The reason for this: activation units in convolutional layers are spatially correlated, so information can still flow through convolutional networks despite dropout.
Thus a structured form of dropout is needed to regularize convolutional networks.
The authors propose DropBlock, a form of structured dropout in which units in a contiguous region of a feature map are dropped together.
With contiguous regions removed, the network must look elsewhere for evidence to fit the data.
2 Related Work
- DropConnect
- maxout
- StochasticDepth
- DropPath
- Scheduled-DropPath
- shake-shake regularization
- ShakeDrop regularization
The basic principle behind these methods is to inject noise into neural networks so that they do not overfit the training data.
The method is inspired by Cutout (see 【Cutout】《Improved Regularization of Convolutional Neural Networks with Cutout》).
DropBlock generalizes Cutout by applying Cutout to every feature map in a convolutional network.
3 Advantages / Contributions
Proposed the DropBlock regularization strategy, which works better than dropout in regularizing convolutional networks
4 DropBlock
Its main difference from dropout is that it drops contiguous regions from a feature map of a layer instead of dropping out independent random units. In addition, the number of dropped units is linearly increased over time during training (see Scheduled DropBlock below).
Algorithm flow
Illustration (Figure 2)
How the block mask is built: first take the mask $M$ (the green region in Figure 2a); within $M$, sample the block centers, i.e. the zero entries $M_{i,j}$ (red X), according to $M_{i,j} \sim \mathrm{Bernoulli}(\gamma)$; each center is then expanded outward into a square block of side length block_size (black X), and the entries where the black X blocks overlap the green region are set to 0.
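To make the procedure concrete, here is a minimal PyTorch sketch (my own, not the authors' code). For simplicity it samples block centers over the whole map, whereas the paper restricts centers so that every block lies fully inside the feature map; it also assumes training mode (at inference DropBlock is simply skipped).

```python
import torch
import torch.nn.functional as F

def dropblock(x, gamma, block_size):
    # x: feature map of shape (N, C, H, W); assumes odd block_size so the
    # padding below preserves the spatial size.
    # Sample block centers M_{i,j} ~ Bernoulli(gamma) (the red X entries).
    centers = torch.bernoulli(torch.full_like(x, gamma))
    # Expand each center into a block_size x block_size square (black X)
    # via stride-1 max pooling: any position covered by a block becomes 1.
    dropped = F.max_pool2d(centers, kernel_size=block_size,
                           stride=1, padding=block_size // 2)
    mask = 1.0 - dropped  # 1 = keep, 0 = drop
    # Zero the blocks, then renormalize so the expected magnitude of the
    # activations is unchanged (A = A * count(M) / count_ones(M)).
    return x * mask * mask.numel() / mask.sum()
```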
Two parameters need to be configured:
- block_size: the side length of the dropped blocks; block_size is fixed across all feature maps.
  DropBlock resembles dropout when block_size = 1 and resembles SpatialDropout when block_size covers the full feature map (the whole channel is masked).
- $\gamma$: controls how many activation units to drop. It is not set directly but computed from keep_prob as
  $\gamma = \frac{1-\text{keep\_prob}}{\text{block\_size}^2} \cdot \frac{\text{feat\_size}^2}{(\text{feat\_size}-\text{block\_size}+1)^2}$
  where keep_prob means keeping every activation unit with probability keep_prob; in the experiments it is set between 0.75 and 0.95.
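A small helper (my own sketch) that evaluates the formula above, with a usage example:

```python
def compute_gamma(keep_prob, block_size, feat_size):
    # gamma = (1 - keep_prob) / block_size^2
    #         * feat_size^2 / (feat_size - block_size + 1)^2
    return ((1.0 - keep_prob) / block_size ** 2
            * feat_size ** 2 / (feat_size - block_size + 1) ** 2)

# e.g. keep_prob = 0.9, block_size = 7 on a 14x14 feature map
gamma = compute_gamma(0.9, 7, 14)  # ~= 0.00625
```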
Scheduled DropBlock
Gradually decreasing keep_prob over time from 1 to the target value is more robust.
The experiments use a linear scheme to decrease the value of keep_prob, as sketched below.
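A minimal sketch of the linear schedule (my own; the step count and target value are placeholders):

```python
def scheduled_keep_prob(step, total_steps, target=0.9):
    # Linearly anneal keep_prob from 1.0 at step 0 down to the
    # target value at the end of training.
    frac = min(step / total_steps, 1.0)
    return 1.0 - frac * (1.0 - target)
```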
5 Experiments
Datasets
- ILSVRC 2012 classification dataset
- COCO
- PASCAL VOC
5.1 ImageNet Classification
5.1.1 DropBlock in ResNet-50
1)Where to apply DropBlock
The authors compare applying DropBlock only after convolution layers versus after both convolution layers and skip connections, and applying it to Group 4 alone versus to both Groups 3 and 4 (these should correspond to ResNet's stage 4 and stage 5); see the sketch below.
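A hypothetical sketch of that wiring (my own, not the authors' code): DropBlock is applied both after the convolutional branch and after the skip branch, before the residual sum.

```python
import torch.nn as nn
import torch.nn.functional as F

class ResidualWithDropBlock(nn.Module):
    # Hypothetical wrapper: conv_branch is a bottleneck's conv stack,
    # shortcut its identity/downsample path, and drop_block a callable
    # such as the dropblock() sketch above with gamma/block_size bound.
    def __init__(self, conv_branch, shortcut, drop_block):
        super().__init__()
        self.conv_branch = conv_branch
        self.shortcut = shortcut
        self.drop_block = drop_block

    def forward(self, x):
        out = self.drop_block(self.conv_branch(x))  # after convolutions
        skip = self.drop_block(self.shortcut(x))    # after skip connection
        return F.relu(out + skip)
```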
2)DropBlock vs. dropout
block_size defaults to 7.
Figure 3a shows that DropBlock performs better than dropout.
Figure 3b shows that with a scheduled keep_prob, accuracy is higher and the method is more robust to different keep_prob settings (the accuracy peak is sustained over a wider range).
Figure 4 shows that applying DropBlock to Groups 3 & 4 works better than applying it to Group 3 alone; adding the scheduled keep_prob improves results further; applying DropBlock after the skip connections as well brings another gain. Setting block_size to 7 works best.
The authors criticize SpatialDropout: it can be too harsh when applied to the high-resolution feature maps of Group 3. That makes sense, since it drops entire feature maps at once.
As for Cutout, it does not improve the accuracy on the ImageNet dataset in their experiments.
5.1.2 DropBlock in AmoebaNet
5.2 Experimental Analysis
1)DropBlock drops more semantic information
With DropBlock kept active at inference, validation accuracy drops quickly as keep_prob decreases.
This suggests that DropBlock removes semantic information and makes classification more difficult.
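A sketch of that analysis (the `evaluate` helper, model, and loader names are hypothetical): keep DropBlock active at test time and sweep keep_prob downward.

```python
# Keep DropBlock active at inference and sweep keep_prob downward,
# recording validation accuracy at each setting.
for keep_prob in [1.0, 0.95, 0.9, 0.8, 0.7]:
    gamma = compute_gamma(keep_prob, block_size=7, feat_size=14)
    acc = evaluate(model, val_loader, gamma=gamma)  # hypothetical helper
    print(keep_prob, acc)
```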
2)Model trained with DropBlock is more robust
Figure 5 shows that the model trained with block_size = 7 is more robust and retains the benefit of block_size = 1, but not vice versa.
3)DropBlock learns spatially distributed representations
Honestly, I can't really tell from the third column that it's a bookshop.
5.3 Object Detection in COCO
5.4 Semantic Segmentation in PASCAL VOC
6 Conclusion (own)
- DropBlock and dropout are compatible (the two can be combined)
- Applying DropBlock to skip connections as well further improves performance