# features: the feature maps of the prediction feature levels
def forward(self,
images, # type: ImageList
features, # type: Dict[str, Tensor]
targets=None # type: Optional[List[Dict[str, Tensor]]]
):
# type: (...) -> Tuple[List[Tensor], Dict[str, Tensor]]
"""
Arguments:
images (ImageList): images for which we want to compute the predictions
features (Dict[str, Tensor]): features computed from the images that are
used for computing the predictions. Each tensor in the dict
corresponds to a different feature level
targets (List[Dict[str, Tensor]]): ground-truth boxes present in the image (optional).
If provided, each element in the dict should contain a field `boxes`,
with the locations of the ground-truth boxes.
Returns:
boxes (List[Tensor]): the predicted boxes from the RPN, one Tensor per
image.
losses (Dict[Tensor]): the losses for the model during training. During
testing, it is an empty dict.
"""
# RPN uses all feature maps that are available
# features is an OrderedDict containing all prediction feature levels
# extract the feature maps: it is a dict, so keep only the values and drop the keys
features = list(features.values())
# compute the objectness scores and bbox regression parameters on every prediction feature level
# objectness and pred_bbox_deltas are both lists, one entry per feature level
# objectness shape per level: 8 (batch) x 15 (anchors per cell: 5 scales x 3 ratios) x 34 (height) x 42 (width)
# pred_bbox_deltas shape per level: 8 x 60 (parameters per cell: 4 per anchor, 15 * 4) x 34 x 42
objectness, pred_bbox_deltas = self.head(features)
# generate all anchors for the batch: a list of tensors whose length equals batch_size
# here the list has 8 entries, each holding that image's anchor coordinates with shape 21420 x 4
anchors = self.anchor_generator(images, features)
# batch_size = 8
num_images = len(anchors)
# numel() Returns the total number of elements in the input tensor.
# count the anchors on each prediction feature level
# o[0].shape is 15 x 34 x 42; the product gives the number of anchors on that level
num_anchors_per_level_shape_tensors = [o[0].shape for o in objectness]
num_anchors_per_level = [s[0] * s[1] * s[2] for s in num_anchors_per_level_shape_tensors]
# adjust the layout and shape of the internal tensors
# per level they are first reshaped to 8 x 21420 x 1 (objectness) and 8 x 21420 x 4 (deltas)
objectness, pred_bbox_deltas = concat_box_prediction_layers(objectness,
pred_bbox_deltas)
# objectness: 171360 x 1
# pred_bbox_deltas: 171360 x 4
# apply pred_bbox_deltas to anchors to obtain the decoded proposals
# note that we detach the deltas because Faster R-CNN does not backprop through
# the proposals
# apply the predicted bbox regression parameters to the anchors to obtain the predicted box coordinates
proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
# shape: 155040 x 1 x 4
proposals = proposals.view(num_images, -1, 4)
# shape: 8 x 19380 x 4
# filter out small boxes, apply NMS, and keep the top post_nms_top_n proposals by predicted score
boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
losses = {}
# compute losses only in training mode
if self.training:
assert targets is not None
# find the best-matching gt for each anchor and label anchors as foreground, background, or discarded
# labels: e.g. a tensor of length 261888 with entries like 0 0 0 0 1 0 ... 0
# matched_gt_boxes: the gt box matched to each anchor, for every image
labels, matched_gt_boxes = self.assign_targets_to_anchors(anchors, targets)
# compute the regression targets from the anchors and their matched gt boxes
regression_targets = self.box_coder.encode(matched_gt_boxes, anchors)
loss_objectness, loss_rpn_box_reg = self.compute_loss(
objectness, pred_bbox_deltas, labels, regression_targets
)
losses = {
"loss_objectness": loss_objectness,
"loss_rpn_box_reg": loss_rpn_box_reg
}
return boxes, losses
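To make the shape bookkeeping in this forward pass concrete, here is a minimal sketch of how num_anchors_per_level is derived from the per-level objectness outputs. The batch size 8, the 15 anchors per cell and the 34 x 42 feature map are only the example numbers from the comments above, not fixed values.

import torch

# hypothetical objectness output: a single feature level, batch of 8,
# 15 anchors per cell on a 34 x 42 grid
objectness = [torch.randn(8, 15, 34, 42)]

num_anchors_per_level_shape_tensors = [o[0].shape for o in objectness]
num_anchors_per_level = [s[0] * s[1] * s[2] for s in num_anchors_per_level_shape_tensors]
print(num_anchors_per_level)           # [21420]  (15 * 34 * 42)
print(8 * sum(num_anchors_per_level))  # 171360   (rows after concat_box_prediction_layers)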
2.1.5 smooth_l1_loss
def smooth_l1_loss(input, target, beta: float = 1. / 9, size_average: bool = True):
"""
very similar to the smooth_l1_loss from pytorch, but with
the extra beta parameter
"""
n = torch.abs(input - target)
# cond = n < beta
cond = torch.lt(n, beta)
loss = torch.where(cond, 0.5 * n ** 2 / beta, n - 0.5 * beta)
if size_average:
return loss.mean()
return loss.sum()
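As a quick sanity check, here is a small worked example of the function above (it assumes smooth_l1_loss as defined above is in scope; the input values are made up). With beta = 1/9, an absolute error of 0.05 falls on the quadratic branch and an error of 1.0 on the linear branch.

import torch

input = torch.tensor([0.0, 1.0])
target = torch.tensor([0.05, 0.0])  # absolute errors: 0.05 (< 1/9) and 1.0 (>= 1/9)

# quadratic branch: 0.5 * 0.05**2 / (1/9) = 0.01125
# linear branch:    1.0 - 0.5 * (1/9)    ≈ 0.94444
print(smooth_l1_loss(input, target, size_average=False))  # tensor(0.9557)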
I derived this loss when reviewing the papers, so I will not elaborate on it here:
Faster R-CNN Source Code Walkthrough (Ⅰ) --- Fast R-CNN / Faster R-CNN Paper Review: https://blog.csdn.net/qq_41694024/article/details/128483662
2.2 The Matcher class (det_utils.py)
2.2.1 __call__
class Matcher(object):
BELOW_LOW_THRESHOLD = -1
BETWEEN_THRESHOLDS = -2
__annotations__ = {
'BELOW_LOW_THRESHOLD': int,
'BETWEEN_THRESHOLDS': int,
}
def __init__(self, high_threshold, low_threshold, allow_low_quality_matches=False):
# type: (float, float, bool) -> None
"""
Args:
high_threshold (float): quality values greater than or equal to
this value are candidate matches.
low_threshold (float): a lower quality threshold used to stratify
matches into three levels:
1) matches >= high_threshold
2) BETWEEN_THRESHOLDS matches in [low_threshold, high_threshold)
3) BELOW_LOW_THRESHOLD matches in [0, low_threshold)
allow_low_quality_matches (bool): if True, produce additional matches
for predictions that have only low-quality match candidates. See
set_low_quality_matches_ for more details.
"""
self.BELOW_LOW_THRESHOLD = -1
self.BETWEEN_THRESHOLDS = -2
assert low_threshold <= high_threshold
self.high_threshold = high_threshold # 0.7
self.low_threshold = low_threshold # 0.3
self.allow_low_quality_matches = allow_low_quality_matches
# match_quality_matrix is our IoU matrix
def __call__(self, match_quality_matrix):
"""
For each anchor, compute the maximum IoU with the gt boxes and record the index of that gt;
anchors with iou < low_threshold get index -1, anchors with low_threshold <= iou < high_threshold get index -2.
Args:
match_quality_matrix (Tensor[float]): an MxN tensor, containing the
pairwise quality between M ground-truth elements and N predicted elements.
Returns:
matches (Tensor[int64]): an N tensor where N[i] is a matched gt in
[0, M - 1] or a negative value indicating that prediction i could not
be matched.
"""
if match_quality_matrix.numel() == 0:
# empty targets or proposals not supported during training
if match_quality_matrix.shape[0] == 0:
raise ValueError(
"No ground-truth boxes available for one of the images "
"during training")
else:
raise ValueError(
"No proposal boxes available for one of the images "
"during training")
# match_quality_matrix is M (gt) x N (predicted)
# Max over gt elements (dim 0) to find best gt candidate for each prediction
# each column of the M x N matrix holds one anchor's IoU with every gt
# matched_vals is the column-wise maximum, i.e. each anchor's best IoU over all gts
# matches is the index of the gt that achieves that maximum
# in other words, for every anchor we take the maximum over the ground-truth boxes
matched_vals, matches = match_quality_matrix.max(dim=0) # the dimension to reduce.
if self.allow_low_quality_matches:
all_matches = matches.clone()
else:
all_matches = None
# Assign candidate matches with low quality to negative (unassigned) values
# indices whose best IoU is below low_threshold
below_low_threshold = matched_vals < self.low_threshold
# indices whose best IoU lies between low_threshold and high_threshold
between_thresholds = (matched_vals >= self.low_threshold) & (
matched_vals < self.high_threshold
)
# set matches to -1 where iou < low_threshold
matches[below_low_threshold] = self.BELOW_LOW_THRESHOLD # -1
# set matches to -2 where iou falls in [low_threshold, high_threshold)
matches[between_thresholds] = self.BETWEEN_THRESHOLDS # -2
# extra rule: also keep, for each gt, the anchor(s) with the highest IoU as positive samples
if self.allow_low_quality_matches:
assert all_matches is not None
self.set_low_quality_matches_(matches, all_matches, match_quality_matrix)
return matches
def set_low_quality_matches_(self, matches, all_matches, match_quality_matrix):
"""
Produce additional matches for predictions that have only low-quality matches.
Specifically, for each ground-truth find the set of predictions that have
maximum overlap with it (including ties); for each prediction in that set, if
it is unmatched, then match it to the ground-truth with which it has the highest
quality value.
"""
# For each gt, find the prediction with which it has highest quality
# for each gt box, find the anchor with the highest IoU;
# highest_quality_foreach_gt holds that maximum IoU value
highest_quality_foreach_gt, _ = match_quality_matrix.max(dim=1) # the dimension to reduce.
# Find highest quality match available, even if it is low, including ties
# find, for every gt box, the anchor index achieving that maximum IoU; a gt may tie with several anchors
# gt_pred_pairs_of_highest_quality = torch.nonzero(
# match_quality_matrix == highest_quality_foreach_gt[:, None]
# )
gt_pred_pairs_of_highest_quality = torch.where(
torch.eq(match_quality_matrix, highest_quality_foreach_gt[:, None])
)
# Example gt_pred_pairs_of_highest_quality:
# tensor([[ 0, 39796],
# [ 1, 32055],
# [ 1, 32070],
# [ 2, 39190],
# [ 2, 40255],
# [ 3, 40390],
# [ 3, 41455],
# [ 4, 45470],
# [ 5, 45325],
# [ 5, 46390]])
# Each row is a (gt index, prediction index)
# Note how gt items 1, 2, 3, and 5 each have two ties
# the first element is the corresponding gt index (not needed here)
# pre_inds_to_update = gt_pred_pairs_of_highest_quality[:, 1]
pre_inds_to_update = gt_pred_pairs_of_highest_quality[1]
# restore, for these anchors, the index of their best-matching gt, even if the IoU is below the thresholds
matches[pre_inds_to_update] = all_matches[pre_inds_to_update]
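As a hedged illustration of what __call__ returns, here is a tiny sketch with a hypothetical 2 x 4 IoU matrix (2 gt boxes, 4 anchors), assuming the Matcher class above is in scope and using the usual RPN thresholds 0.7 / 0.3:

import torch

# hypothetical IoU matrix: rows are gt boxes, columns are anchors
iou = torch.tensor([[0.10, 0.35, 0.75, 0.05],
                    [0.60, 0.20, 0.10, 0.25]])

matcher = Matcher(high_threshold=0.7, low_threshold=0.3, allow_low_quality_matches=True)
matches = matcher(iou)
# anchor 0: best IoU 0.60 with gt 1 -> between thresholds, but it is gt 1's best anchor, so kept as 1
# anchor 1: best IoU 0.35 with gt 0 -> between thresholds -> -2
# anchor 2: best IoU 0.75 with gt 0 -> above high_threshold -> 0
# anchor 3: best IoU 0.25 with gt 1 -> below low_threshold -> -1
print(matches)  # tensor([ 1, -2,  0, -1])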
Let's again use the example above:
highest_quality_foreach_gt, _ = match_quality_matrix.max(dim=1) # the dimension to reduce.
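A minimal sketch of what this per-gt maximum and the subsequent tie lookup produce, using a hypothetical 2 x 4 IoU matrix in which gt 0 ties with two anchors:

import torch

match_quality_matrix = torch.tensor([[0.10, 0.35, 0.35, 0.05],
                                     [0.60, 0.20, 0.10, 0.25]])

highest_quality_foreach_gt, _ = match_quality_matrix.max(dim=1)
print(highest_quality_foreach_gt)  # tensor([0.3500, 0.6000])

# all (gt, anchor) pairs that reach those maxima, ties included
gt_pred_pairs_of_highest_quality = torch.where(
    torch.eq(match_quality_matrix, highest_quality_foreach_gt[:, None])
)
print(gt_pred_pairs_of_highest_quality)
# (tensor([0, 0, 1]), tensor([1, 2, 0]))  -> gt 0 ties at anchors 1 and 2, gt 1 matches anchor 0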
2.3 The BalancedPositiveNegativeSampler class (det_utils.py)
2.3.1 __init__
def __init__(self, batch_size_per_image, positive_fraction):
# type: (int, float) -> None
"""
Arguments:
batch_size_per_image (int): number of elements to be selected per image
positive_fraction (float): percentage of positive elements per batch
"""
self.batch_size_per_image = batch_size_per_image
self.positive_fraction = positive_fraction
This simply stores the two hyperparameter values.
2.3.2 The forward pass: __call__
def __call__(self, matched_idxs):
# type: (List[Tensor]) -> Tuple[List[Tensor], List[Tensor]]
"""
Arguments:
matched idxs: list of tensors containing -1, 0 or positive values.
Each tensor corresponds to a specific image.
-1 values are ignored, 0 are considered as negatives and > 0 as
positives.
Returns:
pos_idx (list[tensor])
neg_idx (list[tensor])
Returns two lists of binary masks for each image.
The first list contains the positive elements that were selected,
and the second list the negative example.
"""
pos_idx = []
neg_idx = []
# iterate over every image's matched_idxs
for matched_idxs_per_image in matched_idxs:
# entries >= 1 are positive samples; nonzero returns the indices of non-zero elements
# positive = torch.nonzero(matched_idxs_per_image >= 1).squeeze(1)
positive = torch.where(torch.ge(matched_idxs_per_image, 1))[0]
# entries == 0 are negative samples
# negative = torch.nonzero(matched_idxs_per_image == 0).squeeze(1)
negative = torch.where(torch.eq(matched_idxs_per_image, 0))[0]
# target number of positive samples
num_pos = int(self.batch_size_per_image * self.positive_fraction)
# protect against not enough positive examples
# if there are not enough positives, simply use all of them
num_pos = min(positive.numel(), num_pos)
# target number of negative samples
num_neg = self.batch_size_per_image - num_pos
# protect against not enough negative examples
# if there are not enough negatives, simply use all of them
num_neg = min(negative.numel(), num_neg)
# randomly select positive and negative examples
# Returns a random permutation of integers from 0 to n - 1.
# randomly select the requested numbers of positive and negative samples
perm1 = torch.randperm(positive.numel(), device=positive.device)[:num_pos]
perm2 = torch.randperm(negative.numel(), device=negative.device)[:num_neg]
pos_idx_per_image = positive[perm1]
neg_idx_per_image = negative[perm2]
# create binary mask from indices
pos_idx_per_image_mask = torch.zeros_like(
matched_idxs_per_image, dtype=torch.uint8
)
neg_idx_per_image_mask = torch.zeros_like(
matched_idxs_per_image, dtype=torch.uint8
)
pos_idx_per_image_mask[pos_idx_per_image] = 1
neg_idx_per_image_mask[neg_idx_per_image] = 1
pos_idx.append(pos_idx_per_image_mask)
neg_idx.append(neg_idx_per_image_mask)
return pos_idx, neg_idx
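A minimal usage sketch, assuming the class above is torchvision's BalancedPositiveNegativeSampler and using made-up matched_idxs for a single image (1 = positive anchor, 0 = background, -1 = discarded):

import torch

matched_idxs = [torch.tensor([1, 0, 0, -1, 1, 0, 0, 0])]

# sample at most 4 anchors per image, half of them positives
sampler = BalancedPositiveNegativeSampler(batch_size_per_image=4, positive_fraction=0.5)
pos_idx, neg_idx = sampler(matched_idxs)

print(pos_idx[0])  # uint8 mask with ones at positions 0 and 4 (both positives are kept)
print(neg_idx[0])  # uint8 mask with ones at 2 randomly chosen background positions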