MIT, CoRL 2021
Camera-only BEV approach: a transformer network for 3D detection
paper:[2110.06922] DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries
code:GitHub - WangYueFt/detr3d
- A DNN backbone extracts image features, and an FPN extracts multi-scale features
- pts_bbox_head Detr3DHead
- transformer Detr3DTransformer
Detr3DHead
    __init__
        self.query_embedding = nn.Embedding(self.num_query, self.embed_dims * 2)
    forward
        query_embeds = self.query_embedding.weight
        hs, init_reference, inter_references = self.transformer(
            mlvl_feats,
            query_embeds,
            reg_branches=self.reg_branches if self.with_box_refine else None,  # noqa:E501
            img_metas=img_metas,
        )
Detr3DTransformer
    __init__
        self.embed_dims = self.decoder.embed_dims
        self.reference_points = nn.Linear(self.embed_dims, 3)
    forward(self, mlvl_feats, query_embed, reg_branches=None, **kwargs):
        query_pos, query = torch.split(query_embed, self.embed_dims, dim=1)
        query_pos = query_pos.unsqueeze(0).expand(bs, -1, -1)
        reference_points = self.reference_points(query_pos).sigmoid()
- Detr3DCrossAtten
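Detr3DCrossAtten is where the "3D-to-2D queries" idea from the paper title comes in: each 3D reference point is projected into the camera images and features are sampled at the projected location. Below is a minimal pure-Python sketch of the projection step only, for one point and one camera. The function name `project_reference_point`, the argument layout, and the 4x4 `lidar2img` convention are assumptions for illustration, not code copied from the repo.

```python
def project_reference_point(ref, pc_range, lidar2img, img_h, img_w, eps=1e-5):
    """Project one normalized 3D reference point into one camera (sketch).

    ref:       (x, y, z) in [0, 1], e.g. the sigmoid output of a linear layer
    pc_range:  (x_min, y_min, z_min, x_max, y_max, z_max), assumed layout
    lidar2img: 4x4 projection matrix as nested lists, assumed convention
    Returns (u, v, valid); u and v lie in (-1, 1) when the point is visible.
    """
    # Scale the normalized point back to metric coordinates.
    xyz = [ref[i] * (pc_range[i + 3] - pc_range[i]) + pc_range[i] for i in range(3)]
    hom = xyz + [1.0]  # homogeneous coordinates

    # Apply the first three rows of the projection matrix.
    cam = [sum(row[j] * hom[j] for j in range(4)) for row in lidar2img[:3]]

    # Points at or behind the image plane are invalid; clamp depth to avoid
    # dividing by zero.
    depth = cam[2]
    valid = depth > eps
    d = max(depth, eps)

    # Perspective divide, then map pixel coordinates to [-1, 1].
    u = (cam[0] / d / img_w - 0.5) * 2.0
    v = (cam[1] / d / img_h - 0.5) * 2.0

    # Only points that land inside the image are kept.
    valid = valid and -1.0 < u < 1.0 and -1.0 < v < 1.0
    return u, v, valid
```

The [-1, 1] range is the coordinate convention that bilinear samplers such as `torch.nn.functional.grid_sample` expect. The `valid` flag plays the role of a mask that zeroes out cameras in which the point is not visible.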
- MultiheadAttention
- bbox_coder NMSFreeCoder
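As the name suggests, an NMS-free coder skips non-maximum suppression. Because each query yields at most one box, decoding can simply keep the highest-scoring (query, class) pairs. A minimal pure-Python sketch of that top-k selection follows. The function name `nms_free_topk` and its arguments are hypothetical, and the box denormalization and range filtering a real coder would also perform are left out.

```python
import math


def nms_free_topk(cls_logits, max_num, score_threshold=None):
    """Pick the top-scoring (query, class) pairs without NMS (sketch).

    cls_logits: list of per-query lists of per-class logits
    Returns a list of (query_index, label, score), highest score first.
    """
    num_classes = len(cls_logits[0])

    # Flatten the query x class score table after a sigmoid.
    flat = [1.0 / (1.0 + math.exp(-x)) for row in cls_logits for x in row]

    # Take the max_num best entries of the flattened table.
    order = sorted(range(len(flat)), key=lambda i: flat[i], reverse=True)[:max_num]

    # Recover which query and which class each flat index came from.
    picks = [(i // num_classes, i % num_classes, flat[i]) for i in order]

    if score_threshold is not None:
        picks = [p for p in picks if p[2] > score_threshold]
    return picks
```

With logits `[[2.0, -1.0], [0.0, 3.0], [-2.0, -3.0]]` and `max_num=2`, the two picks are query 1 with label 1, then query 0 with label 0.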
- loss_cls FocalLoss
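Focal loss down-weights easy examples so that the many background queries do not dominate the classification loss. For reference, here is a minimal pure-Python sketch of the sigmoid focal loss for a single logit. The defaults `alpha=0.25` and `gamma=2.0` are the commonly used values and are assumed here, not read from this repo's config.

```python
import math


def sigmoid_focal_loss(logit, target, alpha=0.25, gamma=2.0):
    """Focal loss for one binary prediction (sketch).

    FL = -alpha_t * (1 - p_t) ** gamma * log(p_t)
    where p_t is the predicted probability of the true class.
    """
    p = 1.0 / (1.0 + math.exp(-logit))
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

A confidently correct prediction (`logit=4.0, target=1`) contributes far less loss than a confidently wrong one (`logit=-4.0, target=1`), which is the intended effect of the modulating factor.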
- transformer Detr3DTransformer