[MMDetection3D in Practice 4] Training with mmdet3d

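Contents

- 1. Introduction
  - 1.1 Training workflow
  - 1.2 Testing and validation
- 2. Training walkthrough
  - 2.1 Preparing and processing the dataset
  - 2.2 Loading and modifying the config file
  - 2.3 Starting training
  - 2.4 Testing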

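1. Introduction

1.1 Training workflow

Like the other OpenMMLab codebases, MMDetection3D (mmdet3d) is driven by a config file. Before training you prepare a config that defines the dataset, the model, the optimizer, and the remaining hyperparameters. On a single GPU, training is then started with python tools/train.py config.py.

mmdet3d ships a set of standard config files, so in most cases it is enough to inherit from one of them and override a few settings, typically the data paths and the training parameters. Note that even for the standard KITTI dataset the data must first be processed with the tools mmdet3d provides. They gather the scattered annotation data (calibration, labels, image paths, and so on) into complete annotation files that the program can read during training. The workflow is: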

(1) Download and organize the data
- Standard dataset:
```
python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
```
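- Custom dataset:
  convert it to a supported format such as KITTI or nuScenes
(2) Modify the config file (data paths and training parameters)
(3) Start training
- Single machine, single GPU: python tools/train.py config
- Multiple GPUs: tools/dist_train.sh (one machine, several GPUs) or tools/slurm_train.sh (several machines, several GPUs)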
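
The choice between the two launchers in step (3) can be sketched as a small helper. This is only an illustration: build_train_cmd is a hypothetical name, and it assumes that tools/dist_train.sh takes the config path followed by the number of GPUs. Being a shell script, the multi-GPU launcher also needs a bash environment.

```python
import shlex


def build_train_cmd(config, num_gpus=1):
    """Return the training launch command as a list of arguments.

    Hypothetical helper for illustration; it only assembles the command
    and does not run anything.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if num_gpus == 1:
        # Single machine, single GPU: call the training script directly.
        return ["python", "tools/train.py", config]
    # Single machine, several GPUs: the launcher is assumed to take the
    # config path followed by the GPU count.
    return ["bash", "tools/dist_train.sh", config, str(num_gpus)]


if __name__ == "__main__":
    print(shlex.join(build_train_cmd("myconfig.py")))
    print(shlex.join(build_train_cmd("myconfig.py", num_gpus=4)))
```

The resulting list can be passed to subprocess.run from the repository root.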

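1.2 Testing and validation

After training, the model can be tested and validated with tools/test.py. Depending on the arguments passed, it saves the test results or visualizes them. See the mmdet3d documentation for the full list of arguments.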

python tools/test.py myconfig.py work_dirs/myconfig/latest.pth

Visualize the predictions: --show --show-dir tmp
Print the evaluation metrics: --eval mAP
Save the test results: --out result.pkl
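
As a rough sketch of how these options combine, the helper below assembles a tools/test.py command line from the three flag groups listed above. build_test_cmd is a hypothetical name, and the checkpoint path in the usage lines is an assumed example rather than a fixed location.

```python
def build_test_cmd(config, checkpoint, eval_metric=None, out=None, show_dir=None):
    """Assemble a tools/test.py command as a list of arguments.

    Hypothetical helper for illustration; only the flags named in the
    text (--eval, --out, --show, --show-dir) are used.
    """
    cmd = ["python", "tools/test.py", config, checkpoint]
    if eval_metric:
        # Print evaluation metrics, e.g. mAP.
        cmd += ["--eval", eval_metric]
    if out:
        # Save the raw test results to a pickle file.
        cmd += ["--out", out]
    if show_dir:
        # Visualize predictions and write them to show_dir.
        cmd += ["--show", "--show-dir", show_dir]
    return cmd


if __name__ == "__main__":
    # The checkpoint path below is an assumed example.
    ckpt = "work_dirs/myconfig/latest.pth"
    print(" ".join(build_test_cmd("myconfig.py", ckpt, eval_metric="mAP")))
    print(" ".join(build_test_cmd("myconfig.py", ckpt, out="result.pkl", show_dir="tmp")))
```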

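2. Training walkthrough

2.1 Preparing and processing the dataset

Download the kitti_tiny_3D dataset, a small subset cut from KITTI, from Baidu Netdisk (extraction code: 9niw) and unpack it.
Then run the data processing script: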

python tools/create_data.py kitti --root-path ./data/kitti --out-dir ./data/kitti --extra-tag kitti
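
Before moving on, it can help to confirm that the script actually wrote its annotation files. The check below is a minimal sketch: kitti_dbinfos_train.pkl and kitti_infos_test.pkl are the file names the configs in this article refer to, while kitti_infos_train.pkl and kitti_infos_val.pkl are assumed from the same naming pattern. missing_outputs is a hypothetical helper name.

```python
from pathlib import Path

# File names expected under the data root after create_data.py has run.
# The train/val names are assumptions based on the naming pattern of the
# two files referenced in the configs.
EXPECTED = [
    "kitti_infos_train.pkl",
    "kitti_infos_val.pkl",
    "kitti_infos_test.pkl",
    "kitti_dbinfos_train.pkl",
]


def missing_outputs(data_root, expected=EXPECTED):
    """Return the expected file names that do not exist under data_root."""
    root = Path(data_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_outputs("data/kitti")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected annotation files are present.")
```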

2.2 Loading and modifying the config file

mim download mmdet3d --config hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class --dest checkpoints

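The .py config file itself is already in the repository under configs/pointpillars, so the main purpose of this command is to fetch the pretrained weights. The weights can also be downloaded by hand from the mmdetection3d repository. Either way, place them in the checkpoints folder. The config reads: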

_base_ = [
    '../_base_/models/hv_pointpillars_secfpn_kitti.py',
    '../_base_/datasets/kitti-3d-3class.py',
    '../_base_/schedules/cyclic_40e.py', '../_base_/default_runtime.py'
]

point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]
# dataset settings
data_root = 'data/kitti/'
class_names = ['Pedestrian', 'Cyclist', 'Car']
# PointPillars adopts different sampling strategies among classes

file_client_args = dict(backend='disk')
# Uncomment the following if use ceph or other file clients.
# See https://mmcv.readthedocs.io/en/latest/api.html#mmcv.fileio.FileClient
# for more details.
# file_client_args = dict(
#     backend='petrel',
#     path_mapping=dict({
#         './data/kitti/':
#         's3://openmmlab/datasets/detection3d/kitti/',
#         'data/kitti/':
#         's3://openmmlab/datasets/detection3d/kitti/'
#     }))

db_sampler = dict(
    data_root=data_root,
    info_path=data_root + 'kitti_dbinfos_train.pkl',
    rate=1.0,
    prepare=dict(
        filter_by_difficulty=[-1],
        filter_by_min_points=dict(Car=5, Pedestrian=5, Cyclist=5)),
    classes=class_names,
    sample_groups=dict(Car=15, Pedestrian=15, Cyclist=15),
    points_loader=dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    file_client_args=file_client_args)

# PointPillars uses different augmentation hyper parameters
train_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='LoadAnnotations3D',
        with_bbox_3d=True,
        with_label_3d=True,
        file_client_args=file_client_args),
    dict(type='ObjectSample', db_sampler=db_sampler, use_ground_plane=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(
        type='GlobalRotScaleTrans',
        rot_range=[-0.78539816, 0.78539816],
        scale_ratio_range=[0.95, 1.05]),
    dict(type='PointsRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='ObjectRangeFilter', point_cloud_range=point_cloud_range),
    dict(type='PointShuffle'),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(type='Collect3D', keys=['points', 'gt_bboxes_3d', 'gt_labels_3d'])
]
test_pipeline = [
    dict(
        type='LoadPointsFromFile',
        coord_type='LIDAR',
        load_dim=4,
        use_dim=4,
        file_client_args=file_client_args),
    dict(
        type='MultiScaleFlipAug3D',
        img_scale=(1333, 800),
        pts_scale_ratio=1,
        flip=False,
        transforms=[
            dict(
                type='GlobalRotScaleTrans',
                rot_range=[0, 0],
                scale_ratio_range=[1., 1.],
                translation_std=[0, 0, 0]),
            dict(type='RandomFlip3D'),
            dict(
                type='PointsRangeFilter', point_cloud_range=point_cloud_range),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['points'])
        ])
]

data = dict(
    train=dict(dataset=dict(pipeline=train_pipeline, classes=class_names)),
    val=dict(pipeline=test_pipeline, classes=class_names),
    test=dict(pipeline=test_pipeline, classes=class_names))

# In practice PointPillars also uses a different schedule
# optimizer
lr = 0.001
optimizer = dict(lr=lr)
# max_norm=35 is slightly better than 10 for PointPillars in the earlier
# development of the codebase thus we keep the setting. But we do not
# specifically tune this parameter.
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# PointPillars usually need longer schedule than second, we simply double
# the training schedule. Do remind that since we use RepeatDataset and
# repeat factor is 2, so we actually train 160 epochs.
runner = dict(max_epochs=80)

# Use evaluation interval=2 to reduce the number of evaluation times
evaluation = dict(interval=2)
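
To see what point_cloud_range means in practice, the short calculation below derives the pillar grid from it. This is an illustrative sketch: voxel_size is not set in this file but in the inherited model config, and the value used here is the one printed in the training log later in this article (voxel_size = [0.16, 0.16, 4]). grid_size is a hypothetical helper name.

```python
# Values taken from the config above and from the training log.
point_cloud_range = [0, -39.68, -3, 69.12, 39.68, 1]  # x/y/z min, then x/y/z max
voxel_size = [0.16, 0.16, 4]                          # pillar size along x, y, z


def grid_size(pc_range, voxel):
    """Number of voxels along x, y and z for a given range and voxel size."""
    mins, maxs = pc_range[:3], pc_range[3:]
    # round() absorbs floating-point noise such as 431.99999999999994.
    return [round((hi - lo) / v) for lo, hi, v in zip(mins, maxs, voxel)]


print(grid_size(point_cloud_range, voxel_size))  # [432, 496, 1]
```

With a single cell along z, every voxel spans the full height of the range, which is what makes it a pillar.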

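The config file defines the dataset root data_root, the class names class_names, and the data processing pipelines for training and testing.
The model structure is inherited from '../_base_/models/hv_pointpillars_secfpn_kitti.py', which defines the network architecture.
The learning rate and optimizer settings are inherited from ../_base_/schedules/cyclic_40e.py, which defines the cyclic learning-rate policy in detail. The current config overrides some of the optimizer parameters.
The KITTI data processing pipelines are inherited from ../_base_/datasets/kitti-3d-3class.py, and the current config replaces them.
data overrides the train, val, and test pipelines inherited from ../_base_/datasets/kitti-3d-3class.py and also sets classes.
optimizer keeps the settings from ../_base_/schedules/cyclic_40e.py and changes only the learning rate.
runner keeps the settings from '../_base_/schedules/cyclic_40e.py' and changes only max_epochs, the number of training epochs.
evaluation keeps the evaluation settings inherited from the base configs and changes only the evaluation interval.
Parameters such as work_dir, resume_from, and load_from are defined in '../_base_/default_runtime.py'. They can be overridden in this config as well, for example to set a different work_dir or to load model weights and resume training.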

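In short, the inherited files already implement nearly everything, and each model variant overrides only what differs.
hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class overrides a handful of settings on top of its base files: the data path data_root, the training pipeline, class_names, max_epochs, and so on. It is the official config for this PointPillars variant. We can build on it in the same way: most of its contents need no change, so we inherit from it and specify only what we want to modify.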
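
The inheritance behaviour described above can be pictured as a recursive dictionary merge: keys set in the child replace the matching keys in the base, and everything else is kept. The snippet below is a simplified sketch of that idea, not the actual loader used by mmdet3d, which handles more cases. merge_config is a hypothetical helper, and the values in base are illustrative placeholders rather than a copy of the real base files.

```python
import copy


def merge_config(base, override):
    """Recursively merge override into a copy of base.

    Simplified illustration of config inheritance: nested dicts are
    merged key by key, any other value is replaced outright.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder base settings (illustrative values only).
base = dict(
    optimizer=dict(type="AdamW", lr=0.0018, weight_decay=0.01),
    runner=dict(type="EpochBasedRunner", max_epochs=40),
)
# The child states only what it wants to change.
child = dict(optimizer=dict(lr=0.001), runner=dict(max_epochs=80))

merged = merge_config(base, child)
print(merged["optimizer"])
print(merged["runner"])
```

The child never repeats type or weight_decay, yet both survive in the merged result, which is why a derived config can stay so short.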

Let us write such a config ourselves. Our own config file, myconfig.py, reads:

_base_ = [
    'configs/pointpillars/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class.py'
]

data = dict(
    samples_per_gpu=4,
    workers_per_gpu=1,
    persistent_workers=True,
    test=dict(
        split='testing',
        ann_file='data/kitti/kitti_infos_test.pkl',
    ))

runner = dict(max_epochs=5)
checkpoint_config = dict(interval=5)
evaluation = dict(interval=5)
log_config = dict(interval=10)

load_from = 'checkpoints/hv_pointpillars_secfpn_6x8_160e_kitti-3d-3class_20220301_150306-37dc2420.pth'

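_base_ lists the file we inherit from, here the config for training PointPillars on the KITTI dataset.
data gets a few small changes: samples_per_gpu, which is the batch size per GPU, workers_per_gpu, the number of dataloader worker processes, and the annotation file used for the test split.
Because the dataset is tiny and this is only a demonstration, max_epochs is set to 5, so we train for just 5 epochs and look at the result.
The evaluation, checkpoint, and logging intervals are adjusted as well.
load_from points to the pretrained weights downloaded earlier.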
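
How much training these settings amount to can be estimated with a little arithmetic. The sketch below relies on the comment in the parent config that the training set is wrapped in a RepeatDataset with a repeat factor of 2. The sample count of 100 is a made-up placeholder, since the size of kitti_tiny_3D is not stated here, and the iteration counts are approximate. training_budget is a hypothetical helper name.

```python
import math


def training_budget(num_samples, repeat_times, samples_per_gpu, max_epochs, num_gpus=1):
    """Approximate passes over the data and iteration counts."""
    # One "epoch" of a repeated dataset covers the data repeat_times times.
    samples_per_epoch = num_samples * repeat_times
    iters_per_epoch = math.ceil(samples_per_epoch / (samples_per_gpu * num_gpus))
    return dict(
        data_passes=repeat_times * max_epochs,
        iters_per_epoch=iters_per_epoch,
        total_iters=iters_per_epoch * max_epochs,
    )


# num_samples=100 is a hypothetical dataset size for illustration.
budget = training_budget(num_samples=100, repeat_times=2, samples_per_gpu=4, max_epochs=5)
print(budget)  # {'data_passes': 10, 'iters_per_epoch': 50, 'total_iters': 250}
```

Under these assumptions, max_epochs=5 corresponds to about 10 passes over the raw data.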

With the config written, training can be started with tools/train.py.

2.3 Starting training

Training is started from the command line: tools/train.py on a single GPU, tools/dist_train.sh for multiple GPUs.

python .\tools\train.py myconfig.py

2022-11-03 19:22:45,595 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: win32
Python: 3.8.13 (default, Mar 28 2022, 06:59:08) [MSC v.1916 64 bit (AMD64)]
CUDA available: True
GPU 0: NVIDIA GeForce GTX 1080
CUDA_HOME: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
NVCC: Cuda compilation tools, release 10.2, V10.2.8
MSVC: 用于 x64 的 Microsoft (R) C/C++ 优化编译器 19.29.30146 版
GCC: n/a
PyTorch: 1.8.2
PyTorch compiling details: PyTorch built with:
  - C++ Version: 199711
  - MSVC 192930040
  - Intel(R) Math Kernel Library Version 2020.0.2 Product Build 20200624 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v1.7.0 (Git Hash 7aed236906b1f7a05c0917e5257a1af05e9ff683)
  - OpenMP 2019
  - CPU capability usage: AVX2
  - CUDA Runtime 10.2
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_37,code=compute_37
  - CuDNN 7.6.5
  - Magma 2.5.4
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=C:/cb/pytorch_1000000000000/work/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /w /bigobj -DUSE_PTHREADPOOL -openmp:experimental -DNDEBUG -DUSE_FBGEMM -DUSE_XNNPACK, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.8.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, 

TorchVision: 0.9.2
OpenCV: 4.6.0
MMCV: 1.6.0
MMCV Compiler: MSVC 192930137
MMCV CUDA Compiler: 10.2
MMDetection: 2.25.3
MMSegmentation: 0.29.0
MMDetection3D: 1.0.0rc5+962fc83
spconv2.0: False
------------------------------------------------------------

2022-11-03 19:22:46,834 - mmdet - INFO - Distributed training: False
2022-11-03 19:22:48,062 - mmdet - INFO - Config:
voxel_size = [0.16, 0.16, 4]
model = dict(
    type='VoxelNet',
    voxel_layer=dict(
        max_num_points=32,
        point_cloud_range=[0, -39.68, -3, 69.12, 39.68, 1],
        voxel_size=[0.16, 0.16, 4],
        max_voxels=(16000, 40000)),
    voxel_encoder=dict(
        type='PillarFeatureNet',
        in_channels=4,
        feat_channels=[64],
        with_distance=False,
        voxel_size=